subject:"\[PATCH\] Remove process freezer from suspend to RAM pathway"

Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-12 Thread Miklos Szeredi

> > Ok, I'll just blame fuse here. 'You have to write to /sys
> > files for SIGKILL to work' is not funny.
> 
> SIGKILL won't work on a stopped task.  Neither on a traced task.
> Neither on a zombie (how many newbies are thoroughly confused about
> that ;)
> 
> And it won't work on a task that is hanging on an NFS mount, when the
> network broke.
> 
> Oh, and it won't work on a deadlocked fuse filesystem.  How many of
> these have you met?  How many of the others?

And maybe someday, when I'm feeling _really_ bored, I may start
hacking the VFS to make it possible for tasks to be killed while
waiting on a mutex.  Can a task doing a page fault be interrupted?  I
don't see why not.

But, it's _not_ going to happen now, to fix the problems with suspend.
Hacking fuse (however broken it may be) or the VFS is not the right
way to fix those problems.  It may be easier than fixing the drivers,
though I doubt it.  But even if it's the easier way it's not the
_right_ way.

We want the various subsystems in the kernel to be less dependent on
each other not more.  And the freezer is just adding unnecessary
dependence on totally unrelated code.  And that does not only impact
fuse, but a lot of other code in the kernel, in the form of
try_to_freeze() calls all over the place.

Is that so hard to understand?

You can bash fuse all you want, I really don't mind.  But please stop
trying to prove, that this situation is best handled in fuse.  It just
won't happen, not because I'm lazy to do the work but because I don't
have the guts to do something so obviously wrong.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-12 Thread Miklos Szeredi

> > Actually there's also a non-malicious case in which waiting for
> > requests to finish won't work: when one fuse filesystem is accessing
> > another.
> > 
> > Since we are blocking new fuse requests, that might block a fuse
> > daemon, which in turn makes it impossible to finish the pending
> > request.
> 
> Hmm, yes, this is ugly. But will it hurt? We'll just wait for number
> of active fuse requests to go down to 0 before proceeding.

I'm thinking of the following situation, immediately after we begin to
wait for fuse requests to subside:

  - task A is performing file operation O1
  - O1 is being served by task B
  - task B needs to perform O2 before finishing O1

Since we don't allow new requests, O2 will never start.  So O1 will
never finish.  So the number of fuse requests will never go to 0.

> It may still livelock on system with busy fused-s, but we'll detect
> that with timeout, and I believe it is 'don't do that' situation.

There are no busy fused's.  If we allowed O2 to go through, everything
would be nice and dandy.  But we cannot know that O2 needs to be let
through, and other fuse requests must not.

> > And this is not at all theoretical, I know that encfs is used over
> > various other fuse filesystems like sshfs or ntfs-3g.
> > 
> > Yeah, stacking userspace filesystems could be done entirely in
> > userspace, and I'm actually working on that (with fuse-2.7.0 already
> > supporting some basic stacking).
> 
> Great.
> 
> > But the point is, that the "wait for fuse to quiescence" hack would
> > not work in todays environment.
> 
> Ok, I'll just blame fuse here. 'You have to write to /sys
> files for SIGKILL to work' is not funny.

SIGKILL won't work on a stopped task.  Neither on a traced task.
Neither on a zombie (how many newbies are thoroughly confused about
that ;)

And it won't work on a task that is hanging on an NFS mount, when the
network broke.

Oh, and it won't work on a deadlocked fuse filesystem.  How many of
these have you met?  How many of the others?

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: sysrq-t dumps of s2ram/fuse deadlock (was Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-12 Thread Pavel Machek

On Wed 2007-07-11 14:28:20, Matthew Garrett wrote:
> On Mon, Jul 09, 2007 at 01:29:05PM +, Pavel Machek wrote:
> > Hi!
> > 
> > Can we get them? They are neccessary for debugging 'what in suspend
> > calls fuse' problem. And yes, that problem is there even when you
> > remove freezer.
> 
> I can produce them, but haven't managed to do that in any way that lets 
> me get them off the system yet.
> 
> > Matthew, you seem to be the only one able to produce them...
> 
> sys_sync may not do anything on a fuse filesystem, but if you've 
> loopback-mounted an ext3 filesystem from that fuse filesystem then it's 
> going to result in writes to it and deadlock.

Aha, yes, that explains the problem. Thanks!
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-12 Thread Pavel Machek

Hi!

> > So you want me to handle _malicious_ filesystems now?
> > 
> > That should be easy... :-). You already have nasty deadlocks in FUSE,
> > and you solve them by "root can echo 1 > abort"... so allow me the
> > same possibility.
> > 
> > We can tell fused we are freezing, and if all the requests are not
> > serviced within, say, 30 seconds, we call the filesystem malicious and
> > do echo 1 > abort.
> > 
> > Not ideal, but neither is allowing malicious filesystems in the first
> > place...
> 
> Actually there's also a non-malicious case in which waiting for
> requests to finish won't work: when one fuse filesystem is accessing
> another.
> 
> Since we are blocking new fuse requests, that might block a fuse
> daemon, which in turn makes it impossible to finish the pending
> request.

Hmm, yes, this is ugly. But will it hurt? We'll just wait for number
of active fuse requests to go down to 0 before proceeding. It may
still livelock on system with busy fused-s, but we'll detect that with
timeout, and I believe it is 'don't do that' situation.

> And this is not at all theoretical, I know that encfs is used over
> various other fuse filesystems like sshfs or ntfs-3g.
> 
> Yeah, stacking userspace filesystems could be done entirely in
> userspace, and I'm actually working on that (with fuse-2.7.0 already
> supporting some basic stacking).

Great.

> But the point is, that the "wait for fuse to quiescence" hack would
> not work in todays environment.

Ok, I'll just blame fuse here. 'You have to write to /sys
files for SIGKILL to work' is not funny.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [linux-pm] [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-12 Thread Pavel Machek

Hi!

> > They will not trigger 100% of the time, but sporadically and generally at
> > random.
> > 
> > At least the freezer problems are reproducible. ;-)
> 
> Our experience with powermacs has been that it isn't actually all that
> hard to get it right for the drivers you care about.

So... will you fix input and usb? Their maintainers seem to prefer
simplicity of frozen system.

Also, can you write down rules for drivers to follow?

Like... 'if you get a request after system suspend started, you may
not kmalloc(GFP_KERNEL)'

 - because disk driver may be already suspended

- hmm, what happens if disk driver _is_ suspended? Let's say I'm v4l
driver on cpu1. Someone may talk to me _while_ disk driver is being
suspended... and if I'm after disk driver in pci order, I may only
learn about suspend _after_ I do the wrong kmalloc.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: sysrq-t dumps of s2ram/fuse deadlock (was Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-11 Thread Matthew Garrett

On Mon, Jul 09, 2007 at 01:29:05PM +, Pavel Machek wrote:
> Hi!
> 
> Can we get them? They are neccessary for debugging 'what in suspend
> calls fuse' problem. And yes, that problem is there even when you
> remove freezer.

I can produce them, but haven't managed to do that in any way that lets 
me get them off the system yet.

> Matthew, you seem to be the only one able to produce them...

sys_sync may not do anything on a fuse filesystem, but if you've 
loopback-mounted an ext3 filesystem from that fuse filesystem then it's 
going to result in writes to it and deadlock.

-- 
Matthew Garrett | [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

sysrq-t dumps of s2ram/fuse deadlock (was Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-11 Thread Pavel Machek

Hi!

Can we get them? They are neccessary for debugging 'what in suspend
calls fuse' problem. And yes, that problem is there even when you
remove freezer.

Matthew, you seem to be the only one able to produce them...
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-09 Thread Kyle Moffett


Thanks for the detailed reply!

On Jul 09, 2007, at 22:07:15, Nigel Cunningham wrote:

On Friday 06 July 2007 15:01:48 Kyle Moffett wrote:

Suppose hibernate is implemented like this:

(1) Userspace program calls sys_freeze_processes()
   (a) Pokes all CPUs with IPMIs and tells them to finish the  
currently running timeslot then stop
   (b) Atomically sends SIGSTOP to all userspace processes in a  
non-trappable way, except the calling process and any process  
which is ptracing it.

   (c) Returns to the calling process.


Ok. First, I'll ignore the specification that userspace does this -  
I don't think it matters whether it's userspace or kernel that does  
the suspending and I'm yet to see a good reason for it to be  
[required to be] done from userspace.


The reason it's _required_ to be done from userspace is that  
userspace is the only one which can figure out "These processes need  
to run for suspend to work", and then let those processes continue  
running after the freeze.  The *ONLY* reason this even stops  
processes at all is so we can do the post-device-mapper-snapshot code  
with very little usably-free RAM (IE: only about 1MB for a standard  
desktop system).



In this first step, you've reinvented the first part of the current  
freezer implementation. The reason we don't use a real signal is  
precisely so we can have an untrappable SIGSTOP. In this regard, I  
particularly remember Win4Lin from a few years ago. It would die if  
you sent it a real signal, so we had to do it this way. No doubt  
there are other instances I'm not aware of.


Well, you *do* want it to have semi-signal semantics, processes which  
receive it must not get back to userspace code so that they don't  
start allocating more memory when we're trying to do the freeze.  You  
also don't want a process to be able to trap it (IE: like SIGSTOP or  
SIGKILL).


On the other hand, it should be delivered asynchronously (IE: It  
doesn't break an interruptable sleep or respond to most is-a-signal- 
present checks).  You don't actually care if its sleeping in the  
kernel somewhere, just as long as it doesn't allocate much memory.


You would probably need a new signal "SIGFREEZE" which causes the  
process to be ignored as runnable the next time they schedule but  
never actually gets delivered, and a "SIGUNFREEZE" which does the  
reverse.  That way userspace could selectively resume processes based  
on its policy of "this needs to run for hibernation".



(2) Userspace process sends SIGCONT to only those processes which  
are necessary for sync and a device-mapper snapshot.


How do you determine which ones are needed?


It's userspace's job to know which ones are needed.  For example, if  
you are hibernating over NFS then you need to resume the various NFS/ 
RPC daemons and threads.



Why stop them in the first place?


So they aren't allocating memory when we are doing the device-mapper  
snapshot.




(3) Userspace calls sys_snapshot_kernel(snapshot_overhead_pages)
   (a) Kernel starts freeing memory and swapping stuff out to make  
room for a copy of *kernel* memory (not pagecache, not process  
RAM).   It does the same for at least snapshot_overhead_pages  
extra (used by userspace later).  It then allocates this memory to  
keep it from going away.  Since most processes are stopped we  
won't have much else competing with us for the RAM.


Ok. So now you also need processes running that are needed for  
swapping, because freeing that memory might involve swapping. Fully  
agree with the logic though (not really surprising - this is what I  
do in Suspend2^wTuxOnIce).


   (a) Kernel uses the device-mapper up-call-into-filesystem  
machinery to get all mounted filesystems synced and ready for a DM  
snapshot.  This may include sending data via the userspace  
processes resumed in (2).  Any deadlocks here are userspace's  
fault (see (2)).  Will need some modification to handle doing  
multiple blockdevs at a time.  Anything using FUSE is basically  
perma-synced anyways (no dep-handling needed), and anything using  
loop should already be handled by DM.  This includes allocating  
memory for the basic snapshot

datastructures.
   (b) At this point all blockdev operations should be halted and  
disk caches flushed; that's all we care about.
   (c) Go through the device tree and quiesce DMA and shut off  
interrupts.  Since all the disks are synced this is easy.
   (d) Use IPMIs again to get all the CPUs together, which should  
be easy as most processes are sleeping in IO or SIGSTOPed, and  
we're getting no interrupts.
   (e) One CPU turns off all interrupts on itself and takes an  
atomic snapshot of kernel memory into the previously allocated  
storage.  Once again, does not include pagecache.  The kernel also  
records a list of what pages *are* included in the pagecache.  It  
then marks all userspace pages as copy-on-write.


Hotplugging cpus (when all those locking issues are taken care of)  
is simp

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-09 Thread Nigel Cunningham

Hi.

Sorry for the long delay. Busy weekend and my motivation for working on 
programming is almost zero at the moment...

On Friday 06 July 2007 15:01:48 Kyle Moffett wrote:
> On Jul 06, 2007, at 00:03:15, Nigel Cunningham wrote:
> > On Friday 06 July 2007 13:54:15 Benjamin Herrenschmidt wrote:
> >> On Fri, 2007-07-06 at 09:35 +1000, Nigel Cunningham wrote:
> >>>
> >>> Nice try :) Okay then, you remove the freezer, try hibernating,  
> >>> then get back to me after you've fixed your filesystem because  
> >>> some process that wasn't frozen started writing things after the  
> >>> atomic copy (making the on disk filesystem inconsistent with the  
> >>> snapshot).
> >>>
> >>> As Pavel rightly said, you can get rid of the freezer, but you're  
> >>> only going to have to implement another one that does the  
> >>> essentially the same thing, even if it is at some other level.
> >>
> >> I was mostly talking about STR... Regarding STD, we have a  
> >> different problem and we all know it. The freezer is one somewhat  
> >> horrible way to get it working for now, I would prefer something  
> >> more along the way that blocks the page cache from writing out new  
> >> dirty pages though, except those specifically flagged by the  
> >> snapshot.
> >>
> >> That is, some kind of proper snapshotting facility, as linus was  
> >> describing some time ago.
> >
> > The kind of thing Linus was talking about would limit you (as  
> > swsusp and uswsusp do now) to only half the amount of memory.
> 
> How so?  Suppose hibernate is implemented like this:
> 
> (1) Userspace program calls sys_freeze_processes()
>(a) Pokes all CPUs with IPMIs and tells them to finish the  
> currently running timeslot then stop
>(b) Atomically sends SIGSTOP to all userspace processes in a non- 
> trappable way, except the calling process and any process which is  
> ptracing it.
>(c) Returns to the calling process.

Ok. First, I'll ignore the specification that userspace does this - I don't 
think it matters whether it's userspace or kernel that does the suspending 
and I'm yet to see a good reason for it to be [required to be] done from 
userspace.

In this first step, you've reinvented the first part of the current freezer 
implementation. The reason we don't use a real signal is precisely so we can 
have an untrappable SIGSTOP. In this regard, I particularly remember Win4Lin 
from a few years ago. It would die if you sent it a real signal, so we had to 
do it this way. No doubt there are other instances I'm not aware of.
 
> (2) Userspace process sends SIGCONT to only those processes which are  
> necessary for sync and a device-mapper snapshot.

How do you determine which ones are needed? Why stop them in the first place?
 
> (3) Userspace calls sys_snapshot_kernel(snapshot_overhead_pages)
>(a) Kernel starts freeing memory and swapping stuff out to make  
> room for a copy of *kernel* memory (not pagecache, not process RAM).   
> It does the same for at least snapshot_overhead_pages extra (used by  
> userspace later).  It then allocates this memory to keep it from  
> going away.  Since most processes are stopped we won't have much else  
> competing with us for the RAM.

Ok. So now you also need processes running that are needed for swapping, 
because freeing that memory might involve swapping. Fully agree with the 
logic though (not really surprising - this is what I do in 
Suspend2^wTuxOnIce).

>(a) Kernel uses the device-mapper up-call-into-filesystem  
> machinery to get all mounted filesystems synced and ready for a DM  
> snapshot.  This may include sending data via the userspace processes  
> resumed in (2).  Any deadlocks here are userspace's fault (see (2)).   
> Will need some modification to handle doing multiple blockdevs at a  
> time.  Anything using FUSE is basically perma-synced anyways (no dep- 
> handling needed), and anything using loop should already be handled  
> by DM.  This includes allocating memory for the basic snapshot  
> datastructures.
>(b) At this point all blockdev operations should be halted and  
> disk caches flushed; that's all we care about.
>(c) Go through the device tree and quiesce DMA and shut off  
> interrupts.  Since all the disks are synced this is easy.
>(d) Use IPMIs again to get all the CPUs together, which should be  
> easy as most processes are sleeping in IO or SIGSTOPed, and we're  
> getting no interrupts.
>(e) One CPU turns off all interrupts on itself and takes an atomic  
> snapshot of kernel memory into the previously allocated storage.   
> Once again, does not include pagecache.  The kernel also records a  
> list of what pages *are* included in the pagecache.  It then marks  
> all userspace pages as copy-on-write.

Hotplugging cpus (when all those locking issues are taken care of) is simpler. 
Prior to cpu hotplugging, I used IMPIs to put secondary cpus into a tight 
loop, so I know it's possible to do it this way too. That way, though, y

Re: [linux-pm] Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]

2007-07-09 Thread Paul Mackerras

Nigel Cunningham writes:

> (Finally catching up on emails since Saturday!)

Yes, I know what you mean; I was occupied with other work stuff on
Friday and not well on Saturday, and by that stage there was a backlog
of about 500 emails for me to wade through... :/

> The freezer is also essential for doing a full image of memory - you
> have to be able to guarantee that some of the image won't change in
> order to be able to write it out before the atomic copy. Of course the
> freezer alone isn't enough, but it solves 95% of the issue.

Sure.  As I pointed out in another email, it's unrealistic to expect
the freezer to be able to stop everything in a state where any
arbitrary mutex is free.  But as long as you don't need to do anything
that depends on getting a mutex while the system is frozen, there
should be no problem.

Paul.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]

2007-07-09 Thread Nigel Cunningham

Hi.

On Monday 09 July 2007 09:01:27 Rafael J. Wysocki wrote:
> Hi,
> 
> On Monday, 9 July 2007 00:13, Pavel Machek wrote:
> > Hi!
> > 
> > > > Actaully, I'm perfectly fine with that, as long as each task blocked 
by the
> > > > driver due to suspend has PF_FROZEN (or something similar) set.  Then, 
at
> > > > least theoretically, we'll be able to drop the freezer from the 
suspend code
> > > > path and move it after device_suspend() (or the hibernation-specific
> > > > equivalent) for hibernation (in that case there shouldn't be a problem 
with
> > > > any task waiting on I/O while the freezer is running ;-)).
> > > 
> > > I don't see the need for a freezer for snapshot but that's a different
> > > issue. (stop_machine looks good enough to me).
> > 
> > Freezer is not needed for snapshot -- it is needed so that we can
> > write out the snapshot to disk without the need for special
> > drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then
> > write to disk from userland code in uswsusp).
> 
> Yes.
> 
> BTW, this patch:
> 
> 
http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc7/patches/15-freezer-make-kernel-threads-nonfreezable-by-default.patch
> 
> that's queued up in -mm contains a freezer documentation update, in which 
the
> reasons of using it, as well as its limitations, are described.
> 
> To summarize what was previously said in this thread:
> 
> * Apparently, we agree that the freezer is _generally_ not needed for 
suspend
>   (ie. any transition to a system sleep state other than hibernation), but 
some
>   of us (eg. me) think that it wouldn't be reasonable to drop the freezer 
from
>   the suspend code path _right_ _now_ .
> 
> * Some of us, including you, Nigel and me, think that the freezer is needed
>   for hibernation (please see the document in the patch above for details).
>   In the (very) long run this might be avoided too, but (IMO) certainly not 
at
>   this point.
> 
> * We seem to agree that in order to remove the freezer from the suspend code
>   path some work needs to be done on device drivers, driver midlayers and 
the
>   PM core.  We also need to do some work on the PM core in order to 
introduce
>   a separate hibernation framework and IMO it would be reasonable to
>   synchronize these efforts.
> 
> * We are now to decide what to do so that the freezer can be safely removed
>   from the suspend code path and how to integrate that change with the
>   hibernation code path (if possible and reasonable).
> 
> * The freezer vs FUSE issue that started this thread remains unresolved, so
>   it would be desirable to provide a short-term fix (need not be very nice).

(Finally catching up on emails since Saturday!)

The freezer is also essential for doing a full image of memory - you have to 
be able to guarantee that some of the image won't change in order to be able 
to write it out before the atomic copy. Of course the freezer alone isn't 
enough, but it solves 95% of the issue.

Regards,

Nigel
-- 
See http://www.tuxonice.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.


pgpyEufTeC6X5.pgp
Description: PGP signature

Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-09 Thread Miklos Szeredi

> > > We can just wait for all fuse requests to be serviced before
> > > proceeding further with freeze, right?
> > 
> > Right.  Nice way to slow down or stop the suspend with an unprivileged
> > process.  Avoiding that sort of DoS is one of the design goals of
> > fuse.
> 
> So you want me to handle _malicious_ filesystems now?
> 
> That should be easy... :-). You already have nasty deadlocks in FUSE,
> and you solve them by "root can echo 1 > abort"... so allow me the
> same possibility.
> 
> We can tell fused we are freezing, and if all the requests are not
> serviced within, say, 30 seconds, we call the filesystem malicious and
> do echo 1 > abort.
> 
> Not ideal, but neither is allowing malicious filesystems in the first
> place...

Actually there's also a non-malicious case in which waiting for
requests to finish won't work: when one fuse filesystem is accessing
another.

Since we are blocking new fuse requests, that might block a fuse
daemon, which in turn makes it impossible to finish the pending
request.

And this is not at all theoretical, I know that encfs is used over
various other fuse filesystems like sshfs or ntfs-3g.

Yeah, stacking userspace filesystems could be done entirely in
userspace, and I'm actually working on that (with fuse-2.7.0 already
supporting some basic stacking).

But the point is, that the "wait for fuse to quiescence" hack would
not work in todays environment.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-09 Thread Alan Stern

On Mon, 9 Jul 2007, Benjamin Herrenschmidt wrote:

> > This will be up to the people responsible for the subsystems.  I can 
> > take care of USB.
> 
> USB is not that much of a problem in the sense that for most "leaf"
> drivers, USB is a provider (ie, the bus they sit on), not the client
> (like the network stack is to network drivers).

That's why I thought I would be able to handle it!  :-)

> There is still the race of:
> 
>   drivers_sysfs_write()
>   try_to_icebox()
>   < 
>   hit hardware
> 
> Those are akin, in some ways, to the freezer races. Some kind of RCU
> might take care of them if we enable the icebox, then wait for all tasks
> to hit an explicit schedule point once (or return to userland). That
> would mean that drivers need to try_to_icebox() again if they do
> something that may schedule (such as __get_user). So it's not a magic
> solution, it has issues, but it can handle a lot of the simplest cases. 

It depends on what kind of race you're worried about.  In general a 
sysfs access that causes I/O isn't much different from a char device's 
read or write.  For such cases the driver would have to do something 
like this:

drivers_sysfs_write()
{
 restart:
mutex_lock(private_io_mutex);
if (device is suspended) {
mutex_unlock(private_io_mutex);
icebox();
goto restart;
}
... hit hardware ...
mutex_unlock(private_io_mutex);
}

And of course the suspend method would have to acquire the
private_io_mutex.  Some drivers might be able to use the device
semaphore instead of adding a new mutex.

I'm more concerned about races involving device registration.  The best 
solution I can come up with is to use an rwsem:

drivers_sysfs_write()
{
down_read(private_rwsem);
... register/unregister devices ...
up_read(private_rwsem);
}

where the suspend notifer callout would do down_write() and the resume
callout would do up_write().  Can you suggest anything better?

> > Drivers are already prepared for device registration to fail (or they
> > ought to be), so this change shouldn't knock the bottom out of things.  
> > device_add() isn't on a hot path, so adding an extra check and
> > srcu_read_lock() won't hurt.
> 
> True. Also, bus drivers could just flag the port with something saying
> "try registering again later". Don't underestimate the power of "try
> later" constructs :-)

-ESUSPENDING would naturally imply that the caller should retry after 
the resume.

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-09 Thread Adrian Bunk

On Mon, Jul 09, 2007 at 07:33:56PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2007-07-09 at 11:23 +0200, Oliver Neukum wrote:
> > Suspension is to be transparent. Apart from a jump in the system clock
> > user space must not notice, thus returning errors due to suspension is
> > not an option.
> 
> Who says ?
> 
> If I'm talking to a USB printer, it will notice the USB bus being
> suspended, believe me. It's actually likely to screw up whatever job is
> in progress. It make perfect sense to error out attempts to talk to it
> when suspended.
> 
> I don't think "transparent" is an absolute requirement. In some cases,
> it doesn't make sense and the printer is a good example of that. The
> printing daemon is typically something that will ideally need to grow
> knowledge about suspend/resume to interrupt things at the right time.
> 
> That's one of the thing... we don't even have the kernel right, so we
> are light-years from having the rest of userland right.
> 
> For example, from a user point of view, what should happen if you
> suspend while printing ?

There might be other existing things with a semantics that might be 
worth looking at, e.g.:

I don't have an USB printer, but what happens if you unplug the printer 
while printing, wait 30 seconds, and plug it in again?
Wait 3 hours?

Could this work theoretically?
Does it work in practice?

This are non-theoretical use cases kernel and userspace already somehow 
handle today, and for userspace an USB printer disconnected for 3 hours 
and a 3 hour suspend might simply be the same with the latter not 
requiring any extra userspace handling.

Another example might be mounted NFS file systems:

The case that a client's ethernet cable gets unplugged for 24 hours and 
plugged in again is a real life use case NFS and all applications 
already have to handle.

> Well, it's a matter of policy (thus should be configurable), but I would
> exect (as a former Mac user) something around the lines of this as a
> default:
> 
>  - If the laptop's clamshell has been closed, it's likely that the
> user's just picking up the laptop for a ride, makes sense to abort jobs
> in progress.
> 
>  - If this is a manual action via the menus, displaying a dialog asking
> if you really want to suspend before your page is finished makes sense
> (with that dialog disappearing and the machine going to sleep if you
> just wait and let it print).
> 
> That sort of thing requires much more than just kernel collaboration of
> course, but a whole infrastructure in userland as well.
> 
> So no, suspend is -not- transparent, doesn't have to be, it's something
> that is to be decided on a case-by-case basis (or rather,
> driver-per-driver and possibly via policy settings).

But keep in mind that every piece that does not work transparently 
might result in less use cases for suspend.

There are cases like CD burning while suspending that might not be 
solvable, but as much as possible should work as transparently as 
possible.

> Ben.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-09 Thread Rafael J. Wysocki

On Monday, 9 July 2007 11:14, Benjamin Herrenschmidt wrote:
> On Mon, 2007-07-09 at 08:52 +0200, Oliver Neukum wrote:
> > Am Sonntag, 8. Juli 2007 schrieb Benjamin Herrenschmidt:
> > > Another issue that's been a problem forever with suspend is the
> > > synchronous request_firmware interface. Lots of drivers do that in
> > > resume() which will generally not work.
> > 
> > Unfortunately yes they do. We now have the notifier chains, which
> > can be used. Firmware, especially if it is large, must be requested
> > while the system is still fully functional and paging will work.
> > 
> > Asynchronocity will not help much because the devices must work
> > after resumingand delaying that will not help with respect to needing
> > devices to get at the firmware.
> 
> True, pre-loading is the best approach.

Agreed.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-09 Thread Benjamin Herrenschmidt

On Mon, 2007-07-09 at 12:02 +0200, Pavel Machek wrote:
> Hi!
> 
> > > > But I'm not sure it's a good idea in the long run.  Think of a printer 
> > > > daemon, for example.  It shouldn't have to experience unexpected I/O 
> > > > problems merely because someone has decided to put the system to sleep.
> > > 
> > > Why not ? Printer is offline when machine is asleep... trying to print
> 
> ...filesystems are offline, too, when the machine is asleep. Yet,
> unmounting everything on suspend would not result in useful suspend
> support.
> 
> Yes, I believe we should be transparent.

You just compared apple and oranges... Try printing and half way through
the page, suspend your USB bus, and see how the printer reacts.

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-09 Thread Pavel Machek

Hi!

> > > But I'm not sure it's a good idea in the long run.  Think of a printer 
> > > daemon, for example.  It shouldn't have to experience unexpected I/O 
> > > problems merely because someone has decided to put the system to sleep.
> > 
> > Why not ? Printer is offline when machine is asleep... trying to print

...filesystems are offline, too, when the machine is asleep. Yet,
unmounting everything on suspend would not result in useful suspend
support.

Yes, I believe we should be transparent.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-09 Thread Miklos Szeredi

> > In that case the "we need suspend to be invisible to userspace" as a
> > reason to use the freezer would also be moot, since if you don't
> > schedule userspace after offlining the CPUs, it can't notice this.
> 
> After? Can you do the offlining atomically?

Don't know.  Wait for all CPUs to reach a scheduling point and then
take them offline?  Doesn't sound too difficult.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-09 Thread Oliver Neukum

Am Montag, 9. Juli 2007 schrieb Miklos Szeredi:
> In that case the "we need suspend to be invisible to userspace" as a
> reason to use the freezer would also be moot, since if you don't
> schedule userspace after offlining the CPUs, it can't notice this.

After? Can you do the offlining atomically?

Regards
Oliver

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-09 Thread Benjamin Herrenschmidt

On Mon, 2007-07-09 at 11:23 +0200, Oliver Neukum wrote:
> Suspension is to be transparent. Apart from a jump in the system clock
> user space must not notice, thus returning errors due to suspension is
> not an option.

Who says ?

If I'm talking to a USB printer, it will notice the USB bus being
suspended, believe me. It's actually likely to screw up whatever job is
in progress. It make perfect sense to error out attempts to talk to it
when suspended.

I don't think "transparent" is an absolute requirement. In some cases,
it doesn't make sense and the printer is a good example of that. The
printing daemon is typically something that will ideally need to grow
knowledge about suspend/resume to interrupt things at the right time.

That's one of the thing... we don't even have the kernel right, so we
are light-years from having the rest of userland right.

For example, from a user point of view, what should happen if you
suspend while printing ?

Well, it's a matter of policy (thus should be configurable), but I would
exect (as a former Mac user) something around the lines of this as a
default:

 - If the laptop's clamshell has been closed, it's likely that the
user's just picking up the laptop for a ride, makes sense to abort jobs
in progress.

 - If this is a manual action via the menus, displaying a dialog asking
if you really want to suspend before your page is finished makes sense
(with that dialog disappearing and the machine going to sleep if you
just wait and let it print).

That sort of thing requires much more than just kernel collaboration of
course, but a whole infrastructure in userland as well.

So no, suspend is -not- transparent, doesn't have to be, it's something
that is to be decided on a case-by-case basis (or rather,
driver-per-driver and possibly via policy settings).

Ben.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-09 Thread Miklos Szeredi

> Please have a look at the documentation update at the bottom of this patch:
> 
> http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc7/patches/15-freezer-make-kernel-threads-nonfreezable-by-default.patch
> 
> It says what the freezer is for in the first place. :-)

Thanks, good description.

> > Yes, we need to make sure, that nothing is scheduled during (and
> > possibly after) taking the snapshot.  But AFAICS that could be
> > achieved by unplugging all but one CPU.
> 
> Actually, we want things to get scheduled, because we need some of them to
> save the image (to make things more difficult, we don't know what will be
> needed to save the image in advance ;-)).

Well, kexecing a new kernel (as being pursued in one of the
subthreads) to do the saving should take care of that.

In that case the "we need suspend to be invisible to userspace" as a
reason to use the freezer would also be moot, since if you don't
schedule userspace after offlining the CPUs, it can't notice this.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-09 Thread Oliver Neukum

Am Montag, 9. Juli 2007 schrieb Benjamin Herrenschmidt:
> On Mon, 2007-07-09 at 08:47 +0200, Oliver Neukum wrote:
> > Am Sonntag, 8. Juli 2007 schrieb Benjamin Herrenschmidt:
> > > > But I'm not sure it's a good idea in the long run.  Think of a printer 
> > > > daemon, for example.  It shouldn't have to experience unexpected I/O 
> > > > problems merely because someone has decided to put the system to sleep.
> > > 
> > > Why not ? Printer is offline when machine is asleep... trying to print
> > 
> > Not necessarily. The machine must survive going to sleep while you are
> > printing. Any other error return than -ERESTARTSYS is not an option.
> > We can't simply change the ABI.
> 
> Ugh ? Why returning an error from the printer driver to the userland
> print server/daemon would prevent the machine from "surviving" ? I would
> be happy with -EIO personally :-)

Surviving is a bit strongly worded.

Suspension is to be transparent. Apart from a jump in the system clock
user space must not notice, thus returning errors due to suspension is
not an option.

Regards
Oliver

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-09 Thread Benjamin Herrenschmidt

On Mon, 2007-07-09 at 08:47 +0200, Oliver Neukum wrote:
> Am Sonntag, 8. Juli 2007 schrieb Benjamin Herrenschmidt:
> > > But I'm not sure it's a good idea in the long run.  Think of a printer 
> > > daemon, for example.  It shouldn't have to experience unexpected I/O 
> > > problems merely because someone has decided to put the system to sleep.
> > 
> > Why not ? Printer is offline when machine is asleep... trying to print
> 
> Not necessarily. The machine must survive going to sleep while you are
> printing. Any other error return than -ERESTARTSYS is not an option.
> We can't simply change the ABI.

Ugh ? Why returning an error from the printer driver to the userland
print server/daemon would prevent the machine from "surviving" ? I would
be happy with -EIO personally :-)

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-09 Thread Benjamin Herrenschmidt

On Mon, 2007-07-09 at 08:52 +0200, Oliver Neukum wrote:
> Am Sonntag, 8. Juli 2007 schrieb Benjamin Herrenschmidt:
> > Another issue that's been a problem forever with suspend is the
> > synchronous request_firmware interface. Lots of drivers do that in
> > resume() which will generally not work.
> 
> Unfortunately yes they do. We now have the notifier chains, which
> can be used. Firmware, especially if it is large, must be requested
> while the system is still fully functional and paging will work.
> 
> Asynchronocity will not help much because the devices must work
> after resumingand delaying that will not help with respect to needing
> devices to get at the firmware.

True, pre-loading is the best approach.

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Oliver Neukum

Am Sonntag, 8. Juli 2007 schrieb Benjamin Herrenschmidt:
> Another issue that's been a problem forever with suspend is the
> synchronous request_firmware interface. Lots of drivers do that in
> resume() which will generally not work.

Unfortunately yes they do. We now have the notifier chains, which
can be used. Firmware, especially if it is large, must be requested
while the system is still fully functional and paging will work.

Asynchronocity will not help much because the devices must work
after resumingand delaying that will not help with respect to needing
devices to get at the firmware.

Regards
Oliver

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Oliver Neukum

Am Sonntag, 8. Juli 2007 schrieb Benjamin Herrenschmidt:
> > But I'm not sure it's a good idea in the long run.  Think of a printer 
> > daemon, for example.  It shouldn't have to experience unexpected I/O 
> > problems merely because someone has decided to put the system to sleep.
> 
> Why not ? Printer is offline when machine is asleep... trying to print

Not necessarily. The machine must survive going to sleep while you are
printing. Any other error return than -ERESTARTSYS is not an option.
We can't simply change the ABI.

Regards
Oliver

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Benjamin Herrenschmidt


> You could do this with a straight mutex except for the  
> dev_suspend_trylock/unlock bit in uninterruptable contexts, but I  
> seem to recall somebody saying that could be made to work if there  
> was a real need for it.  Alternatively you could just drive the  
> "Generic Mutex" guys up the wall by inventing your own pseudo-mutex  
> with a spinlock, a boolean value, and a waitqueue.

Most of the time, the interrupt/atomic contexts are part of what I call
the "main path" of the driver, which doesn't need most of this stuff.

Depending on what is "upstream" from the driver, you typically set a
condition stopping the flow of "interrupt" activity and wait for it to
drain in suspend. For example, a block driver would instruct the driver
to stop processing the request queue and wait for pending in flight
requests to drain, or a network driver would stop the tx queue and wait
for pending/concurrent xmit() callbacks to complete (basically sync. bh
and sync. with NAPI poll).

In both examples, helpers can be provided in the block layer or network
stack to make that easier.

In many cases, the driver may just turn off something in the chip in
"suspend" (using a low level spinlock that it already has if it has an
irq path to sync. HW access) and set some flag/state to instruct the irq
handler to not restart, then synchronize_irq().

All those techniques are fairly simple and well known. The main source
of problems are the "sideband" channels, such as sysfs accesses to
driver tunables, ioctl's, etc... for which I believe mutexes can solve
90% of them.

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Kyle Moffett


On Jul 08, 2007, at 20:33:37, Benjamin Herrenschmidt wrote:

drivers_sysfs_write()
while(!suspend_trylock())
			try_to_icebox() --> or even try_to_freeze(), what's the  
difference?


hit hardware
unlock_suspend()

where the PM core must wait for the suspend lock to get released  
(say with a timeout)?


Somewhat. What I'd like is to have that a construct of that sort on  
a per-driver basis though :-) Now the question is, in what way  
would the above be different from a simple suspend mutex ? I've  
been using mutexes in the past for a couple of drivers (iirc,  
that's how I did it for dmasound_pmac, though that driver is long  
past obsolescence now).


I agree completely.  It's a bit trickier if you want to do work in  
uninterruptable contexts:


driver_suspend_callback(...)
{
dev_suspend_lock(dev);
put_hardware_to_sleep(dev);
}

driver_resume_callback(...)
{
wake_hardware_up(dev);
dev_suspend_unlock(dev);
}

Then for sleep-capable contexts:
dev_suspend_lock(dev);
dev_suspend_unlock(dev);

And for no-sleep contexts like interrupts etc:
if (!dev_suspend_trylock(dev))
return postpone_work_for_later(dev, ...);
do_stuff_with(dev);
dev_suspend_unlock(dev);

You could do this with a straight mutex except for the  
dev_suspend_trylock/unlock bit in uninterruptable contexts, but I  
seem to recall somebody saying that could be made to work if there  
was a real need for it.  Alternatively you could just drive the  
"Generic Mutex" guys up the wall by inventing your own pseudo-mutex  
with a spinlock, a boolean value, and a waitqueue.


Then when you put your driver to sleep, things trying to do IO will  
automatically "freeze" themselves exactly until the device is woken  
again.


Assuming the driver model and subsystem get the ordering right for  
which devices to suspend/resume, then it's impossible to deadlock or  
cause hardware confusion.  And even further, if you manage to make  
the automagic mutex-debugging code work with the noninterruptable  
trylock it will yell at you when the driver model does nasty deadlock- 
y things.


On the other hand, if the driver model *doesn't* get the ordering  
right then it's fundamentally impossible to reliably suspend and resume.


Cheers,
Kyle Moffett

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Benjamin Herrenschmidt


>   drivers_sysfs_write()
>   while(!suspend_trylock())
>   try_to_icebox() --> or even try_to_freeze(), what's the 
> difference?
> 
>   hit hardware
>   unlock_suspend()
> 
> where the PM core must wait for the suspend lock to get released (say with a
> timeout)?

Somewhat. What I'd like is to have that a construct of that sort on a
per-driver basis though :-) Now the question is, in what way would the
above be different from a simple suspend mutex ? I've been using mutexes
in the past for a couple of drivers (iirc, that's how I did it for
dmasound_pmac, though that driver is long past obsolescence now).

But yes, that's the idea, something under control of drivers, instead of
a 3rd party freezer that tries to hit at processes. 

> Hmm, perhaps we can use a special workqueue that's created before the suspend
> (say by the PM core) and starts processing jobs after the resume?

Yeah, whatever the implementation is, though, the interface should be
transparent. But then, I see little use of that. Mostly things like this
sysfs write to unbind Alan was talking about. Driver workqueues probably
need to be functioning all the way until the driver itself is suspended,
in which case, the driver may just flush_worqueues in it's suspend path
at the right time to make sure it doesn't exit with something still
pending.

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]

2007-07-08 Thread david


On Mon, 9 Jul 2007, Pavel Machek wrote:


On Sun 2007-07-08 16:20:46, [EMAIL PROTECTED] wrote:

On Mon, 9 Jul 2007, Pavel Machek wrote:


Actaully, I'm perfectly fine with that, as long as each task blocked by
the
driver due to suspend has PF_FROZEN (or something similar) set.  Then,
at
least theoretically, we'll be able to drop the freezer from the suspend
code
path and move it after device_suspend() (or the hibernation-specific
equivalent) for hibernation (in that case there shouldn't be a problem
with
any task waiting on I/O while the freezer is running ;-)).


I don't see the need for a freezer for snapshot but that's a different
issue. (stop_machine looks good enough to me).


Freezer is not needed for snapshot -- it is needed so that we can
write out the snapshot to disk without the need for special
drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then
write to disk from userland code in uswsusp).


instead of trying to freeze most of the system, could you do something
like start a virtual machine sandbox to write the data out, and not let
any userspace other then the sandbox operate?

you would need to throw away disk buffers so that you don't mix current
pending I/O with I/O from the sandbox, and this would be a visable change
for how suspend is setup, but wouldn't this work?


It feels kind of expensive, but yes, we could use another kernel for
doing the dump. Kdump people are using that. We could use hypervisor
for doing the dump. Xen people are doing that. (But I do not think any
of those solutions is suitable for "lets hibernate my notebook" case).


expensive and reliable beats efficiant and unrelaible.

why do you say that neither would work for the "lets hibernate my
notebook" case?


Both would work. One would eat 8-64MB of your RAM, permanently; second
would eat 5-15% of your cpu, permanently. Not very suitable.


how much overlap is there between the two approaches? are they close 
enough to be able to give the user the choice of which to use depending on 
their machine (new machines with the hardware virtualization support may 
want the hypervisor, other hardware may want to sacrafice 8M of ram)



Who says current solution is unreliable?


users report problems, suspend* developers repeatedly state that the 
problems are that the rest of the kernel needs to be fixed to work 
properly with the existing approach.


I think it's safe to say that it doesn't work in the general case, even 
though it does work in some specific cases.


David Lang
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]

2007-07-08 Thread Pavel Machek

On Sun 2007-07-08 16:20:46, [EMAIL PROTECTED] wrote:
> On Mon, 9 Jul 2007, Pavel Machek wrote:
> 
> >Actaully, I'm perfectly fine with that, as long as each task blocked by
> >the
> >driver due to suspend has PF_FROZEN (or something similar) set.  Then, 
> >at
> >least theoretically, we'll be able to drop the freezer from the suspend
> >code
> >path and move it after device_suspend() (or the hibernation-specific
> >equivalent) for hibernation (in that case there shouldn't be a problem
> >with
> >any task waiting on I/O while the freezer is running ;-)).
> 
> I don't see the need for a freezer for snapshot but that's a different
> issue. (stop_machine looks good enough to me).
> >>>
> >>>Freezer is not needed for snapshot -- it is needed so that we can
> >>>write out the snapshot to disk without the need for special
> >>>drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then
> >>>write to disk from userland code in uswsusp).
> >>
> >>instead of trying to freeze most of the system, could you do something
> >>like start a virtual machine sandbox to write the data out, and not let
> >>any userspace other then the sandbox operate?
> >>
> >>you would need to throw away disk buffers so that you don't mix current
> >>pending I/O with I/O from the sandbox, and this would be a visable change
> >>for how suspend is setup, but wouldn't this work?
> >
> >It feels kind of expensive, but yes, we could use another kernel for
> >doing the dump. Kdump people are using that. We could use hypervisor
> >for doing the dump. Xen people are doing that. (But I do not think any
> >of those solutions is suitable for "lets hibernate my notebook" case).
> 
> expensive and reliable beats efficiant and unrelaible.
> 
> why do you say that neither would work for the "lets hibernate my 
> notebook" case?

Both would work. One would eat 8-64MB of your RAM, permanently; second
would eat 5-15% of your cpu, permanently. Not very suitable.

Who says current solution is unreliable?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]

2007-07-08 Thread david


On Mon, 9 Jul 2007, Pavel Machek wrote:


Actaully, I'm perfectly fine with that, as long as each task blocked by
the
driver due to suspend has PF_FROZEN (or something similar) set.  Then, at
least theoretically, we'll be able to drop the freezer from the suspend
code
path and move it after device_suspend() (or the hibernation-specific
equivalent) for hibernation (in that case there shouldn't be a problem
with
any task waiting on I/O while the freezer is running ;-)).


I don't see the need for a freezer for snapshot but that's a different
issue. (stop_machine looks good enough to me).


Freezer is not needed for snapshot -- it is needed so that we can
write out the snapshot to disk without the need for special
drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then
write to disk from userland code in uswsusp).


instead of trying to freeze most of the system, could you do something
like start a virtual machine sandbox to write the data out, and not let
any userspace other then the sandbox operate?

you would need to throw away disk buffers so that you don't mix current
pending I/O with I/O from the sandbox, and this would be a visable change
for how suspend is setup, but wouldn't this work?


It feels kind of expensive, but yes, we could use another kernel for
doing the dump. Kdump people are using that. We could use hypervisor
for doing the dump. Xen people are doing that. (But I do not think any
of those solutions is suitable for "lets hibernate my notebook" case).


expensive and reliable beats efficiant and unrelaible.

why do you say that neither would work for the "lets hibernate my 
notebook" case?


David Lang
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]

2007-07-08 Thread Pavel Machek

Hi!

> > Freezer is not needed for snapshot -- it is needed so that we can
> > write out the snapshot to disk without the need for special
> > drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then
> > write to disk from userland code in uswsusp).
> 
> Yes.
> 
> BTW, this patch:
> 
> http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc7/patches/15-freezer-make-kernel-threads-nonfreezable-by-default.patch
> 
> that's queued up in -mm contains a freezer documentation update, in which the
> reasons of using it, as well as its limitations, are described.
> 
> To summarize what was previously said in this thread:
> 
> * Apparently, we agree that the freezer is _generally_ not needed for suspend
>   (ie. any transition to a system sleep state other than hibernation), but 
> some
>   of us (eg. me) think that it wouldn't be reasonable to drop the freezer from
>   the suspend code path _right_ _now_ .
> 
> * Some of us, including you, Nigel and me, think that the freezer is needed
>   for hibernation (please see the document in the patch above for details).
>   In the (very) long run this might be avoided too, but (IMO) certainly not at
>   this point.
> 
> * We seem to agree that in order to remove the freezer from the suspend code
>   path some work needs to be done on device drivers, driver midlayers and the
>   PM core.  We also need to do some work on the PM core in order to introduce
>   a separate hibernation framework and IMO it would be reasonable to
>   synchronize these efforts.
> 
> * We are now to decide what to do so that the freezer can be safely removed
>   from the suspend code path and how to integrate that change with the
>   hibernation code path (if possible and reasonable).

Nice summary, thanks.

> * The freezer vs FUSE issue that started this thread remains unresolved, so
>   it would be desirable to provide a short-term fix (need not be very nice).

Actually there are _2_ freezer vs FUSE issues, and one of them should
be simple to solve, once we have sysrq-t of the deadlock. (Or did I
miss it somewhere with discussion going on 10 lists in parallel?)
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]

2007-07-08 Thread Pavel Machek

Hi!

> >>>Actaully, I'm perfectly fine with that, as long as each task blocked by 
> >>>the
> >>>driver due to suspend has PF_FROZEN (or something similar) set.  Then, at
> >>>least theoretically, we'll be able to drop the freezer from the suspend 
> >>>code
> >>>path and move it after device_suspend() (or the hibernation-specific
> >>>equivalent) for hibernation (in that case there shouldn't be a problem 
> >>>with
> >>>any task waiting on I/O while the freezer is running ;-)).
> >>
> >>I don't see the need for a freezer for snapshot but that's a different
> >>issue. (stop_machine looks good enough to me).
> >
> >Freezer is not needed for snapshot -- it is needed so that we can
> >write out the snapshot to disk without the need for special
> >drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then
> >write to disk from userland code in uswsusp).
> 
> instead of trying to freeze most of the system, could you do something 
> like start a virtual machine sandbox to write the data out, and not let 
> any userspace other then the sandbox operate?
> 
> you would need to throw away disk buffers so that you don't mix current 
> pending I/O with I/O from the sandbox, and this would be a visable change 
> for how suspend is setup, but wouldn't this work?

It feels kind of expensive, but yes, we could use another kernel for
doing the dump. Kdump people are using that. We could use hypervisor
for doing the dump. Xen people are doing that. (But I do not think any
of those solutions is suitable for "lets hibernate my notebook" case).

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]

2007-07-08 Thread Rafael J. Wysocki

Hi,

On Monday, 9 July 2007 00:13, Pavel Machek wrote:
> Hi!
> 
> > > Actaully, I'm perfectly fine with that, as long as each task blocked by 
> > > the
> > > driver due to suspend has PF_FROZEN (or something similar) set.  Then, at
> > > least theoretically, we'll be able to drop the freezer from the suspend 
> > > code
> > > path and move it after device_suspend() (or the hibernation-specific
> > > equivalent) for hibernation (in that case there shouldn't be a problem 
> > > with
> > > any task waiting on I/O while the freezer is running ;-)).
> > 
> > I don't see the need for a freezer for snapshot but that's a different
> > issue. (stop_machine looks good enough to me).
> 
> Freezer is not needed for snapshot -- it is needed so that we can
> write out the snapshot to disk without the need for special
> drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then
> write to disk from userland code in uswsusp).

Yes.

BTW, this patch:

http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc7/patches/15-freezer-make-kernel-threads-nonfreezable-by-default.patch

that's queued up in -mm contains a freezer documentation update, in which the
reasons of using it, as well as its limitations, are described.

To summarize what was previously said in this thread:

* Apparently, we agree that the freezer is _generally_ not needed for suspend
  (ie. any transition to a system sleep state other than hibernation), but some
  of us (eg. me) think that it wouldn't be reasonable to drop the freezer from
  the suspend code path _right_ _now_ .

* Some of us, including you, Nigel and me, think that the freezer is needed
  for hibernation (please see the document in the patch above for details).
  In the (very) long run this might be avoided too, but (IMO) certainly not at
  this point.

* We seem to agree that in order to remove the freezer from the suspend code
  path some work needs to be done on device drivers, driver midlayers and the
  PM core.  We also need to do some work on the PM core in order to introduce
  a separate hibernation framework and IMO it would be reasonable to
  synchronize these efforts.

* We are now to decide what to do so that the freezer can be safely removed
  from the suspend code path and how to integrate that change with the
  hibernation code path (if possible and reasonable).

* The freezer vs FUSE issue that started this thread remains unresolved, so
  it would be desirable to provide a short-term fix (need not be very nice).

Greetings,
Rafael

-- 
"Premature optimization is the root of all evil." - Donald Knuth
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]

2007-07-08 Thread david


On Mon, 9 Jul 2007, Pavel Machek wrote:


Actaully, I'm perfectly fine with that, as long as each task blocked by the
driver due to suspend has PF_FROZEN (or something similar) set.  Then, at
least theoretically, we'll be able to drop the freezer from the suspend code
path and move it after device_suspend() (or the hibernation-specific
equivalent) for hibernation (in that case there shouldn't be a problem with
any task waiting on I/O while the freezer is running ;-)).


I don't see the need for a freezer for snapshot but that's a different
issue. (stop_machine looks good enough to me).


Freezer is not needed for snapshot -- it is needed so that we can
write out the snapshot to disk without the need for special
drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then
write to disk from userland code in uswsusp).


instead of trying to freeze most of the system, could you do something 
like start a virtual machine sandbox to write the data out, and not let 
any userspace other then the sandbox operate?


you would need to throw away disk buffers so that you don't mix current 
pending I/O with I/O from the sandbox, and this would be a visable change 
for how suspend is setup, but wouldn't this work?


David Lang
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

hibernation/snapshot design [was Re: [PATCH] Remove process freezer from suspend to RAM pathway]

2007-07-08 Thread Pavel Machek

Hi!

> > Actaully, I'm perfectly fine with that, as long as each task blocked by the
> > driver due to suspend has PF_FROZEN (or something similar) set.  Then, at
> > least theoretically, we'll be able to drop the freezer from the suspend code
> > path and move it after device_suspend() (or the hibernation-specific
> > equivalent) for hibernation (in that case there shouldn't be a problem with
> > any task waiting on I/O while the freezer is running ;-)).
> 
> I don't see the need for a freezer for snapshot but that's a different
> issue. (stop_machine looks good enough to me).

Freezer is not needed for snapshot -- it is needed so that we can
write out the snapshot to disk without the need for special
drivers/block/simple-ide-for-suspend.c. (We are doing snapshot, then
write to disk from userland code in uswsusp).
Pavel 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Rafael J. Wysocki

On Sunday, 8 July 2007 23:20, Benjamin Herrenschmidt wrote:
> 
> > But I'm not sure it's a good idea in the long run.  Think of a printer 
> > daemon, for example.  It shouldn't have to experience unexpected I/O 
> > problems merely because someone has decided to put the system to sleep.
> 
> Why not ? Printer is offline when machine is asleep... trying to print
> errors out, I don't see the problem there. At one point, we'll need a
> cleaner way to also notify userland in which case our daemon could
> become more intelligent and stop servicings things before sleep and
> resume afterward :-)
>  
> > This will be up to the people responsible for the subsystems.  I can 
> > take care of USB.
> 
> USB is not that much of a problem in the sense that for most "leaf"
> drivers, USB is a provider (ie, the bus they sit on), not the client
> (like the network stack is to network drivers).
> 
> In most cases, that "helper" thing would sit on the client subsystem,
> since it's the one feeding drivers with requests. The main ones I see at
> hand are block, alsa, net, fb/drm... Some of them already have
> infrastructure to do it, some my need some more work.
> 
> > > I think it's a fairly significant change from the current freezer and I
> > > also think it's a very good idea. The more I think about it, the more I
> > > like it, in the sense that it's a simple drop-in that you could put in a
> > > lot of the ioctl path of drivers to just block tasks that are trying to
> > > call in while suspending, and could be used selectively by things like
> > > the USB hub threads.
> > 
> > That's what I had in mind.  Rafael, can we add an "icebox" routine?  
> > Like Ben says, it doesn't need to be much more than a waitqueue
> > that the current task puts itself on if a suspend is in progress.  
> > Callers arriving at a time when the icebox isn't activated should
> > simply return without blocking.  Basically the icebox should be active 
> > at the same times as the existing freezer.
> 
> There is still the race of:
> 
>   drivers_sysfs_write()
>   try_to_icebox()
>   < 
>   hit hardware
> 
> Those are akin, in some ways, to the freezer races.

I'm not sure what races you're referring to, but never mind. :-)

This is the reason why the freezer waits for tasks to freeze, actually.

> Some kind of RCU might take care of them if we enable the icebox,
> then wait for all tasks to hit an explicit schedule point once (or return to
> userland).

This is what the freezer does, isn't it? ;-)

> That would mean that drivers need to try_to_icebox() again if they do
> something that may schedule (such as __get_user). So it's not a magic
> solution, it has issues, but it can handle a lot of the simplest cases. 

Well, can't we do:

drivers_sysfs_write()
while(!suspend_trylock())
try_to_icebox() --> or even try_to_freeze(), what's the 
difference?

hit hardware
unlock_suspend()

where the PM core must wait for the suspend lock to get released (say with a
timeout)?
 
> > Here's a wacky idea which just might work:
> > 
> > In order to prevent binding and unbinding, while suspending devices all
> > the PM core has to do is avoid dropping the device semaphores!  It can
> > release the semaphores as it resumes the devices.
> > 
> > Of course, for this to work it's necessary to avoid changes to the 
> > device list during the suspend.  However I believe the iteration can be 
> > made safe against unregistration, so we only have to prevent device 
> > registration.  (And anyway, it won't be possible to unregister a device 
> > while the PM core is holding its semaphore.)
> > 
> > If we are willing to be somewhat non-transparent, this is easy to
> > accomplish.  After the notifier chain has been alerted about the
> > upcoming suspend, we tell the driver core to disallow adding new
> > devices.  Maybe use SRCU to synchronize with registration calls that
> > are in progress.  Thus, until the suspend is over device_add() will
> > immediately return an error.  We could even add a new ESUSPENDING code
> > to errno.h; it would come in handy in a few places.
> > 
> > Drivers are already prepared for device registration to fail (or they
> > ought to be), so this change shouldn't knock the bottom out of things.  
> > device_add() isn't on a hot path, so adding an extra check and
> > srcu_read_lock() won't hurt.
> 
> True. Also, bus drivers could just flag the port with something saying
> "try registering again later". Don't underestimate the power of "try
> later" constructs :-)
> 
> > I have had the same thought, that unbinding and unregistration would be 
> > easier to handle than binding and registration.  As it happens, holding 
> > the device semaphore will block both all three -- which makes life 
> > simpler.
> 
> Yup. Fair and simple.
> 
> > There are other possibilities too.  For example, instead of using
> > keventd these attri

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Benjamin Herrenschmidt

On Sun, 2007-07-08 at 23:45 +0200, Rafael J. Wysocki wrote:
> 
> Workqueues are kernel threads and the creator decides if they are going to
> freeze.  There are only two freezable worqueues in the entire tree right now.

That and keventd workqueues... my point is you may well end up with
something in a workqueue doing things that should have been blocked
after the freeze.

> Actaully, I'm perfectly fine with that, as long as each task blocked by the
> driver due to suspend has PF_FROZEN (or something similar) set.  Then, at
> least theoretically, we'll be able to drop the freezer from the suspend code
> path and move it after device_suspend() (or the hibernation-specific
> equivalent) for hibernation (in that case there shouldn't be a problem with
> any task waiting on I/O while the freezer is running ;-)).

I don't see the need for a freezer for snapshot but that's a different
issue. (stop_machine looks good enough to me).

Ben

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Rafael J. Wysocki

On Sunday, 8 July 2007 23:03, Benjamin Herrenschmidt wrote:
> On Sun, 2007-07-08 at 21:15 +0200, Rafael J. Wysocki wrote:
> > On Sunday, 8 July 2007 07:14, Benjamin Herrenschmidt wrote:
> > [--snip--]
> > > 
> > > I just think that the freezer approach, as it is, is backward. We can't
> > > have a 3rd party try to discriminate what to freeze and what not, it
> > > will always get something wrong, and in some cases with the wrong timing
> > > or ordering.
> > 
> > Nice discussion, except for one thing: the freezer doesn't decide what to
> > freeze.  For example, even right now kernel threads decide if they want to 
> > be
> > frozen.
> 
> Somewhat... userspace doesn't and workqueues are a gray area.

Workqueues are kernel threads and the creator decides if they are going to
freeze.  There are only two freezable worqueues in the entire tree right now.

> Also, I've been thinking this "icebox" idea a bit more and it seems in
> fact a bit racy in some areas, at least for use by things like drivers,
> unless we end up doing something aking to an RCU on suspend, waiting for
> all tasks to reach userland once, but that has the same annoyances as
> the current freezer.
> 
> Thus I'm tempted to go back to saying that driver can handle things
> locally :-)

Actaully, I'm perfectly fine with that, as long as each task blocked by the
driver due to suspend has PF_FROZEN (or something similar) set.  Then, at
least theoretically, we'll be able to drop the freezer from the suspend code
path and move it after device_suspend() (or the hibernation-specific
equivalent) for hibernation (in that case there shouldn't be a problem with
any task waiting on I/O while the freezer is running ;-)).

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Rafael J. Wysocki

On Sunday, 8 July 2007 22:16, Alan Stern wrote:
> I'll make this reply short by agreeing up front with most of what you 
> say.
> 
> On Sun, 8 Jul 2007, Benjamin Herrenschmidt wrote:
> 
> > But that's only the "main" path. Aside for that, almost all drivers also
> > have sideband "request" input and some driver don't actually live behind
> > a subsystem. That ranges from ioctl, to direct read/write on a char dev
> > from userland.
> 
> Yes, these are the problem cases.
> 
> > I think many of those cases can fairly well deal with just taking a PM
> > semaphore, that's how I did for a couple of things in the past, provided
> > that the request path isn't deadlocking with the semaphore held because
> > of the system suspending of course.  
> 
> That's what USB does as well (for the drivers which have runtime PM
> support -- at the moment only a few of them).
> 
> > But in a whole lot of cases, it's, I beleive, perfectly kosher to just
> > return an error. You're trying to capture frame from your camera while
> > the machine is suspended ? error. At worst, your capture app will be
> > unhappy when you wakeup, nothing terrible and totally fixable in
> > userland if it's a problem.
> 
> We can try falling back on this approach for now.  If the drivers are
> smart enough to fail cleanly when the device is already suspended, it
> should work.

And are they?

> But I'm not sure it's a good idea in the long run.  Think of a printer 
> daemon, for example.  It shouldn't have to experience unexpected I/O 
> problems merely because someone has decided to put the system to sleep.

Agreed.

> > In some cases, we could use a little bit more help from the subsystem.
> > Network for example, could have some explicit knowledge of the suspend
> > state, and in addition to stopping the queue would also stop calling
> > into things like change_mtu or set_multicast, provided it's agreed that
> > the driver will account for those changes on resume (the actual MTU
> > values or multicast lists are still updated in the netdev).
> 
> This will be up to the people responsible for the subsystems.  I can 
> take care of USB.
> 
> > There are two things I believe. There's a generic issue with usermode
> > helpers that make no sense to call between pre-suspend and
> > post-resume, and there's the specific issue of adding/removing
> > devices.
> > 
> > I believe that "bus" drivers such as USB should indeed get a first
> > round of notifications to tell them to stop performing bus
> > plug/unplug operations (it's debatable whether we want to keep unplug
> > going provided we can stack up the usermode events and re-send them
> > later though, but let's say no for the sake of simplicity).
> 
> Yes.  Rafael, how close is your new notifier chain to mainline?  Can it 
> at least be added to Greg KH's development tree so that I can start 
> using it?

It's in -mm.  The patches queued up in -mm are also in my patchset at
http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc7/patches/

> > > So instead, why not have the PM core take care of all this?  There
> > > could be a block_task_until_suspend_is_over() routine available for all
> > > drivers to use.  Its effect would be exactly the same as sending the
> > > current task into the freezer, but it wouldn't be the freezer that
> > > exists now.  It would just be some routine that blocks until the system 
> > > suspend is over.  We could call it "the icebox" instead of "the 
> > > freezer".  :-)
> > 
> > I'm not totally sure about that. I like some of it, but I think it's
> > fairly different conceptually from the freezer (and the implementation
> > could be as trivial as a single system wide wait queue). 
> 
> Exactly.
> 
> > Basically it has a very big difference to the current freezer, and I
> > like that, which is that we don't have some 3rd party trying to find out
> > what to freeze and what not (the freezer), but instead, we have
> > explicitely drivers or kernel threads sending -themselves- to the
> > "icebox" when they think it's a good idea. Think of it as lazy freezing
> > -> you only freeze lazy tasks that are trying to do something that
> > cannot be done because of suspend.
> > 
> > > Does that make you happier?
> > 
> > I think it's a fairly significant change from the current freezer and I
> > also think it's a very good idea. The more I think about it, the more I
> > like it, in the sense that it's a simple drop-in that you could put in a
> > lot of the ioctl path of drivers to just block tasks that are trying to
> > call in while suspending, and could be used selectively by things like
> > the USB hub threads.
> 
> That's what I had in mind.  Rafael, can we add an "icebox" routine?  

Yes, I think so.

> Like Ben says, it doesn't need to be much more than a waitqueue
> that the current task puts itself on if a suspend is in progress.  
> Callers arriving at a time when the icebox isn't activated should
> simply return without blocking.  Basically the icebox should be activ

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Benjamin Herrenschmidt

On Sun, 2007-07-08 at 16:22 -0400, Alan Stern wrote:
> On Sun, 8 Jul 2007, Rafael J. Wysocki wrote:
> 
> > I'm all for changing this infrastructure, but in an organized way (ie. we
> > discuss what to do next, we do that and then we go to the next step) and in
> > the order that everyone will be comfortable with.
> > 
> > So, let's finish this thread and start over from discussing what needs to be
> > done, how (ie. in what order etc.) we are going to do that and who is going 
> > to
> > do what.  Shall we?
> 
> IMO we should start by using the new notifier chain and by implementing 
> a central "icebox" routine.  Then we can forbid device registration 
> during suspend.
> 
> It might also be a good idea to add a freezable keventd-like workqueue 
> specifically intended for things that need to block during a suspend.  
> Although maybe this will end up being unnecessary; it's too soon to 
> tell.

I think looking at the kmallo/gfp issue I mentioned earlier should be
done asap. I can try to spare some time for it this week but I can't
promise, I'm actually swamped by something totally unrelated at the
moment.

Another issue that's been a problem forever with suspend is the
synchronous request_firmware interface. Lots of drivers do that in
resume() which will generally not work.

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Benjamin Herrenschmidt


> But I'm not sure it's a good idea in the long run.  Think of a printer 
> daemon, for example.  It shouldn't have to experience unexpected I/O 
> problems merely because someone has decided to put the system to sleep.

Why not ? Printer is offline when machine is asleep... trying to print
errors out, I don't see the problem there. At one point, we'll need a
cleaner way to also notify userland in which case our daemon could
become more intelligent and stop servicings things before sleep and
resume afterward :-)
 
> This will be up to the people responsible for the subsystems.  I can 
> take care of USB.

USB is not that much of a problem in the sense that for most "leaf"
drivers, USB is a provider (ie, the bus they sit on), not the client
(like the network stack is to network drivers).

In most cases, that "helper" thing would sit on the client subsystem,
since it's the one feeding drivers with requests. The main ones I see at
hand are block, alsa, net, fb/drm... Some of them already have
infrastructure to do it, some my need some more work.

> > I think it's a fairly significant change from the current freezer and I
> > also think it's a very good idea. The more I think about it, the more I
> > like it, in the sense that it's a simple drop-in that you could put in a
> > lot of the ioctl path of drivers to just block tasks that are trying to
> > call in while suspending, and could be used selectively by things like
> > the USB hub threads.
> 
> That's what I had in mind.  Rafael, can we add an "icebox" routine?  
> Like Ben says, it doesn't need to be much more than a waitqueue
> that the current task puts itself on if a suspend is in progress.  
> Callers arriving at a time when the icebox isn't activated should
> simply return without blocking.  Basically the icebox should be active 
> at the same times as the existing freezer.

There is still the race of:

drivers_sysfs_write()
try_to_icebox()
< 
hit hardware

Those are akin, in some ways, to the freezer races. Some kind of RCU
might take care of them if we enable the icebox, then wait for all tasks
to hit an explicit schedule point once (or return to userland). That
would mean that drivers need to try_to_icebox() again if they do
something that may schedule (such as __get_user). So it's not a magic
solution, it has issues, but it can handle a lot of the simplest cases. 

> Here's a wacky idea which just might work:
> 
> In order to prevent binding and unbinding, while suspending devices all
> the PM core has to do is avoid dropping the device semaphores!  It can
> release the semaphores as it resumes the devices.
> 
> Of course, for this to work it's necessary to avoid changes to the 
> device list during the suspend.  However I believe the iteration can be 
> made safe against unregistration, so we only have to prevent device 
> registration.  (And anyway, it won't be possible to unregister a device 
> while the PM core is holding its semaphore.)
> 
> If we are willing to be somewhat non-transparent, this is easy to
> accomplish.  After the notifier chain has been alerted about the
> upcoming suspend, we tell the driver core to disallow adding new
> devices.  Maybe use SRCU to synchronize with registration calls that
> are in progress.  Thus, until the suspend is over device_add() will
> immediately return an error.  We could even add a new ESUSPENDING code
> to errno.h; it would come in handy in a few places.
> 
> Drivers are already prepared for device registration to fail (or they
> ought to be), so this change shouldn't knock the bottom out of things.  
> device_add() isn't on a hot path, so adding an extra check and
> srcu_read_lock() won't hurt.

True. Also, bus drivers could just flag the port with something saying
"try registering again later". Don't underestimate the power of "try
later" constructs :-)

> I have had the same thought, that unbinding and unregistration would be 
> easier to handle than binding and registration.  As it happens, holding 
> the device semaphore will block both all three -- which makes life 
> simpler.

Yup. Fair and simple.

> There are other possibilities too.  For example, instead of using
> keventd these attributes could use a separate workqueue which would put
> itself in the icebox during a suspend.  Or maybe sysfs can be reworked 
> so that they don't need to use a workqueue at all.

I think having a facility for a given workqueue entry to requeue itself
for after resume might be of use for drivers too.

> > Don't get me wrong, I never said we don't need generic infrastructure
> > and utilities, such as your proposed icebox scheme, or some of those
> > workqueue bits, helpers in subsystems, etc...
> 
> I hope that everyone will agree that now is a good time to get started 
> on them.  There shouldn't be any problem about having them present 
> along with the freezer, and then it will be all that much easier to 
> remove the freezer later on if tha

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Pavel Machek

Hi!

> > I'm all for changing this infrastructure, but in an organized way (ie. we
> > discuss what to do next, we do that and then we go to the next step) and in
> > the order that everyone will be comfortable with.
> > 
> > So, let's finish this thread and start over from discussing what needs to be
> > done, how (ie. in what order etc.) we are going to do that and who is going 
> > to
> > do what.  Shall we?
> 
> IMO we should start by using the new notifier chain and by implementing 
> a central "icebox" routine.  Then we can forbid device registration 
> during suspend.
> 
> It might also be a good idea to add a freezable keventd-like workqueue 
> specifically intended for things that need to block during a suspend.  
> Although maybe this will end up being unnecessary; it's too soon to 
> tell.

We actually had patches for freezeable workqueues at one point.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Pavel Machek

Hi!

> > > I just think that the freezer approach, as it is, is backward. We can't
> > > have a 3rd party try to discriminate what to freeze and what not, it
> > > will always get something wrong, and in some cases with the wrong timing
> > > or ordering.
> > 
> > Nice discussion, except for one thing: the freezer doesn't decide what to
> > freeze.  For example, even right now kernel threads decide if they want to 
> > be
> > frozen.
> 
> Somewhat... userspace doesn't and workqueues are a gray area.

But userspace must not be neccessary for kernel functioning, so that's
quite okay. And we do need to solve the workqueues.

> Also, I've been thinking this "icebox" idea a bit more and it seems in
> fact a bit racy in some areas, at least for use by things like drivers,
> unless we end up doing something aking to an RCU on suspend, waiting for
> all tasks to reach userland once, but that has the same annoyances as
> the current freezer.
> 
> Thus I'm tempted to go back to saying that driver can handle things
> locally :-)

:-). Or perhaps freezer is not _that_ evil after all?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Benjamin Herrenschmidt

On Sun, 2007-07-08 at 21:15 +0200, Rafael J. Wysocki wrote:
> On Sunday, 8 July 2007 07:14, Benjamin Herrenschmidt wrote:
> [--snip--]
> > 
> > I just think that the freezer approach, as it is, is backward. We can't
> > have a 3rd party try to discriminate what to freeze and what not, it
> > will always get something wrong, and in some cases with the wrong timing
> > or ordering.
> 
> Nice discussion, except for one thing: the freezer doesn't decide what to
> freeze.  For example, even right now kernel threads decide if they want to be
> frozen.

Somewhat... userspace doesn't and workqueues are a gray area.

Also, I've been thinking this "icebox" idea a bit more and it seems in
fact a bit racy in some areas, at least for use by things like drivers,
unless we end up doing something aking to an RCU on suspend, waiting for
all tasks to reach userland once, but that has the same annoyances as
the current freezer.

Thus I'm tempted to go back to saying that driver can handle things
locally :-)

Ben.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Pavel Machek

Hi!

> > There are two things I believe. There's a generic issue with usermode
> > helpers that make no sense to call between pre-suspend and
> > post-resume, and there's the specific issue of adding/removing
> > devices.
> > 
> > I believe that "bus" drivers such as USB should indeed get a first
> > round of notifications to tell them to stop performing bus
> > plug/unplug operations (it's debatable whether we want to keep unplug
> > going provided we can stack up the usermode events and re-send them
> > later though, but let's say no for the sake of simplicity).
> 
> Yes.  Rafael, how close is your new notifier chain to mainline?  Can it 
> at least be added to Greg KH's development tree so that I can start 
> using it?

It should be in -mm, IIRC.

> > I think it's a fairly significant change from the current freezer and I
> > also think it's a very good idea. The more I think about it, the more I
> > like it, in the sense that it's a simple drop-in that you could put in a
> > lot of the ioctl path of drivers to just block tasks that are trying to
> > call in while suspending, and could be used selectively by things like
> > the USB hub threads.
> 
> That's what I had in mind.  Rafael, can we add an "icebox" routine?  
> Like Ben says, it doesn't need to be much more than a waitqueue
> that the current task puts itself on if a suspend is in progress.  
> Callers arriving at a time when the icebox isn't activated should
> simply return without blocking.  Basically the icebox should be active 
> at the same times as the existing freezer.

You could use try_to_freeze(), but you want it to be separate routine
so it can be grepped for... Can you #define icebox_me try_to_freeze
for testing?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-08 Thread Rafael J. Wysocki

On Sunday, 8 July 2007 21:50, Miklos Szeredi wrote:
> > > > Well, fix userspace filesystems and maybe NFS. If they react to
> > > > sigstop in timely manner, they will work with suspend properly, too.
> > > 
> > > Which is pretty much impossible, given the unix filesystem API.  To be
> > > able to react to sigstop, the operations in question need to be
> > > restartable.  Which they are not, so they can't react to sigstop.  End
> > > of story.
> > 
> > Or not.  That depends on your willingness to cooperate, I'd say. :-)
> 
> Do you actually understand what I'm talking about?  Because it sure
> doesn't depend on my cooperation.
>
> Maybe I'm stupid, and I'm missing something obvious.  In that case
> please explain how you propose to make filesystem operations, like
> rename() restartable.

I'm not proposing that, sorry.  I'm not a filesystem expert, so don't think
I can propose a working solution here, right now.  Still, maybe something
like in fork.c, line 1409-1411 might work (I know that it won't solve the
"other tasks may be waiting on VFS mutexes" issue, but at least it may
decrease the probability of freezer failure).

> > > You may not like the fact that one process can cause another to go
> > > into uninterruptible sleep, but in fact there's nothing wrong with
> > > that.
> > 
> > Well, this introduces interdependencies between processes that do not exist
> > otherwise.  Even if that isn't wrong per se, it's something that needs
> > consideration in any case.
> > 
> > IMO, FUSE breaks one of the assumptions that the freezer is based on and
> > saying that the freezer is broken because of that is unfair.
> 
> The freezer is not broken because of that, it's broken anyway.  What
> we are seeing is a _symptom_ of it being broken.
> 
> And by broken, I don't mean it's buggy or that it was badly designed.
> I just mean, that it's simply not what suspend should depend on, to
> protect drivers.

Please have a look at the documentation update at the bottom of this patch:

http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc7/patches/15-freezer-make-kernel-threads-nonfreezable-by-default.patch

It says what the freezer is for in the first place. :-)
 
> > > So the fact that the freezer can't handle this is unfortunate, but
> > > it's just a symptom of the brokenness of it, not something that fuse
> > > introduced.  Not being able to suspend with NFS (or other network
> > > filesystems) when the network is lost shows that this is a deeper
> > > problem.
> > 
> > Well, the system that cannot access its filesystems is not in a consistent
> > state, so it generally is not reasonable to suspend or hibernate it.
> 
> Saying the system must be in a "consistent" state, and defining
> consistent as "every process is stopped", is just an arbitrary
> limitation that fits what the freezer does now.  Yes the _hardware_
> state must be consistent, but that has nothing to do with either fuse
> or NFS.  Can't you see that?

Well, I can agree as far as FUSE is concerned.  Still, imagine that you have
an NFS share mounted, which is unavailable at the moment and one of the
tasks waits for an I/O on it to complete (it might hold some locks needed
for suspending, but say it doesn't).  Then you suspend, take the box somewhere
else and attach to a different network with a different NFS server.  Now, you
resume and you have a mess to recover from.  Yes, it _should_ be recoverable,
but refusing to suspend in such a case is not okerkill, IMHO.

> > > As stated otherwise in the thread, suspend2 in fact allowed processes
> > > to be in uninterruptible sleep instead, without negative side effects.
> > 
> > And yet, Nigel thinks that the freezer is necessary for the hibernation.
> > Strange, no?
> 
> I'm totally ignorant about why the freezer is necessary for hibernate.
> Please explain.

See above. :-)

> Yes, we need to make sure, that nothing is scheduled during (and
> possibly after) taking the snapshot.  But AFAICS that could be
> achieved by unplugging all but one CPU.

Actually, we want things to get scheduled, because we need some of them to
save the image (to make things more difficult, we don't know what will be
needed to save the image in advance ;-)).

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Pavel Machek

Hi!

> But in a whole lot of cases, it's, I beleive, perfectly kosher to just
> return an error. You're trying to capture frame from your camera while
> the machine is suspended ? error. At worst, your capture app will be
> unhappy when you wakeup, nothing terrible and totally fixable in
> userland if it's a problem.

Well, that way you'd have to teach applications about suspend... Which
is quite bad.  You mentioned it -- returning random errors will be
very bad for machines like OLPC that want to suspend
automatically. Plus it is a step back from current implementation, and
ABI change, too...

> > So instead, why not have the PM core take care of all this?  There
> > could be a block_task_until_suspend_is_over() routine available for all
> > drivers to use.  Its effect would be exactly the same as sending the
> > current task into the freezer, but it wouldn't be the freezer that
> > exists now.  It would just be some routine that blocks until the system 
> > suspend is over.  We could call it "the icebox" instead of "the 
> > freezer".  :-)
> 
> I'm not totally sure about that. I like some of it, but I think it's
> fairly different conceptually from the freezer (and the implementation
> could be as trivial as a single system wide wait queue). 
> 
> Basically it has a very big difference to the current freezer, and I
> like that, which is that we don't have some 3rd party trying to find out
> what to freeze and what not (the freezer), but instead, we have
> explicitely drivers or kernel threads sending -themselves- to the
> "icebox" when they think it's a good idea. Think of it as lazy
> freezing

Kernel threads already send _themselves_ to the refrigerator. [Plus we
put all the userland there, which is what you don't like, but kernel
can not rely on userland after suspend starts, anyway, so it should
not hurt].

Anyway.. PPC currently suspends without freezer, which puts rules on
drivers. ("Must handle i/o requests after .suspend() method is ran,
must not use GFP_KERNEL to do so, must not try to synchronously
communicate with userspace before _all_ devices are unfrozen") I am
not certain what the exact rules are, but you seem to know them. Could
we get Doc*/power/suspend_wo_freezer.txt describing them for driver
authors? That way we can make sure drivers work on ppc, too, and maybe
get rid of freezer in the long run.

> > You also agree that kernel threads and workqueues must be allowed to 
> > operate during suspend.
> 
> Yes, unless kernel threads explicitely decide to stop themselves (for
> example, khubd is a good candidate for that). Again, not a 3rd party
> trying to decide what to freeze and what not, but the drivers or kernel
> threads themselves deciding it.

This is how it works currently in -mm.

(Plus, the rule is that threads that decide _not to_ stop themselves
should not do any I/O.)
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Alan Stern

On Sun, 8 Jul 2007, Rafael J. Wysocki wrote:

> I'm all for changing this infrastructure, but in an organized way (ie. we
> discuss what to do next, we do that and then we go to the next step) and in
> the order that everyone will be comfortable with.
> 
> So, let's finish this thread and start over from discussing what needs to be
> done, how (ie. in what order etc.) we are going to do that and who is going to
> do what.  Shall we?

IMO we should start by using the new notifier chain and by implementing 
a central "icebox" routine.  Then we can forbid device registration 
during suspend.

It might also be a good idea to add a freezable keventd-like workqueue 
specifically intended for things that need to block during a suspend.  
Although maybe this will end up being unnecessary; it's too soon to 
tell.

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Alan Stern

On Sun, 8 Jul 2007, Benjamin Herrenschmidt wrote:

> Note that we could have also a per-device "icebox" (just a waitqueue).
> Might be nice for a device to resume whatever it froze just after it
> resumed itself, might even be necessary in case whatever thread got
> frozen is necessary for handling child devices.

This wouldn't need any special infrastructure.  Just a regular 
waitqueue defined in the device's private data structure.

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Alan Stern

I'll make this reply short by agreeing up front with most of what you 
say.

On Sun, 8 Jul 2007, Benjamin Herrenschmidt wrote:

> But that's only the "main" path. Aside for that, almost all drivers also
> have sideband "request" input and some driver don't actually live behind
> a subsystem. That ranges from ioctl, to direct read/write on a char dev
> from userland.

Yes, these are the problem cases.

> I think many of those cases can fairly well deal with just taking a PM
> semaphore, that's how I did for a couple of things in the past, provided
> that the request path isn't deadlocking with the semaphore held because
> of the system suspending of course.  

That's what USB does as well (for the drivers which have runtime PM
support -- at the moment only a few of them).

> But in a whole lot of cases, it's, I beleive, perfectly kosher to just
> return an error. You're trying to capture frame from your camera while
> the machine is suspended ? error. At worst, your capture app will be
> unhappy when you wakeup, nothing terrible and totally fixable in
> userland if it's a problem.

We can try falling back on this approach for now.  If the drivers are
smart enough to fail cleanly when the device is already suspended, it
should work.

But I'm not sure it's a good idea in the long run.  Think of a printer 
daemon, for example.  It shouldn't have to experience unexpected I/O 
problems merely because someone has decided to put the system to sleep.

> In some cases, we could use a little bit more help from the subsystem.
> Network for example, could have some explicit knowledge of the suspend
> state, and in addition to stopping the queue would also stop calling
> into things like change_mtu or set_multicast, provided it's agreed that
> the driver will account for those changes on resume (the actual MTU
> values or multicast lists are still updated in the netdev).

This will be up to the people responsible for the subsystems.  I can 
take care of USB.

> There are two things I believe. There's a generic issue with usermode
> helpers that make no sense to call between pre-suspend and
> post-resume, and there's the specific issue of adding/removing
> devices.
> 
> I believe that "bus" drivers such as USB should indeed get a first
> round of notifications to tell them to stop performing bus
> plug/unplug operations (it's debatable whether we want to keep unplug
> going provided we can stack up the usermode events and re-send them
> later though, but let's say no for the sake of simplicity).

Yes.  Rafael, how close is your new notifier chain to mainline?  Can it 
at least be added to Greg KH's development tree so that I can start 
using it?

> > So instead, why not have the PM core take care of all this?  There
> > could be a block_task_until_suspend_is_over() routine available for all
> > drivers to use.  Its effect would be exactly the same as sending the
> > current task into the freezer, but it wouldn't be the freezer that
> > exists now.  It would just be some routine that blocks until the system 
> > suspend is over.  We could call it "the icebox" instead of "the 
> > freezer".  :-)
> 
> I'm not totally sure about that. I like some of it, but I think it's
> fairly different conceptually from the freezer (and the implementation
> could be as trivial as a single system wide wait queue). 

Exactly.

> Basically it has a very big difference to the current freezer, and I
> like that, which is that we don't have some 3rd party trying to find out
> what to freeze and what not (the freezer), but instead, we have
> explicitely drivers or kernel threads sending -themselves- to the
> "icebox" when they think it's a good idea. Think of it as lazy freezing
> -> you only freeze lazy tasks that are trying to do something that
> cannot be done because of suspend.
> 
> > Does that make you happier?
> 
> I think it's a fairly significant change from the current freezer and I
> also think it's a very good idea. The more I think about it, the more I
> like it, in the sense that it's a simple drop-in that you could put in a
> lot of the ioctl path of drivers to just block tasks that are trying to
> call in while suspending, and could be used selectively by things like
> the USB hub threads.

That's what I had in mind.  Rafael, can we add an "icebox" routine?  
Like Ben says, it doesn't need to be much more than a waitqueue
that the current task puts itself on if a suspend is in progress.  
Callers arriving at a time when the icebox isn't activated should
simply return without blocking.  Basically the icebox should be active 
at the same times as the existing freezer.

> > Of course, another possibility is simply to fail the bind.  But that's 
> > not very satisfying, since suspends should be transparent.
> 
> I don't think suspend has to be -that- transparent (though there is some
> debate on whether it should be if we're gonna do some kind of fast
> "light" suspend for things like OLPC) but overall, I agree that a bind
> operatio

Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-08 Thread Miklos Szeredi

> > > Well, fix userspace filesystems and maybe NFS. If they react to
> > > sigstop in timely manner, they will work with suspend properly, too.
> > 
> > Which is pretty much impossible, given the unix filesystem API.  To be
> > able to react to sigstop, the operations in question need to be
> > restartable.  Which they are not, so they can't react to sigstop.  End
> > of story.
> 
> Or not.  That depends on your willingness to cooperate, I'd say. :-)

Do you actually understand what I'm talking about?  Because it sure
doesn't depend on my cooperation.

Maybe I'm stupid, and I'm missing something obvious.  In that case
please explain how you propose to make filesystem operations, like
rename() restartable.

> > You may not like the fact that one process can cause another to go
> > into uninterruptible sleep, but in fact there's nothing wrong with
> > that.
> 
> Well, this introduces interdependencies between processes that do not exist
> otherwise.  Even if that isn't wrong per se, it's something that needs
> consideration in any case.
> 
> IMO, FUSE breaks one of the assumptions that the freezer is based on and
> saying that the freezer is broken because of that is unfair.

The freezer is not broken because of that, it's broken anyway.  What
we are seeing is a _symptom_ of it being broken.

And by broken, I don't mean it's buggy or that it was badly designed.
I just mean, that it's simply not what suspend should depend on, to
protect drivers.

> > So the fact that the freezer can't handle this is unfortunate, but
> > it's just a symptom of the brokenness of it, not something that fuse
> > introduced.  Not being able to suspend with NFS (or other network
> > filesystems) when the network is lost shows that this is a deeper
> > problem.
> 
> Well, the system that cannot access its filesystems is not in a consistent
> state, so it generally is not reasonable to suspend or hibernate it.

Saying the system must be in a "consistent" state, and defining
consistent as "every process is stopped", is just an arbitrary
limitation that fits what the freezer does now.  Yes the _hardware_
state must be consistent, but that has nothing to do with either fuse
or NFS.  Can't you see that?

> > As stated otherwise in the thread, suspend2 in fact allowed processes
> > to be in uninterruptible sleep instead, without negative side effects.
> 
> And yet, Nigel thinks that the freezer is necessary for the hibernation.
> Strange, no?

I'm totally ignorant about why the freezer is necessary for hibernate.
Please explain.

Yes, we need to make sure, that nothing is scheduled during (and
possibly after) taking the snapshot.  But AFAICS that could be
achieved by unplugging all but one CPU.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Rafael J. Wysocki

On Sunday, 8 July 2007 07:14, Benjamin Herrenschmidt wrote:
[--snip--]
> 
> I just think that the freezer approach, as it is, is backward. We can't
> have a 3rd party try to discriminate what to freeze and what not, it
> will always get something wrong, and in some cases with the wrong timing
> or ordering.

Nice discussion, except for one thing: the freezer doesn't decide what to
freeze.  For example, even right now kernel threads decide if they want to be
frozen.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Rafael J. Wysocki

On Sunday, 8 July 2007 06:39, Benjamin Herrenschmidt wrote:
> 
> > In my defense, you should realize that until Rafael's notifier chain
> > was added (just a few weeks ago, still not in mainline I believe) there
> > was no other way to do it.  Plug activity needs to be stopped before
> > the child devices are suspended, and the PM core does not send any
> > notification to drivers at that time.  All it does is activate the 
> > freezer.
> 
> That's true. That was one of the reason I've always wanted the
> pre-suspend and post-resume hooks. (I prefer keeping the ordering there
> too, rather than a notifier, but a notifier is fine I suppose).
> 
> Among the clients we want here the firmware stuff, the allocators,
> etc... to get themselves in conditions that won't deadlock during
> suspend cycle.
> 
> I think that's a much bigger issue overall than freezer vs. no
> freezer :-)

Yes.  The freezer is only one part of the PM infrastructure that we have
currently and that is not very sophisticated, so to speak.

I'm all for changing this infrastructure, but in an organized way (ie. we
discuss what to do next, we do that and then we go to the next step) and in
the order that everyone will be comfortable with.

So, let's finish this thread and start over from discussing what needs to be
done, how (ie. in what order etc.) we are going to do that and who is going to
do what.  Shall we?

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-08 Thread Al Viro

On Sun, Jul 08, 2007 at 08:08:56PM +0200, Rafael J. Wysocki wrote:
> Well, the system that cannot access its filesystems is not in a consistent
> state, so it generally is not reasonable to suspend or hibernate it.
> 
> In fact, NFS and similar filesystems should always be unmounted before the
> suspend/hibernation to avoid problems that may arise after the resume, if
> they become unavailable at that time.

Ah, thanks for clarifying that. [mentally putting the entire thing into
"unusable due to idiotic design imposing idiotic limitations" bin and
stepping out of this thread]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Rafael J. Wysocki

On Sunday, 8 July 2007 04:24, Alan Stern wrote:
> On Sun, 8 Jul 2007, Benjamin Herrenschmidt wrote:
> 
> > > Which is exactly my point.  It _doesn't_ work fine without a freezer, 
> > > because the USB stack currently relies on the freezer to prevent plug 
> > > activity.
> > 
> > Putting on hold plug activity has nothing, NOTHING, to do with the half
> > assed piece of deadlocking crap we have now we call a freezer.
> 
> You're wrong; it _does_ have something to do with the freezer.  The 
> connection is that the code uses the freezer to put plug activity on 
> hold.
> 
> You probably meant to say that it _should_ have nothing to do with the 
> freezer.  That's a different matter.  But in any case you should write 
> what you actually mean, rather than just putting down something as 
> inflammatory as possible.
> 
> > As long as you guys keep mixing up all the issues and coming up with
> > totally bogus solutions that cannot work, we won't have a useful suspend
> > (either to RAM or to disk) in linux.
> 
> In my defense, you should realize that until Rafael's notifier chain
> was added (just a few weeks ago, still not in mainline I believe) there
> was no other way to do it.  Plug activity needs to be stopped before
> the child devices are suspended, and the PM core does not send any
> notification to drivers at that time.  All it does is activate the 
> freezer.

Exactly.

And if we really want to get rid of the freezer, we need to _prepare_ for
that.  At least, we need to have an idea of how we will do thing when the
freezer is not there.

For example, how will we check if the system is not in a state in which the
resume is likely to fail?

There are more questions like that, IMO, and we should at least have a list
of them _before_ we remove the freezer.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Rafael J. Wysocki

On Sunday, 8 July 2007 02:42, Benjamin Herrenschmidt wrote:
> On Sat, 2007-07-07 at 12:17 -0400, Alan Stern wrote:
> > On Sat, 7 Jul 2007, Benjamin Herrenschmidt wrote:
> > 
> > > > > And guess what ? It's what we do on powerbooks, and it works fine,
> > > > > without a freezer :-)
> > 
> > > If you remember, one of the things I've been advocating has always been
> > > that we should put on hold all plug activity (unplug might be alright as
> > > long as the user events are just delayed) when we start suspending. No
> > > new devices, no new bindings. "hub" type devices are respondible for
> > > bringing in the new stuff after resume.
> > 
> > Which is exactly my point.  It _doesn't_ work fine without a freezer, 
> > because the USB stack currently relies on the freezer to prevent plug 
> > activity.
> 
> Putting on hold plug activity has nothing, NOTHING, to do with the half
> assed piece of deadlocking crap we have now we call a freezer.

Well, no matter how much bad names you will call it, that's what in use
right now and it's not _that_ easy to get rid of.

I'm perfectly fine with readying drivers etc. for dropping the freezer, but
in the meantime bugs in the freezer need to be fixed, if possible.

And if it deadlocks, there is a bug in it.

> As long as you guys keep mixing up all the issues and coming up with
> totally bogus solutions that cannot work, we won't have a useful suspend
> (either to RAM or to disk) in linux. 

Discussion at this level would be totally counterproductive and wouldn't lead
to anything constructive, so let me avoid commenting your abusive remarks.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-08 Thread Rafael J. Wysocki

On Sunday, 8 July 2007 16:23, Miklos Szeredi wrote:
> > > > > > We can just wait for all fuse requests to be serviced before
> > > > > > proceeding further with freeze, right?
> > > > > 
> > > > > Right.  Nice way to slow down or stop the suspend with an unprivileged
> > > > > process.  Avoiding that sort of DoS is one of the design goals of
> > > > > fuse.
> > > > 
> > > > So you want me to handle _malicious_ filesystems now?
> > > 
> > > What I'd like, is a suspend, that works reliably, regardless of the
> > > state of any userspace filesystem, network servers and such.
> > 
> > Well, fix userspace filesystems and maybe NFS. If they react to
> > sigstop in timely manner, they will work with suspend properly, too.
> 
> Which is pretty much impossible, given the unix filesystem API.  To be
> able to react to sigstop, the operations in question need to be
> restartable.  Which they are not, so they can't react to sigstop.  End
> of story.

Or not.  That depends on your willingness to cooperate, I'd say. :-)

> > > > That should be easy... :-). You already have nasty deadlocks in FUSE,
> > > > and you solve them by "root can echo 1 > abort"... so allow me the
> > > > same possibility.
> > > > 
> > > > We can tell fused we are freezing, and if all the requests are not
> > > > serviced within, say, 30 seconds, we call the filesystem malicious and
> > > > do echo 1 > abort.
> > > 
> > > Arbitrary time limits, nice.  Not.
> > > 
> > > This freezer is like an old house that's close to collapsing, and you
> > 
> > Nice way to have useful discussion. Not.
> 
> Sorry, didn't mean to offend you.
> 
> > Look, Linux was not designed with malicious filesystems in mind. In
> > particular, suspend was not designed with malicious filesystems in
> > mind, and VFS was not designed with malicious filesystems in mind.
> > That's why you have the nastiness with deadlocks... which you are
> > unable/unwilling to solve because you'd have to redesign VFS and meet
> > Al Viro.
> 
> No, I don't think we need to redesign the VFS, and also I think the
> malicious filesystem thing has been adequately taken care of in fuse.
> 
> You may not like the fact that one process can cause another to go
> into uninterruptible sleep, but in fact there's nothing wrong with
> that.

Well, this introduces interdependencies between processes that do not exist
otherwise.  Even if that isn't wrong per se, it's something that needs
consideration in any case.

IMO, FUSE breaks one of the assumptions that the freezer is based on and
saying that the freezer is broken because of that is unfair.

> The effects are localized to the mount owner, it won't cause 
> any system-wide ill, it's in fact no worse than a malicious process
> doing ptrace().
> 
> And as explained above it's unavoidable due to the well established
> userspace API.  The _point_ of fuse is to let this API be used by
> unmodified programs and the filesystem be provided by a userspace
> process.  After showing me the right way to do this with Podfuk, this
> should not come as a surprise to you.
> 
> So the fact that the freezer can't handle this is unfortunate, but
> it's just a symptom of the brokenness of it, not something that fuse
> introduced.  Not being able to suspend with NFS (or other network
> filesystems) when the network is lost shows that this is a deeper
> problem.

Well, the system that cannot access its filesystems is not in a consistent
state, so it generally is not reasonable to suspend or hibernate it.

In fact, NFS and similar filesystems should always be unmounted before the
suspend/hibernation to avoid problems that may arise after the resume, if
they become unavailable at that time.

> And for some reason you seem not to accept that.  You think that the
> problem is with fuse, NFS, CIFS, whatever, and not the freezer, when
> in fact it's quite clear, that neither of the above should have
> anything to do with power management.

IMHO, it's not _that_ clear.

> Even if it was possible to fix them, it would still be just fixing the
> symptoms. 

I don't think we need to fix the network filesystems.

If there's a mounted filesystem and we have at least one process waiting for
it to become available in the D state (ie. uninterruptible), then the system is
not in a consistent state and should not be suspended.

> > Now.. freezer already includes timeout; if some part of kernel knows
> > nothing about suspend or had crashed, it will abort after some time,
> > telling you which part to blame. Another timeout for detecting
> > malicious userspace filesystems will not hurt much in this old house.
> > (Maybe you don't want to auto abort them, just tell user what happened.
> > 
> > We are stuck with refrigerator for now, and at least for hibernation,
> > I don't see any feasible alternative.
> 
> Even for hibernation, I don't see, why we would need all processes
> being effectively in a stopped state.
> 
> As stated otherwise in the thread, suspend2 in fact allowed processes
> to b

Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-08 Thread David Brownell

On Sunday 08 July 2007, Al Viro wrote:
> On Sun, Jul 08, 2007 at 12:37:48PM +, Pavel Machek wrote:
> > I'm talking malicious _filesystems_ here, and yes, fuse is first of
> > this kind. We want to handle unresponding NFS, but I believe handling
> > malicious NFS server nicely is slightly out of scope.
> 
> If your variant doesn't handle compromised NFS server, your variant is
> needs to be fixed...

That would depend on the type of compromise, right?

Remember that the fundamental contract between a client and
a server includes the client extending some trust to that
server.  Whether it's appropriate to extend that trust (at
any given moment) is an out-of-band security issue.  Trust
is not a protocol issue ... more like an operational issue,
and only slightly an implementation issue.

A malicious filesystem could do many things.  It could send
private data off to someone who wasn't intended to receive
that data.  It could return falsified data.  It could do any
variety of things outside the protocol specification...

And *MOST* of those would be impractical to defend against
in code.  Which is why I have such a hard time agreeing
with your comment about "fixing" a client that doesn't try
to defend itself.  (Unless by "client" you also include the
whole operational side of the client, including regular
re-validation of the trust extended to that server... which
would at best minimize damage caused by compromises.)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-08 Thread Miklos Szeredi

> > > > > We can just wait for all fuse requests to be serviced before
> > > > > proceeding further with freeze, right?
> > > > 
> > > > Right.  Nice way to slow down or stop the suspend with an unprivileged
> > > > process.  Avoiding that sort of DoS is one of the design goals of
> > > > fuse.
> > > 
> > > So you want me to handle _malicious_ filesystems now?
> > 
> > What I'd like, is a suspend, that works reliably, regardless of the
> > state of any userspace filesystem, network servers and such.
> 
> Well, fix userspace filesystems and maybe NFS. If they react to
> sigstop in timely manner, they will work with suspend properly, too.

Which is pretty much impossible, given the unix filesystem API.  To be
able to react to sigstop, the operations in question need to be
restartable.  Which they are not, so they can't react to sigstop.  End
of story.

> > > That should be easy... :-). You already have nasty deadlocks in FUSE,
> > > and you solve them by "root can echo 1 > abort"... so allow me the
> > > same possibility.
> > > 
> > > We can tell fused we are freezing, and if all the requests are not
> > > serviced within, say, 30 seconds, we call the filesystem malicious and
> > > do echo 1 > abort.
> > 
> > Arbitrary time limits, nice.  Not.
> > 
> > This freezer is like an old house that's close to collapsing, and you
> 
> Nice way to have useful discussion. Not.

Sorry, didn't mean to offend you.

> Look, Linux was not designed with malicious filesystems in mind. In
> particular, suspend was not designed with malicious filesystems in
> mind, and VFS was not designed with malicious filesystems in mind.
> That's why you have the nastiness with deadlocks... which you are
> unable/unwilling to solve because you'd have to redesign VFS and meet
> Al Viro.

No, I don't think we need to redesign the VFS, and also I think the
malicious filesystem thing has been adequately taken care of in fuse.

You may not like the fact that one process can cause another to go
into uninterruptible sleep, but in fact there's nothing wrong with
that.  The effects are localized to the mount owner, it won't cause
any system-wide ill, it's in fact no worse than a malicious process
doing ptrace().

And as explained above it's unavoidable due to the well established
userspace API.  The _point_ of fuse is to let this API be used by
unmodified programs and the filesystem be provided by a userspace
process.  After showing me the right way to do this with Podfuk, this
should not come as a surprise to you.

So the fact that the freezer can't handle this is unfortunate, but
it's just a symptom of the brokenness of it, not something that fuse
introduced.  Not being able to suspend with NFS (or other network
filesystems) when the network is lost shows that this is a deeper
problem.

And for some reason you seem not to accept that.  You think that the
problem is with fuse, NFS, CIFS, whatever, and not the freezer, when
in fact it's quite clear, that neither of the above should have
anything to do with power management.  Even if it was possible to fix
them, it would still be just fixing the symptoms.

> Now.. freezer already includes timeout; if some part of kernel knows
> nothing about suspend or had crashed, it will abort after some time,
> telling you which part to blame. Another timeout for detecting
> malicious userspace filesystems will not hurt much in this old house.
> (Maybe you don't want to auto abort them, just tell user what happened.
> 
> We are stuck with refrigerator for now, and at least for hibernation,
> I don't see any feasible alternative.

Even for hibernation, I don't see, why we would need all processes
being effectively in a stopped state.

As stated otherwise in the thread, suspend2 in fact allowed processes
to be in uninterruptible sleep instead, without negative side effects.

The current freezer does too much for suspend and for hibernate too.

> > Malicious programs are not something specific to fuse.  A lot of the
> > multiuser/multitasking OS design is about isolating things, so such a
> > program is limited in the damage it can do.
> 
> I'm talking malicious _filesystems_ here, and yes, fuse is first of
> this kind. We want to handle unresponding NFS, but I believe handling
> malicious NFS server nicely is slightly out of scope.

It doesn't have to be malicious, it's enough if the server crashes, or
the network connection is lost.

Do we want to maintain the status quo, just because we can live with
it?

> > > Not nice, but we don't know any better for now. "Just fix all the
> > > drivers" basically means "just fix 90% of kernel".
> > 
> > And how much of that 90% currently has any power management?
> 
> Anything that's used in today's notebooks, I'd say...

Which is what, 10% of all the drivers?  Then it really not as bad as
you try to make it sound.  And with the late suspend call (whatever it
does) that can take care of most of those, it really becomes just a
few drivers and subsystems to fix.

Miklos
-
To uns

Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-08 Thread Al Viro

On Sun, Jul 08, 2007 at 12:37:48PM +, Pavel Machek wrote:
> I'm talking malicious _filesystems_ here, and yes, fuse is first of
> this kind. We want to handle unresponding NFS, but I believe handling
> malicious NFS server nicely is slightly out of scope.

If your variant doesn't handle compromised NFS server, your variant is
needs to be fixed...
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-08 Thread Rafael J. Wysocki

On Sunday, 8 July 2007 09:21, Miklos Szeredi wrote:
> > > > We can just wait for all fuse requests to be serviced before
> > > > proceeding further with freeze, right?
> > > 
> > > Right.  Nice way to slow down or stop the suspend with an unprivileged
> > > process.  Avoiding that sort of DoS is one of the design goals of
> > > fuse.
> > 
> > So you want me to handle _malicious_ filesystems now?
> 
> What I'd like, is a suspend, that works reliably, regardless of the
> state of any userspace filesystem, network servers and such.

And then you also would like to have a consistent state of the system after
the resume, but if the state were inconsistent before the suspend that
would be difficult to get.

> > That should be easy... :-). You already have nasty deadlocks in FUSE,
> > and you solve them by "root can echo 1 > abort"... so allow me the
> > same possibility.
> > 
> > We can tell fused we are freezing, and if all the requests are not
> > serviced within, say, 30 seconds, we call the filesystem malicious and
> > do echo 1 > abort.
> 
> Arbitrary time limits, nice.  Not.
>
> This freezer is like an old house that's close to collapsing, and you
> are basically just thinking of where to prop it up further.  To
> continute this brilliant analogy, Rafael's patch at least demolishes
> the worst part of the house, where bricks are already falling on our
> head ;)

Actually, this isn't correct and I have some testing problems with this
patch (as I expected, BTW), so it's not going anywhere.  Alternative ideas
will be appreciated. :-)

> > Not ideal, but neither is allowing malicious filesystems in the first
> > place...
> 
> Malicious programs are not something specific to fuse.  A lot of the
> multiuser/multitasking OS design is about isolating things, so such a
> program is limited in the damage it can do.
>
> > > Look at it this way: the task of the freezer is to stop new I/O
> > > hitting the hardware.  But it is totally indiscriminate about what it
> > > stops, it tries to stop _everything_ even things which have nothing to
> > > do with hardware.
> > > 
> > > Not nice.
> > 
> > Not nice, but we don't know any better for now. "Just fix all the
> > drivers" basically means "just fix 90% of kernel".
> 
> And how much of that 90% currently has any power management?

Well, you know why that actually started to be a real problem?  Because
sufficiently many people have started to use suspend/hibernation and it
actually works for quite a lot of them.  This makes people to try exotic
combinations like suspend+FUSE and they find that doesn't work.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-08 Thread Pavel Machek

Hi!

> > > > We can just wait for all fuse requests to be serviced before
> > > > proceeding further with freeze, right?
> > > 
> > > Right.  Nice way to slow down or stop the suspend with an unprivileged
> > > process.  Avoiding that sort of DoS is one of the design goals of
> > > fuse.
> > 
> > So you want me to handle _malicious_ filesystems now?
> 
> What I'd like, is a suspend, that works reliably, regardless of the
> state of any userspace filesystem, network servers and such.

Well, fix userspace filesystems and maybe NFS. If they react to
sigstop in timely manner, they will work with suspend properly, too.

> > That should be easy... :-). You already have nasty deadlocks in FUSE,
> > and you solve them by "root can echo 1 > abort"... so allow me the
> > same possibility.
> > 
> > We can tell fused we are freezing, and if all the requests are not
> > serviced within, say, 30 seconds, we call the filesystem malicious and
> > do echo 1 > abort.
> 
> Arbitrary time limits, nice.  Not.
> 
> This freezer is like an old house that's close to collapsing, and you

Nice way to have useful discussion. Not.

Look, Linux was not designed with malicious filesystems in mind. In
particular, suspend was not designed with malicious filesystems in
mind, and VFS was not designed with malicious filesystems in mind.
That's why you have the nastiness with deadlocks... which you are
unable/unwilling to solve because you'd have to redesign VFS and meet
Al Viro.

Now.. freezer already includes timeout; if some part of kernel knows
nothing about suspend or had crashed, it will abort after some time,
telling you which part to blame. Another timeout for detecting
malicious userspace filesystems will not hurt much in this old house.
(Maybe you don't want to auto abort them, just tell user what happened.

We are stuck with refrigerator for now, and at least for hibernation,
I don't see any feasible alternative.

And now, can we get that sysrq-t trace? Dealing with 'unable to
freeze' may be hard, but the deadlock Matthew saw should be simple bug
for us to fix.

> > Not ideal, but neither is allowing malicious filesystems in the first
> > place...
> 
> Malicious programs are not something specific to fuse.  A lot of the
> multiuser/multitasking OS design is about isolating things, so such a
> program is limited in the damage it can do.

I'm talking malicious _filesystems_ here, and yes, fuse is first of
this kind. We want to handle unresponding NFS, but I believe handling
malicious NFS server nicely is slightly out of scope.

> > Not nice, but we don't know any better for now. "Just fix all the
> > drivers" basically means "just fix 90% of kernel".
> 
> And how much of that 90% currently has any power management?

Anything that's used in today's notebooks, I'd say...
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Off topically about Re: malicious filesystems (was Re: Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-08 Thread Oleg Verych

>> > > We can just wait for all fuse requests to be serviced before
>> > > proceeding further with freeze, right?
>> > 
>> > Right.  Nice way to slow down or stop the suspend with an unprivileged
>> > process.  Avoiding that sort of DoS is one of the design goals of
>> > fuse.
>> 
>> So you want me to handle _malicious_ filesystems now?
>
> What I'd like, is a suspend, that works reliably, regardless of the
> state of any userspace filesystem, network servers and such.

And this must be called sleep, deep sleep or something like that. Because
preparing for sleeping takes place, sleeping itself is a process, life
requires. Waking up, staying all day fresh and active is the needed goal.

Thus, whole freeze/ice idea was dead-born, IMHO :)

WRT this, busy-loop _waiting_ that was called sleep, usleep or
nanosleep is another wrong-named concept. I imagine somebody after one
minute sleep :)


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway (philosophical)

2007-07-08 Thread Oleg Verych

* Alan Stern 

>> Why are you guys working so hard and spending so much energy to try to
>> avoid doing the right thing is beyond my understanding...
>> 
>> > It _does_ apply to kernel threads.  That's exactly why I wrote above 
>> > that kernel threads which try to do I/O during a suspend will need 
>> > extra attention.
>> 
>> Ok none at all if you don't have a freezer.
>
> In answer to both questions: We need the freezer in order to

Non of *complex* living creatures survive motionless freezing (with
ice).

That's why stupid me just don't care about all that stuff in the
Linux. It's nice to see somebody approaches this from technical, but not
off-topical POV :). 


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-08 Thread Miklos Szeredi

> > > We can just wait for all fuse requests to be serviced before
> > > proceeding further with freeze, right?
> > 
> > Right.  Nice way to slow down or stop the suspend with an unprivileged
> > process.  Avoiding that sort of DoS is one of the design goals of
> > fuse.
> 
> So you want me to handle _malicious_ filesystems now?

What I'd like, is a suspend, that works reliably, regardless of the
state of any userspace filesystem, network servers and such.

> That should be easy... :-). You already have nasty deadlocks in FUSE,
> and you solve them by "root can echo 1 > abort"... so allow me the
> same possibility.
> 
> We can tell fused we are freezing, and if all the requests are not
> serviced within, say, 30 seconds, we call the filesystem malicious and
> do echo 1 > abort.

Arbitrary time limits, nice.  Not.

This freezer is like an old house that's close to collapsing, and you
are basically just thinking of where to prop it up further.  To
continute this brilliant analogy, Rafael's patch at least demolishes
the worst part of the house, where bricks are already falling on our
head ;)

> Not ideal, but neither is allowing malicious filesystems in the first
> place...

Malicious programs are not something specific to fuse.  A lot of the
multiuser/multitasking OS design is about isolating things, so such a
program is limited in the damage it can do.

> > Look at it this way: the task of the freezer is to stop new I/O
> > hitting the hardware.  But it is totally indiscriminate about what it
> > stops, it tries to stop _everything_ even things which have nothing to
> > do with hardware.
> > 
> > Not nice.
> 
> Not nice, but we don't know any better for now. "Just fix all the
> drivers" basically means "just fix 90% of kernel".

And how much of that 90% currently has any power management?

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-08 Thread Paul Mackerras

Alan Stern writes:

> In answer to both questions: We need the freezer in order to implement 
> hibernate.  Even if we take your advice and stop using the freezer 
> during suspend, these issues would still remain and would need to be 
> solved.

Stepping back for a minute, let's think about what the freezer is
trying to achieve.  I think that currently there are three basic
design goals:

A. Ensure that device drivers don't get I/O requests after being
   suspended.

B. Ensure that driver suspend routines don't end up blocking forever
   on mutexes or semaphores held by frozen tasks.

C. Provide a way to get an atomic snapshot of memory for hibernation.

Now, it's easy enough to freeze all processes (or all except one), if
you don't have goal B.  Just offline non-boot cpus and disable
interrupts, or use stop_machine().  But goal B implies that you can't
necessarily just stop all tasks wherever they are.  In fact it means
there are points where it is safe to stop a given task, and there may
be points where it isn't safe to stop it - and there is no practical
way to determine those points reliably.[1]

That implies to me that we can have a freezer as long as we do nothing
that can sleep, while tasks are frozen.  In other words, I think the
freezer is a viable option for hibernation as long as we restrict
driver hibernate routines to doing only things which don't sleep.  I
_think_ that should be doable since hibernate routines only need to
wait for outstanding DMAs to complete, as I understand it, which can
be done by polling.

Paul.

[1] For a start, there's no way to determine which mutexes a task
holds.  Even if there were, we would then also have to know which
mutexes it is going to try to acquire before it releases the one we
want, which is pretty much unknowable.

One can say "we'll only freeze kernel tasks that ask to be frozen" but
then one has no way to guarantee A, and we still don't reliably
guarantee B if we have user-level filesystems or device drivers.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-07 Thread Benjamin Herrenschmidt


> I think it's a fairly significant change from the current freezer and I
> also think it's a very good idea. The more I think about it, the more I
> like it, in the sense that it's a simple drop-in that you could put in a
> lot of the ioctl path of drivers to just block tasks that are trying to
> call in while suspending, and could be used selectively by things like
> the USB hub threads.

Note that we could have also a per-device "icebox" (just a waitqueue).
Might be nice for a device to resume whatever it froze just after it
resumed itself, might even be necessary in case whatever thread got
frozen is necessary for handling child devices.

Cheers,
Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-07 Thread Benjamin Herrenschmidt


> You agree that drivers need to block various activities during suspend.  
> Principally I/O requests, but other things as well.  So when one of 
> these requests arrives, the driver has to make it wait somehow and then 
> has to allow it to proceed at the appropriate time.

Yes. The main thing I see here is that here is nothing common among
drivers in what a "request" is and how it's processed.

For example, for block drivers, it's actually fairly simple to just stop
processing the request queue and wait for pending ones to complete.

For network, in a similar vein, we can mostly just tell the network
stack to stop sending us packets.

That's what I call the "main path". This is often the trivial part to
deal with, mostly because for a whole lot of drivers, it can be done via
a couple of helpers in the subsystem that the driver provides a service
too, via a helper, asking that subsystem to stop calling into the said
driver (the asking should be done by the driver itself of course, for
ordering reasons).

We have some helpers, but I think not enough, and that's where we should
focus imho. For example, I added fb_set_suspend() so that fbdev's can
request fbcon to stop accessing them (it doesn't solve the problem of
userland mmap's, that will have to be done, if we want to do it, in a
more sneaky way, using VM tricks, but the DRM nowadays has the
infrastructure to do it).

But that's only the "main" path. Aside for that, almost all drivers also
have sideband "request" input and some driver don't actually live behind
a subsystem. That ranges from ioctl, to direct read/write on a char dev
from userland.

I think many of those cases can fairly well deal with just taking a PM
semaphore, that's how I did for a couple of things in the past, provided
that the request path isn't deadlocking with the semaphore held because
of the system suspending of course.  

But in a whole lot of cases, it's, I beleive, perfectly kosher to just
return an error. You're trying to capture frame from your camera while
the machine is suspended ? error. At worst, your capture app will be
unhappy when you wakeup, nothing terrible and totally fixable in
userland if it's a problem.

In some cases, we could use a little bit more help from the subsystem.
Network for example, could have some explicit knowledge of the suspend
state, and in addition to stopping the queue would also stop calling
into things like change_mtu or set_multicast, provided it's agreed that
the driver will account for those changes on resume (the actual MTU
values or multicast lists are still updated in the netdev).

> Normally a waitqueue or a struct completion would be used for this
> purpose.  

I think there's no "normal" scenario, each driver or family of drivers
will do things very differently.

> But either one puts the burden on the driver of defining a
> data structure and signalling it at the right time.

Yes.

> That time is generally when the device is resumed, but there's nothing
> wrong with delaying it slightly, to after all the devices have been resumed 
> (i.e.,
> the time when the current PM code takes everything out of the freezer).

In fact, the best is to have parallel suspend/resume of drivers and 
asynchronous resume but that's out of topic :-) (For the record, I did some 
bits like ADB resume like that, that is asynchronous, to speed up wakeup time).
>   
> In fact, we definitely don't want to unblock plug events until this
> later time.

There are two things I believe. There's a generic issue with usermode helpers 
that make no sense to call between pre-suspend and post-resume, and there's the 
specific issue of adding/removing devices.

I believe that "bus" drivers such as USB should indeed get a first round of 
notifications to tell them to stop performing bus plug/unplug operations (it's 
debatable whether we want to keep unplug going provided we can stack up the 
usermode events and re-send them later though, but let's say no for the sake of 
simplicity).

> So instead, why not have the PM core take care of all this?  There
> could be a block_task_until_suspend_is_over() routine available for all
> drivers to use.  Its effect would be exactly the same as sending the
> current task into the freezer, but it wouldn't be the freezer that
> exists now.  It would just be some routine that blocks until the system 
> suspend is over.  We could call it "the icebox" instead of "the 
> freezer".  :-)

I'm not totally sure about that. I like some of it, but I think it's
fairly different conceptually from the freezer (and the implementation
could be as trivial as a single system wide wait queue). 

Basically it has a very big difference to the current freezer, and I
like that, which is that we don't have some 3rd party trying to find out
what to freeze and what not (the freezer), but instead, we have
explicitely drivers or kernel threads sending -themselves- to the
"icebox" when they think it's a good idea. Think of it as lazy freezing
-> you only free

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-07 Thread Benjamin Herrenschmidt


> In my defense, you should realize that until Rafael's notifier chain
> was added (just a few weeks ago, still not in mainline I believe) there
> was no other way to do it.  Plug activity needs to be stopped before
> the child devices are suspended, and the PM core does not send any
> notification to drivers at that time.  All it does is activate the 
> freezer.

That's true. That was one of the reason I've always wanted the
pre-suspend and post-resume hooks. (I prefer keeping the ordering there
too, rather than a notifier, but a notifier is fine I suppose).

Among the clients we want here the firmware stuff, the allocators,
etc... to get themselves in conditions that won't deadlock during
suspend cycle.

I think that's a much bigger issue overall than freezer vs. no
freezer :-)

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-07 Thread Alan Stern

On Sun, 8 Jul 2007, Benjamin Herrenschmidt wrote:

> > (And rather than trying to manage a waitqueue or struct completion, it
> > would be easiest to jump directly into the freezer!  The driver or the 
> > core wouldn't have to worry about waking up all these blocked threads.)
> 
> That's wrong. The freezer is NOT a solution for that sort of thing. Just
> because you guys can't get your locking right.

You misunderstood.  Stop trying to incite riot, calm down, and pay
attention to what I actually wrote.  I'll explain it again in more
explicit terms:

You agree that drivers need to block various activities during suspend.  
Principally I/O requests, but other things as well.  So when one of 
these requests arrives, the driver has to make it wait somehow and then 
has to allow it to proceed at the appropriate time.

Normally a waitqueue or a struct completion would be used for this
purpose.  But either one puts the burden on the driver of defining a
data structure and signalling it at the right time.  That time is
generally when the device is resumed, but there's nothing wrong with
delaying it slightly, to after all the devices have been resumed (i.e.,
the time when the current PM code takes everything out of the freezer).  
In fact, we definitely don't want to unblock plug events until this
later time.

So instead, why not have the PM core take care of all this?  There
could be a block_task_until_suspend_is_over() routine available for all
drivers to use.  Its effect would be exactly the same as sending the
current task into the freezer, but it wouldn't be the freezer that
exists now.  It would just be some routine that blocks until the system 
suspend is over.  We could call it "the icebox" instead of "the 
freezer".  :-)

Does that make you happier?

> > > Face it, we should seriously look into doing suspend/resume without a
> > > freezer.
> > 
> > I'm willing to try, although I think it will be a tremendous amount of 
> > work to verify that every driver does the right thing.  There's lots of 
> > support missing.  For example, don't you think we should block all 
> > sysfs I/O during suspend?  And likewise for insmod/rmmod?
> 
> sysfs is a matter of driver. If a sysfs read/write callback in a driver
> is hitting the HW, it most certainly already has some kind of locking.
> That locking can/should be extended to deal with blocking when the HW is
> suspended.

User tasks can cause driver binding by writing to sysfs.  Binding 
_can't_ be blocked in the driver; by then it's already too late.  If 
it is going to be blocked at all, it has to be blocked earlier.  One 
possibility is in the sysfs attribute code; another is to block all 
sysfs access.

Of course, another possibility is simply to fail the bind.  But that's 
not very satisfying, since suspends should be transparent.

> However, since it seems that people universally consider it very hard to
> get right (I don't but heh), Linus and Paul have come up with a solution
> for most simple enough directly-mapped drivers such as PCI (ok, that
> doesn't include USB) which is to simply do the HW suspend in a late
> callback after IRQs are off, and not bother with the rest.

Ben, you haven't given enough thought to the work needed to avoid 
locking problems.

For instance, you agree that during suspend we must not allow device or
driver registration or unregistration, right?  And we must not allow
driver binding or unbinding.  But these events generally involve
acquiring a device semaphore, in the driver core and quite often in the
core's caller.  Since that semaphore is also needed for calling the
suspend and resume methods, we have to be very careful about blocking
binding/unbinding/registration/unregistration.  It has to be done at a 
time when no device semaphores are held.

You also agree that kernel threads and workqueues must be allowed to 
operate during suspend.  But consider this: By writing the appropriate 
sysfs attribute, a user task can cause a workqueue item to be queued to 
keventd that tries to unregister a device.  That really puts you on the 
spot: Unregistration can't be allowed to fail, it can't be allowed to 
succeed during a suspend, and keventd can't be blocked!  So what should 
we do?

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-07 Thread Alan Stern

On Sun, 8 Jul 2007, Benjamin Herrenschmidt wrote:

> > Which is exactly my point.  It _doesn't_ work fine without a freezer, 
> > because the USB stack currently relies on the freezer to prevent plug 
> > activity.
> 
> Putting on hold plug activity has nothing, NOTHING, to do with the half
> assed piece of deadlocking crap we have now we call a freezer.

You're wrong; it _does_ have something to do with the freezer.  The 
connection is that the code uses the freezer to put plug activity on 
hold.

You probably meant to say that it _should_ have nothing to do with the 
freezer.  That's a different matter.  But in any case you should write 
what you actually mean, rather than just putting down something as 
inflammatory as possible.

> As long as you guys keep mixing up all the issues and coming up with
> totally bogus solutions that cannot work, we won't have a useful suspend
> (either to RAM or to disk) in linux.

In my defense, you should realize that until Rafael's notifier chain
was added (just a few weeks ago, still not in mainline I believe) there
was no other way to do it.  Plug activity needs to be stopped before
the child devices are suspended, and the PM core does not send any
notification to drivers at that time.  All it does is activate the 
freezer.

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-07 Thread Benjamin Herrenschmidt


> > Well... 2 things here. Either you have a freezer in which case the
> > chances of the above scenario are increased,
> 
> How so? :-)

I meant you have a freezer that freezes uninterruptible tasks.

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-07 Thread Benjamin Herrenschmidt


> Spinning in the driver with the lock not held is impossible, since the 
> driver is called with the lock already acquired.
> 
> Failing with -ERETRY is non-transparent.  I would prefer to block such 
> requests at their source, before the lock is acquired.  Perhaps in the 
> driver core, perhaps even earlier.
> 
> (And rather than trying to manage a waitqueue or struct completion, it
> would be easiest to jump directly into the freezer!  The driver or the 
> core wouldn't have to worry about waking up all these blocked threads.)

That's wrong. The freezer is NOT a solution for that sort of thing. Just
because you guys can't get your locking right.

> You're missing the point.  If the driver and the freezer are both 
> solid, there's no reason they can't share the work.  If many drivers 
> can pass off part of their workload to the single freezer, it's a net 
> win.

I've explained already multiple times that the freezer will not do what
you guys expect it to do. IOs can be submited at non-task time and there
is no clear distinction between IO generating threads that must and
those that must not be frozen.

I really can't understand why you guys work so hard at trying to avoid
the right solutions systematically.

> So it isn't a question of how solid the drivers are; it's a question 
> of how solid the freezer is.  And bear in mind that if you convince 
> people the freezer is not solid enough to be used, then you will have 
> to find an alternative for purposes of hibernation.

Because I'm intimately convinced that the freezer is a wrong approach
that cannot be made solid enough.

> > The freezer is a flawed concept in the first place.
> 
> <... Long and cogent argument which I will skip over for now ...>

Too bad, that's where the interesting points that show that the freezer
cannot work are..

> > Face it, we should seriously look into doing suspend/resume without a
> > freezer.
> 
> I'm willing to try, although I think it will be a tremendous amount of 
> work to verify that every driver does the right thing.  There's lots of 
> support missing.  For example, don't you think we should block all 
> sysfs I/O during suspend?  And likewise for insmod/rmmod?

sysfs is a matter of driver. If a sysfs read/write callback in a driver
is hitting the HW, it most certainly already has some kind of locking.
That locking can/should be extended to deal with blocking when the HW is
suspended.

However, since it seems that people universally consider it very hard to
get right (I don't but heh), Linus and Paul have come up with a solution
for most simple enough directly-mapped drivers such as PCI (ok, that
doesn't include USB) which is to simply do the HW suspend in a late
callback after IRQs are off, and not bother with the rest.

> > I even tend to think that we could do STD that way too, in
> > fact, while Linus is right saying it's a different problem than STR, we
> > could even probably re-use some of the STR infrastructure in some
> > hackish way, still without a freezer. We could have ways to block page
> > cache writeout, for example, to prevent new post-snapshot dirty data
> > from hitting the platter, and use direct BIOs for writeout. That's just
> > an example.
> 
> What about systems with no BIOS?  I think this would be very hard or 
> even impossible to make work.

You made the same mistake I did when reading Nigel's mail ... BIOs ->
Block IO requests, not BIOS :-)

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-07 Thread Benjamin Herrenschmidt

On Sat, 2007-07-07 at 12:17 -0400, Alan Stern wrote:
> On Sat, 7 Jul 2007, Benjamin Herrenschmidt wrote:
> 
> > > > And guess what ? It's what we do on powerbooks, and it works fine,
> > > > without a freezer :-)
> 
> > If you remember, one of the things I've been advocating has always been
> > that we should put on hold all plug activity (unplug might be alright as
> > long as the user events are just delayed) when we start suspending. No
> > new devices, no new bindings. "hub" type devices are respondible for
> > bringing in the new stuff after resume.
> 
> Which is exactly my point.  It _doesn't_ work fine without a freezer, 
> because the USB stack currently relies on the freezer to prevent plug 
> activity.

Putting on hold plug activity has nothing, NOTHING, to do with the half
assed piece of deadlocking crap we have now we call a freezer.

As long as you guys keep mixing up all the issues and coming up with
totally bogus solutions that cannot work, we won't have a useful suspend
(either to RAM or to disk) in linux.

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-07 Thread Benjamin Herrenschmidt

On Sat, 2007-07-07 at 13:49 +0200, Pavel Machek wrote:

> And anyway I believe that current issue (fuse deadlocks with s2ram)
> should be present on powerbooks, too... it is just way harder to
> trigger. All that is neccessary is fused (or one of its helpers) to
> get frozen by accessing suspended device.

I don't see any fuse specific issue without a freezer, but I see generic
issues such as kmalloc blocking in a driver with a mutex held etc... but
I've talked about these already. They have nothing to do with the
freezer though.

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

malicious filesystems (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-07 Thread Pavel Machek

Hi!

> > We can just wait for all fuse requests to be serviced before
> > proceeding further with freeze, right?
> 
> Right.  Nice way to slow down or stop the suspend with an unprivileged
> process.  Avoiding that sort of DoS is one of the design goals of
> fuse.

So you want me to handle _malicious_ filesystems now?

That should be easy... :-). You already have nasty deadlocks in FUSE,
and you solve them by "root can echo 1 > abort"... so allow me the
same possibility.

We can tell fused we are freezing, and if all the requests are not
serviced within, say, 30 seconds, we call the filesystem malicious and
do echo 1 > abort.

Not ideal, but neither is allowing malicious filesystems in the first
place...

> Look at it this way: the task of the freezer is to stop new I/O
> hitting the hardware.  But it is totally indiscriminate about what it
> stops, it tries to stop _everything_ even things which have nothing to
> do with hardware.
> 
> Not nice.

Not nice, but we don't know any better for now. "Just fix all the
drivers" basically means "just fix 90% of kernel".
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: problem 1 (was Re: removing refrigerator does not help with s2ram vs. fuse deadlocks (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway))

2007-07-07 Thread Rafael J. Wysocki

On Saturday, 7 July 2007 14:08, Pavel Machek wrote:
> Hi!
> 
> > > Now, if kernel needs FUSE services for some reason (that's the problem
> > > we hit in s2ram case, right?), we have a deadlock.
> > > 
> > > So main problem still seems to be "kernel should not depend on
> > > userland services during suspend", refrigerator or not.
> > 
> > And also "Userland should not depend on userland services", which is 
> > rather more of a problem.
> 
> No, that's not a problem. Or rather, that's different problem, called
> "problem 2" (fuse causes freezer to fail to stop processes).
> 
> But we still have "problem 1" here: after devices are suspended,
> kernel tries to use fuse's services. That is not going to work, one
> way or another, because devices are suspended and userland can't work
> reliably.
> 
> (Aha, it _may_ be it is kernel tries to use fuse's services after
> freezing userland but before freezing devices. I don't think it is).
> 
> To solve "problem 1", we need to know which part of kernel asks for
> fuse services. sysrq-t trace is likely to tell us. Can someone repeat
> the "problem 1" scenario (freezer succeeds but then it deadlocks), and
> produce sysrq-t trace? That way we can solve "problem 1".

Well, such a trace would be helpful in any case.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-07 Thread Miklos Szeredi

> We can just wait for all fuse requests to be serviced before
> proceeding further with freeze, right?

Right.  Nice way to slow down or stop the suspend with an unprivileged
process.  Avoiding that sort of DoS is one of the design goals of
fuse.

Look at it this way: the task of the freezer is to stop new I/O
hitting the hardware.  But it is totally indiscriminate about what it
stops, it tries to stop _everything_ even things which have nothing to
do with hardware.

Not nice.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-07 Thread Rafael J. Wysocki

On Saturday, 7 July 2007 04:44, Benjamin Herrenschmidt wrote:
> On Fri, 2007-07-06 at 11:31 +0200, Oliver Neukum wrote:
> > Am Freitag, 6. Juli 2007 schrieb Benjamin Herrenschmidt:
> > > On Fri, 2007-07-06 at 09:13 +0200, Rafael J. Wysocki wrote:
> > > > 
> > > > The only reason (I know of) why we don't handle uninterruptible tasks 
> > > > in the
> > > > freezer is that we're afraid of the suspend process deadlocking with an
> > > > uninterruptible task holding a lock, but AFAICS the probability of such 
> > > > an
> > > > event is extremely small.
> > > 
> > > What would deadlock specifically ? One of the drivers trying to acquire
> > > that lock ? It would be a driver bug then.
> > 
> > Your driver's write method looks like:
> > 
> > mutex_lock();
> > poke_some_hardware();
> > wait_event_uninterruptible(); //for result
> > res = evaluate_result();
> > mutex_unlock();
> > return res;
> > 
> > If you put a task into the refrigerator at wait_event_interruptible()
> > you will deadlock if you need this lock for the driver to go to suspend.
> > The suspend method then must not take the lock _and_ it must be
> > aware that there may be an ongoing operation.
> 
> Well... 2 things here. Either you have a freezer in which case the
> chances of the above scenario are increased,

How so? :-)

> or you don't, in which case 
> your suspend method will just sleep on the lock until outstanding HW
> accesses that have that lock are completed, and everything is fine.
> 
> You need to be careful with one thing though, whether you have a freezer
> or not. If you driver, in some code path, whatever it is (ioctl, kernel
> thread, workqueue, ...) does something like:
> 
> mutex_lock
> kmalloc(...,GFP_KERNEL);
> mutex_unlock
> 
> And it's suspend callback then does:
> 
> mutex_lock
> 
> The problem here is that the disks might already have been suspended
> prior to your driver being called. Thus, any attempt at pushing things
> out to swap or dirty mmap'ings back to storage will hang, thus kmalloc
> can potentially hang (afaik), and you will deadlock.
> 
> That's what I've been talking about earlier when I said that we should
> have some security in SLAB/SLUB/Buddy allocators, to silently turn
> GFP_KERNEL to at least GFP_NOIO or even ATOMIC before we start
> suspending drivers.
> 
> Now, another way to deal with that would have to use
> pre-suspend/post-resume notifications, and have drivers avoid doing the
> above between those, but that's much harder. (Essentially, drivers would
> have to either make sure they don't do things like blocking allocations,
> even implicitely, or possibly fall back to a degraded synchronous mode
> or that sort of thing).
> 
> I think it's much simpler to tweak slab/slub/buddy instead :-)
> 
> Note that the above issue is orthogonal to our freezer discussion, it's
> just one of the potential deadlock cause we have with suspend that needs
> to be fixed.

Agreed.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: removing refrigerator does not help with s2ram vs. fuse deadlocks (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-07 Thread Miklos Szeredi

> > One task doing ptrace() can basically do whatever it wants with the
> > task being traced.  This is not an exact analogy to what fuse does,
> > but close.
> 
> Well, IMO userland tasks should not have power to grab VFS mutexes for
> indefinite ammount of time. ("fused is allowed to deadlock kernel, in
> a way only write to special file helps" is ugly). Unfortunately, I
> don't think there's a way to work around that deadlock within fuse
> design limits... (coda was able to get around it by working on whole
> files granularity, AFAICT), so we'll have to live with that.

That's just file I/O.  You can easily deadlock coda with any other
file operation.  In fact coda is _less_ robust wrt a misbehaving
userspace server than fuse by a big margin.

Miklos
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-07 Thread Alan Stern

On Sat, 7 Jul 2007, Benjamin Herrenschmidt wrote:

> > > so the suspend process will wait for it.  When binding  
> > > is done the suspend_device() code will take the device lock and tell  
> > > everything else to postpone further bind requests as above.
> > 
> > My question referred to drivers trying to bind or unbind a device
> > _after_ the device has been suspended.  I suppose you'll say that's
> > covered by the NO_BIND flag.  But now we have the locking problem
> > mentioned above: The thread trying to bind is holding a lock which is
> > needed for resuming.
> 
> Why would it ? Just make it fail, maybe with some kind of -ERETRY... Or
> it can spin with the lock not held if it want. That's a detail really.

Spinning in the driver with the lock not held is impossible, since the 
driver is called with the lock already acquired.

Failing with -ERETRY is non-transparent.  I would prefer to block such 
requests at their source, before the lock is acquired.  Perhaps in the 
driver core, perhaps even earlier.

(And rather than trying to manage a waitqueue or struct completion, it
would be easiest to jump directly into the freezer!  The driver or the 
core wouldn't have to worry about waking up all these blocked threads.)

> > As one of the people responsible for the USB power management 
> > implementation, I would appreciate more details about this.  For 
> > example, a dmesg log with CONFIG_USB_DEBUG turned on together with a
> > complete description of the actions you took to provoke the bug.
> > 
> > (I wonder how much of this "buginess" is caused by the lack of the 
> > freezer in PPC.)
> 
> No. The freezer will hide some of those problems under the carpet, but
> not solve the basic issue which is the driver should be solid. Period.

You're missing the point.  If the driver and the freezer are both 
solid, there's no reason they can't share the work.  If many drivers 
can pass off part of their workload to the single freezer, it's a net 
win.

So it isn't a question of how solid the drivers are; it's a question 
of how solid the freezer is.  And bear in mind that if you convince 
people the freezer is not solid enough to be used, then you will have 
to find an alternative for purposes of hibernation.

> The freezer is a flawed concept in the first place.

<... Long and cogent argument which I will skip over for now ...>

> Face it, we should seriously look into doing suspend/resume without a
> freezer.

I'm willing to try, although I think it will be a tremendous amount of 
work to verify that every driver does the right thing.  There's lots of 
support missing.  For example, don't you think we should block all 
sysfs I/O during suspend?  And likewise for insmod/rmmod?

> I even tend to think that we could do STD that way too, in
> fact, while Linus is right saying it's a different problem than STR, we
> could even probably re-use some of the STR infrastructure in some
> hackish way, still without a freezer. We could have ways to block page
> cache writeout, for example, to prevent new post-snapshot dirty data
> from hitting the platter, and use direct BIOs for writeout. That's just
> an example.

What about systems with no BIOS?  I think this would be very hard or 
even impossible to make work.

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-07 Thread Alan Stern

On Sat, 7 Jul 2007, Benjamin Herrenschmidt wrote:

> > > And guess what ? It's what we do on powerbooks, and it works fine,
> > > without a freezer :-)

> If you remember, one of the things I've been advocating has always been
> that we should put on hold all plug activity (unplug might be alright as
> long as the user events are just delayed) when we start suspending. No
> new devices, no new bindings. "hub" type devices are respondible for
> bringing in the new stuff after resume.

Which is exactly my point.  It _doesn't_ work fine without a freezer, 
because the USB stack currently relies on the freezer to prevent plug 
activity.

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-07 Thread Pavel Machek

On Fri 2007-07-06 09:07:38, Miklos Szeredi wrote:
> > > Actually fuse allows SIGKILL, because it's always fatal, and the
> > > syscall may not be restarted.
> > 
> > Okay, and you should handle refrigerator in the same paths where you
> > handle SIGKILL. Just add try_to_freeze() there...
> 
> It's the fourth time I'm repeating this in this thread:
> 
> Yes adding try_to_freeze() there would partially solve the probelem.
> 
> But another task can be sleeping on a mutex held by the task waiting
> for the reply.  And the freezer won't be able to handle that one.
> 
> Generally, calling try_to_freeze() with mutexes held is not a good
> idea.

Agreed, calling try_to_freeze() with mutex held is no-no, and it is
even documented somewhere.
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-07 Thread Pavel Machek

Hi!

> > > Actually fuse allows SIGKILL, because it's always fatal, and the
> > > syscall may not be restarted.
> > 
> > I think you want to stick try_to_freeze() at the same places where you
> > do SIGKILL handling. That should solve the 'syslogd is unfreezeable'
> > problem.
> 
> I could, but it would not solve the general problem.  Namely, that the
> presence of fuse imposes a certain ordering in which userspace tasks
> have to be frozen.  And it is not possible to know this ordering.

We can just wait for all fuse requests to be serviced before
proceeding further with freeze, right?

> And even if the ordering were solved, the freezer would still not work
> if the filesystem is not responding due to external events, such as a
> lost network (this affects NFS, CIFS, whatever just the same as
> fuse).

That's ok, you can't suspend if your hdd is dead, and in the same way
you can't suspend if your NFS server is dead. I agree it is ugly, but
we seem to live ok with that.

We could (and should?) handle that, probably by realizing that NFS is
not a disk and using interruptible sleep, but...

> > Plus, it would be nice to find out where suspend/hibernation is
> > triggering fuse activity. We can then decide where to fix it -- in
> > fuse or in suspend parts. You said sys_sync is not implemented... so
> > where is the problem?
> 
> I cannot say without having a sysrq-t of the situation.

Yes please. Can someone affected please produce sysrq-t?
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-07 Thread Pavel Machek

On Thu 2007-07-05 10:15:01, Paul Mackerras wrote:
> Rafael J. Wysocki writes:
> 
> > This is incompatible with the code in kernel/power/main.c, since we only
> > disable the nonboot CPUs after devices have been suspended.  Do you think 
> > that
> > your framework can be modified to work without disabling the nonboot CPUs
> > by the user space?
> 
> Sure.  It was a "if it can be done in userspace, do it in userspace"
> kind of decision, but I'm not wedded to it.
> 
> I actually do want to converge to using the generic suspend-to-ram
> code on powerbooks.  I just want to avoid causing regressions for
> powerbook users, including myself. :)

Curious, do you actually use fuse? Can you try it _with_ freezer and
produce sysrq-t trace of deadlock?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

problem 1 (was Re: removing refrigerator does not help with s2ram vs. fuse deadlocks (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway))

2007-07-07 Thread Pavel Machek

Hi!

> > Now, if kernel needs FUSE services for some reason (that's the problem
> > we hit in s2ram case, right?), we have a deadlock.
> > 
> > So main problem still seems to be "kernel should not depend on
> > userland services during suspend", refrigerator or not.
> 
> And also "Userland should not depend on userland services", which is 
> rather more of a problem.

No, that's not a problem. Or rather, that's different problem, called
"problem 2" (fuse causes freezer to fail to stop processes).

But we still have "problem 1" here: after devices are suspended,
kernel tries to use fuse's services. That is not going to work, one
way or another, because devices are suspended and userland can't work
reliably.

(Aha, it _may_ be it is kernel tries to use fuse's services after
freezing userland but before freezing devices. I don't think it is).

To solve "problem 1", we need to know which part of kernel asks for
fuse services. sysrq-t trace is likely to tell us. Can someone repeat
the "problem 1" scenario (freezer succeeds but then it deadlocks), and
produce sysrq-t trace? That way we can solve "problem 1".
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: removing refrigerator does not help with s2ram vs. fuse deadlocks (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-07 Thread Pavel Machek

Hi!

> > > > You have processes that don't react to signals, because some
> > > > other user land task is misbehaving.  I'd call that ugly at the
> > > > very least.
> > > 
> > > It already happens with, say, NFS. Don't think about it in terms of a 
> > > userland task misbehaving - think of it in terms of a resource becoming 
> > > unavailable.
> > 
> > I think there's a difference between a userland task playing the role of a
> > resource and a "real" external resource the kernel doesn't control.
> > 
> > IMO, userland tasks should not have the power to affect each other as though
> > they were parts of the kernel.
> 
> One task doing ptrace() can basically do whatever it wants with the
> task being traced.  This is not an exact analogy to what fuse does,
> but close.

Well, IMO userland tasks should not have power to grab VFS mutexes for
indefinite ammount of time. ("fused is allowed to deadlock kernel, in
a way only write to special file helps" is ugly). Unfortunately, I
don't think there's a way to work around that deadlock within fuse
design limits... (coda was able to get around it by working on whole
files granularity, AFAICT), so we'll have to live with that.

I think we have two separate problems here, and both are
solvable, without major changes to fuse or suspend framework.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: removing refrigerator does not help with s2ram vs. fuse deadlocks (was Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway)

2007-07-07 Thread Pavel Machek

Hi!

> > > And also "Userland should not depend on userland services", which is 
> > > rather more of a problem.
> > 
> > I think you're oversimplifying it, as far as FUSE is concerned.
> > 
> > Namely, if there are two userland tasks, A and B, and B is uninterruptible,
> > because A is blocked, then this is not a usual situation.
> 
> Fuse is one case of it occuring, and if we end up with more userspace 
> drivers then the problem is only going to get worse.

We'll have to solve them as they come.

Face it, hardware drivers _have_ to know about suspend/resume. Even
userspace drivers will have to know about suspend/resume, because they
need to reinit the hw during resume.

Now... most parts of kernel need to know (a bit) about suspend/resume
-- at least enough to play nicely with refrigerator. In retrospect it
is pretty obvious that this covers fused, too, unfortunately noone
noticed that when fuse was designed.

Can we try to solve the suspend vs. fuse problem now? "Just removing
the refrigerator" is not the answer. First, refrigerator is impossible
to remove in few months timeframe, and second, it does not solve the
problem anyway.

(Actually, there are two separate problems with suspend vs. fuse.)
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-07 Thread Pavel Machek

Hi!

> > > And guess what ? It's what we do on powerbooks, and it works fine,
> > > without a freezer :-)

Well, issue is, you should stop claiming it works fine until issue
below is fixed... please?

And anyway I believe that current issue (fuse deadlocks with s2ram)
should be present on powerbooks, too... it is just way harder to
trigger. All that is neccessary is fused (or one of its helpers) to
get frozen by accessing suspended device.


Pavel

> > I wish you'd stop saying that.  Have you ever done any serious testing?
> > 
> > Here's something to try:  Add a time delay to the end of hub_suspend in
> > drivers/usb/core/hub.c, so you can provoke a race manually.  Then while
> > one of your root hubs is being suspended and the system is waiting in
> > that delay, either plug in a new USB device to that hub or unplug an
> > existing device.
> > 
> > Be sure that CONFIG_USB_DEBUG is on so that we can figure out what 
> > happened after the fact.
> 
> If you remember, one of the things I've been advocating has always been
> that we should put on hold all plug activity (unplug might be alright as
> long as the user events are just delayed) when we start suspending. No
> new devices, no new bindings. "hub" type devices are respondible for
> bringing in the new stuff after resume.



-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-07 Thread Pavel Machek

Hi!

> > How will that help?  Block the kernel thread in the freezer or block it 
> > in the driver -- either way it is blocked.  So how do your deadlocks 
> > get resolved?
> 
> Because nobody is waiting on that kernel thread anyway without a freezer
> so there is no deadlock anymore.

In the deadlock we are seeing, _someone_ is waiting on userspace
thread, that leads to deadlock with freezer. We don't know who,
because we have not seen the sysrq-t dumps.

The "unknown who" will deadlock on fused frozen by driver, too. We
really need to fix the "unknown who" here.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-06 Thread Benjamin Herrenschmidt


> > so the suspend process will wait for it.  When binding  
> > is done the suspend_device() code will take the device lock and tell  
> > everything else to postpone further bind requests as above.
> 
> My question referred to drivers trying to bind or unbind a device
> _after_ the device has been suspended.  I suppose you'll say that's
> covered by the NO_BIND flag.  But now we have the locking problem
> mentioned above: The thread trying to bind is holding a lock which is
> needed for resuming.

Why would it ? Just make it fail, maybe with some kind of -ERETRY... Or
it can spin with the lock not held if it want. That's a detail really.

> As one of the people responsible for the USB power management 
> implementation, I would appreciate more details about this.  For 
> example, a dmesg log with CONFIG_USB_DEBUG turned on together with a
> complete description of the actions you took to provoke the bug.
> 
> (I wonder how much of this "buginess" is caused by the lack of the 
> freezer in PPC.)

No. The freezer will hide some of those problems under the carpet, but
not solve the basic issue which is the driver should be solid. Period.

The freezer is a flawed concept in the first place. If you go back to
square one, what is the basic idea of it ? I'll basically expose the
idea and go down all of the path I have in mind where it stops working
and becomes an incredibly difficult thing that in the end doesn't even
solve all the problems it's supposed to.

So first thing first...

I want a quiescent system with no new "IO requests" (whatever that mean
in the context of drivers) issued to avoid races during suspend/resume.

That sounds like a nice idea. Yeah. Sounds... only. Problem is. How do
you define that quiescent system ? First idea is ... let's stop
userland. There are various ways of doing that, but the freezer hooking
into the signal code is not necessarily a bad one.

No, I'm purposefully putting aside all the cases where the above doesn't
work (user process in the kernel in some uninterruptible wait, etc...),
which are the first big setback imho... our simple idea is suddenly not
so simple anymore, but we can bring those back later.

Now, there is still a problem... kernel threads. In fact, there is no
fundamental distinction between a kernel thread and a user process...
one has an MM and the other doesn't but as far as we are concerned, it's
the same. Kernel threads can issues IOs, or like khubd, detect devices,
plug/unplug them, etc etc all over the place.

Easy answer that comes to mind -> freeze them too. Heh, but kernel
threads don't do signals, so we end up with all those try_to_freeze().
Then what about the fact that drivers may need those kernel threads to
proceed ? Some drivers queue up their IO requests to a kernel thread to
process them and suspend() might need to flush those down, issue a
couple more such as "spin down disks" before that kernel thread can
actually be frozen... Hrm.. maybe not all of them then.

But how do you decide ? What defines that a kernel can issue an IO ? In
fact, if you look closely, anything doing kmalloc(...,GFP_KERNEL) for
example can trigger an IO... implicitely, via the VM pushing things out.
And that's just one example.

In some case, those same threads that may need to be kept non-frozen are
-also- the ones that will potentially submit new IOs or bring in new
devices.

And then, there is keventd ... what do you do about work queues ? You
have everybody pouring things at workqueues... some of these things may
well hit your driver, some may not. Same goes in some cases with
interrupt time stuff, such as timers or tasklets.. think about
networking.

In the end, the nice idea that "threads/tasks cause requests, so we just
stop them" basically falls appart. Half of the kernel can cause a driver
to be hit somewhere and a given time, it can be from a thread context,
directly caused by userland, or from some timer due to some subsystem
having a keepalive thing ticking in or whatever else.

Now, we go back to the previous issue of what do we do about
uninterruptible sleep... You want to abort suspend because, for example,
somethign called a driver that does an msleep(200) or so ? Are you aware
that 99% of laptop users close their laptops and shove it in the bag not
even waiting for the disk to spin down ? And you want suspend to abort
because some random "happen all the time" even such as a process being
somewhere temporarily in uninterruptible state in the kernel ?

So let's say we freeze them from within the scheduler even when they are
uininterruptible.. ouch... you just caused the deadlocks we talked about
before. While without a freezer, suspend() can at least rely on the fact
that it can wait for processes that have such pending locked constructs
waiting will ultimately wakeupm and wait for them (or even explicitely
wake them), it can't if they've been frozen. So what was a perfectly
solvable moderate driver synchronisation issue becomes a deadlock
nigh

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-06 Thread Benjamin Herrenschmidt

On Fri, 2007-07-06 at 10:38 -0400, Alan Stern wrote:
> On Fri, 6 Jul 2007, Benjamin Herrenschmidt wrote:
> 
> > What you propose is basically a slightly over-simplistic version of what
> > I think (and Paulus think) should be done. We do need to do it via
> > driver callbacks down the tree since only drivers can know how to deal
> > with their DMA etc... and ordering need to be respected, but that's
> > basically it.
> > 
> > And guess what ? It's what we do on powerbooks, and it works fine,
> > without a freezer :-)
> 
> I wish you'd stop saying that.  Have you ever done any serious testing?
> 
> Here's something to try:  Add a time delay to the end of hub_suspend in
> drivers/usb/core/hub.c, so you can provoke a race manually.  Then while
> one of your root hubs is being suspended and the system is waiting in
> that delay, either plug in a new USB device to that hub or unplug an
> existing device.
> 
> Be sure that CONFIG_USB_DEBUG is on so that we can figure out what 
> happened after the fact.

If you remember, one of the things I've been advocating has always been
that we should put on hold all plug activity (unplug might be alright as
long as the user events are just delayed) when we start suspending. No
new devices, no new bindings. "hub" type devices are respondible for
bringing in the new stuff after resume.

Ben.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-06 Thread Alan Stern

On Fri, 6 Jul 2007, Kyle Moffett wrote:

> Except Linus already decreed (and I heartily agree) that hibernation  
> and suspend-to-RAM were fundamentally completely different operations  
> and therefore any attempts to share code were basically just making a  
> big muddy mess of things.  Would a thread "Remove phase-of-the-moon  
> calculations from network-recv code" be relevant to lunar observation  
> just because the two had to do with the phases of the moon?  No!

For that matter, shutting down a CPU and hibernation are fundamentally
different operations -- but they both use the freezer.  Is that a big
muddy mess?  But never mind; I won't discuss hibernation.

> > A)  Decide what to do about remote wakeup requests.
> 
> Why do we care?  If the wakeup request arrives before we go to sleep,  
> we obviously aren't asleep and so can't wake up.  If it arrives after  
> we go to sleep then it will wake us up.  Anything that depends on a  
> wakeup arriving mid-sequence is 100% masochistic race condition.

You don't understand my point.  If a wakeup request arrives before the
system goes to sleep, and it is serviced, then a device which ought to
have been suspended will in fact be awake.  This will (if the parent's
driver is written correctly) cause the sleep transition to abort.

Not that there's necessarily anything wrong with that.  I just wanted 
to be sure you were aware of the potential problems.

> > C)  Prevent devices and drivers from being registered or  
> > unregistered; in particular decide what to do about hot-plug or hot- 
> > unplug events.
> 
> (1b) Again, that's where the NO_BIND flag comes in.  If its set then  
> any device probe events must sleep, otherwise they can go through.

I didn't say "bind", I said "registered".  Admittedly, they are rather
similar.

Still, there are difficulties.  Let's say a driver has set the NO_BIND 
flag for one of its devices.  A bind request comes in, and the driver 
puts it on a waitqueue.  Note that the binding thread holds the device 
semaphore; this is always true when a driver's probe routine is called.

Later on it comes time for the PM core to resume the device, which will
start up the threads on the waitqueue.  Before doing so it must acquire
the device semaphore.  Deadlock!

> If any of those things screw up suspend-to-RAM then it is 100% the  
> drivers fault and no "process freezer" is going to fix it, end of  
> story.

Why do you say that?  A "process freezer" can prevent bind and
registration calls from occurring, since these calls have to run in
process context.  Ergo a freezer _can_ fix some of these problems.

>  And "A" cannot be made reliable.  At some point you shut off  
> interrupts right before going to sleep, and at that point any remote  
> wakeup event is just going to get dropped until you actually enter  
> sleep mode and the hardware takes over again.  If you miss a wakeup  
> event then whatever sent it should just retry, just as with *every*  
> other kind of network packet.

Who mentioned network packets?  And who says a remote wakeup event will 
get dropped once interrupts are disabled?  More likely it will set a 
bit somewhere that causes the system to wake up immediately after it 
has gone to sleep.

> > So what happens if a new subdevice arrives at the wrong time?  Do  
> > you block instead of binding it?  While holding a mutex needed to  
> > suspend the parent device?
> 
> That would be a driver bug.  If you have asynchronous probing then  
> proper suspend handling includes being able to postpone driver probe  
> events until after resume.

One of your conditions (embodied in the pseudocode you posted earlier) 
was that drivers should be told to prevent binding and registration 
before the child devices are suspended.  Currently the PM core doesn't 
do anything like that.  You can't blame the drivers for this lack.

Of course it could be added.  Or perhaps more easily, the drivers that 
support asynchronous probing could be notified when a suspend is about 
to start so they could begin blocking bindings/registrations then.

> > What about drivers trying to bind to existing devices?
> 
> While binding it will clearly be holding a mutex/spinlock on the  
> parent device,

("Parent device"?  Do you mean the device being bound?  If so then I
agree.  Or do you mean the device's parent?  If so then your statement
is not clear at all.  There is special-case code in the driver core to
make sure it is true for USB devices, and it looks ugly as can be.)

> so the suspend process will wait for it.  When binding  
> is done the suspend_device() code will take the device lock and tell  
> everything else to postpone further bind requests as above.

My question referred to drivers trying to bind or unbind a device
_after_ the device has been suspended.  I suppose you'll say that's
covered by the NO_BIND flag.  But now we have the locking problem
mentioned above: The thread trying to bind is holding a lock which is
needed for re

Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-06 Thread Benjamin Herrenschmidt

On Fri, 2007-07-06 at 11:53 +0200, Rafael J. Wysocki wrote:
> 
> Moreover, I claim that, in the context of your example, _if_ the task
> is stuck
> at the wait_event_uninterruptible(), _then_ the freezerless suspend
> will
> deadlock with the task.

Why would the task be stuck there if it's not becasue of a freezer ? The
only reason I see would be a dependency on something like kmalloc trying
to push things to disk, or a lock owned by another user process, but the
former is a generic issue I've discussed in a separate mail and the
later is a driver bug imho.

Ben.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-06 Thread Benjamin Herrenschmidt

On Fri, 2007-07-06 at 11:31 +0200, Oliver Neukum wrote:
> Am Freitag, 6. Juli 2007 schrieb Benjamin Herrenschmidt:
> > On Fri, 2007-07-06 at 09:13 +0200, Rafael J. Wysocki wrote:
> > > 
> > > The only reason (I know of) why we don't handle uninterruptible tasks in 
> > > the
> > > freezer is that we're afraid of the suspend process deadlocking with an
> > > uninterruptible task holding a lock, but AFAICS the probability of such an
> > > event is extremely small.
> > 
> > What would deadlock specifically ? One of the drivers trying to acquire
> > that lock ? It would be a driver bug then.
> 
> Your driver's write method looks like:
> 
> mutex_lock();
> poke_some_hardware();
> wait_event_uninterruptible(); //for result
> res = evaluate_result();
> mutex_unlock();
> return res;
> 
> If you put a task into the refrigerator at wait_event_interruptible()
> you will deadlock if you need this lock for the driver to go to suspend.
> The suspend method then must not take the lock _and_ it must be
> aware that there may be an ongoing operation.

Well... 2 things here. Either you have a freezer in which case the
chances of the above scenario are increased, or you don't, in which case
your suspend method will just sleep on the lock until outstanding HW
accesses that have that lock are completed, and everything is fine.

You need to be careful with one thing though, whether you have a freezer
or not. If you driver, in some code path, whatever it is (ioctl, kernel
thread, workqueue, ...) does something like:

mutex_lock
kmalloc(...,GFP_KERNEL);
mutex_unlock

And it's suspend callback then does:

mutex_lock

The problem here is that the disks might already have been suspended
prior to your driver being called. Thus, any attempt at pushing things
out to swap or dirty mmap'ings back to storage will hang, thus kmalloc
can potentially hang (afaik), and you will deadlock.

That's what I've been talking about earlier when I said that we should
have some security in SLAB/SLUB/Buddy allocators, to silently turn
GFP_KERNEL to at least GFP_NOIO or even ATOMIC before we start
suspending drivers.

Now, another way to deal with that would have to use
pre-suspend/post-resume notifications, and have drivers avoid doing the
above between those, but that's much harder. (Essentially, drivers would
have to either make sure they don't do things like blocking allocations,
even implicitely, or possibly fall back to a degraded synchronous mode
or that sort of thing).

I think it's much simpler to tweak slab/slub/buddy instead :-)

Note that the above issue is orthogonal to our freezer discussion, it's
just one of the potential deadlock cause we have with suspend that needs
to be fixed.

Cheers,
Ben.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-06 Thread Kyle Moffett


On Jul 06, 2007, at 11:42:33, Alan Stern wrote:

On Thu, 5 Jul 2007, Kyle Moffett wrote:
Umm, this thread is NOT ABOUT HIBERNATING!!!  Please go back and  
read the subject, specifically the "suspend to RAM" parts :-D.


But it _is_ about the freezer; see the "Remove process freezer"  
part. :-)  Since the freezer is used during hibernation,  
hibernation is a legitimate topic.


Except Linus already decreed (and I heartily agree) that hibernation  
and suspend-to-RAM were fundamentally completely different operations  
and therefore any attempts to share code were basically just making a  
big muddy mess of things.  Would a thread "Remove phase-of-the-moon  
calculations from network-recv code" be relevant to lunar observation  
just because the two had to do with the phases of the moon?  No!


When your hardware can put itself to sleep and atomically preserve  
memory as it does so, you don't need an atomic copy.  For Real  
Suspend(TM) (IE: Suspend-to-RAM), the list of things to do is  
short and simple:


1)  Stop DMA and put most hardware into low-power states (stops  
all interrupt sources)
2)  Ensure that the other CPUs have finished any trailing  
interrupt handlers and put them to sleep

3)  Put the interrupt-controllers into low-power stat
4)  Go to sleep


Your short and simple list omits a few crucial items:

A)  Decide what to do about remote wakeup requests.


Why do we care?  If the wakeup request arrives before we go to sleep,  
we obviously aren't asleep and so can't wake up.  If it arrives after  
we go to sleep then it will wake us up.  Anything that depends on a  
wakeup arriving mid-sequence is 100% masochistic race condition.


B)  Prevent I/O requests from resuming devices that have been  
suspended.


(1a) As I describe below, step (1) includes setting NO_BIND and NO_IO  
flags on devices as they are processed.  Anybody who wants to do IO  
while those flags are set should just go sleep on a waitqueue.


C)  Prevent devices and drivers from being registered or  
unregistered; in particular decide what to do about hot-plug or hot- 
unplug events.


(1b) Again, that's where the NO_BIND flag comes in.  If its set then  
any device probe events must sleep, otherwise they can go through.



D)  Block driver bind or unbind calls.


See points (1a) and (1b) above.

Any of these things is capable of screwing up the course of  
events.  (In fact A _should_ be allowed to abort a suspend.)


If any of those things screw up suspend-to-RAM then it is 100% the  
drivers fault and no "process freezer" is going to fix it, end of  
story.  And "A" cannot be made reliable.  At some point you shut off  
interrupts right before going to sleep, and at that point any remote  
wakeup event is just going to get dropped until you actually enter  
sleep mode and the hardware takes over again.  If you miss a wakeup  
event then whatever sent it should just retry, just as with *every*  
other kind of network packet.


How about a freezer whose job it is to "wait for pending hard  
interrupts to complete when we have already guaranteed that we  
won't get any more"?  That part should be really *REALLY* easy.   
You don't need to care about either userspace processes or kernel  
threads at all.  Specifically, Step 1 consists of:


suspend_device(dev)
{
set_no_bind_flag(dev);
for (dev->subdevices)
suspend_device(dev);
set_no_io_flag(dev);
wait_for_in_progress_dma(dev);
turn_off_interrupts(dev);
go_to_low_power_state(dev);
}

After you've set the "no_bind" flag, you won't get any *new*  
subdevices trying to bind,


So what happens if a new subdevice arrives at the wrong time?  Do  
you block instead of binding it?  While holding a mutex needed to  
suspend the parent device?


That would be a driver bug.  If you have asynchronous probing then  
proper suspend handling includes being able to postpone driver probe  
events until after resume.  If you have synchronous probing then the  
problem doesn't exist because "set_no_bind_flag" is just telling the  
device not to raise any more device probe interrupts.



What about drivers trying to bind to existing devices?


While binding it will clearly be holding a mutex/spinlock on the  
parent device, so the suspend process will wait for it.  When binding  
is done the suspend_device() code will take the device lock and tell  
everything else to postpone further bind requests as above.


What happens to I/O requests submitted after the "no_io" flag is  
set?  The driver will have to block them, effectively creating its  
own little "freezer".


Oh, so you're calling every waitqueue in the kernel a "freezer" now?   
We do these things at the driver level *all* *the* *time*.  For  
instance, you can't submit new IOs to an ATA controller while it's  
renegotiating the bus speed, but that's never been a problem before.


When all the leaf devices are off, the parent device can be turned  
off because everythi

Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-06 Thread Alan Stern

On Thu, 5 Jul 2007, Kyle Moffett wrote:

> Umm, this thread is NOT ABOUT HIBERNATING!!!  Please go back and read  
> the subject, specifically the "suspend to RAM" parts :-D.

But it _is_ about the freezer; see the "Remove process freezer" part.  
:-)  Since the freezer is used during hibernation, hibernation is a
legitimate topic.

>  When your  
> hardware can put itself to sleep and atomically preserve memory as it  
> does so, you don't need an atomic copy.  For Real Suspend(TM) (IE:  
> Suspend-to-RAM), the list of things to do is short and simple:
> 
> 1)  Stop DMA and put most hardware into low-power states (stops all  
> interrupt sources)
> 2)  Ensure that the other CPUs have finished any trailing interrupt  
> handlers and put them to sleep
> 3)  Put the interrupt-controllers into low-power state
> 4)  Go to sleep

Your short and simple list omits a few crucial items:

A)  Decide what to do about remote wakeup requests.
B)  Prevent I/O requests from resuming devices that have been 
suspended.
C)  Prevent devices and drivers from being registered or unregistered; 
in particular decide what to do about hot-plug or hot-unplug events.
D)  Block driver bind or unbind calls.

Any of these things is capable of screwing up the course of events.
(In fact A _should_ be allowed to abort a suspend.)

> > As Pavel rightly said, you can get rid of the freezer, but you're  
> > only going to have to implement another one that does the  
> > essentially the same thing, even if it is at some other level.
> 
> How about a freezer whose job it is to "wait for pending hard  
> interrupts to complete when we have already guaranteed that we won't  
> get any more"?  That part should be really *REALLY* easy.  You don't  
> need to care about either userspace processes or kernel threads at  
> all.  Specifically, Step 1 consists of:
> 
> suspend_device(dev)
> {
>   set_no_bind_flag(dev);
>   for (dev->subdevices)
>   suspend_device(dev);
>   set_no_io_flag(dev);
>   wait_for_in_progress_dma(dev);
>   turn_off_interrupts(dev);
>   go_to_low_power_state(dev);
> }
> 
> After you've set the "no_bind" flag, you won't get any *new*  
> subdevices trying to bind, 

So what happens if a new subdevice arrives at the wrong time?  Do you 
block instead of binding it?  While holding a mutex needed to suspend 
the parent device?

What about drivers trying to bind to existing devices?

What happens to I/O requests submitted after the "no_io" flag is set?  
The driver will have to block them, effectively creating its own little 
"freezer".

> therefore it's safe to iterate over the  
> list of present sub-devices and suspend them.  Once those are  
> suspended and in low-power states you can set a "no_io" flag to  
> prevent the driver from submitting more IO.  At that point you can  
> lazily wait for existing DMA/IO/interrupts to finish on the device,  
> since *NOBODY* will be submitting them anymore, and we certainly  
> aren't probing for new devices.  Then you can just turn off the power  
> to the device.  When all the leaf devices are off, the parent device  
> can be turned off because everything waiting on the leaf devices is  
> blocked on them and won't unblock until the parent device *AND* the  
> leaf device are turned on again, in that order.

This is a lot like what we already do.  The differences are:

There is nothing corresponding to your "no-bind" flag.

Most drivers don't have anything like your "no_io" flag;
they assume that nobody will be around to submit an I/O
request.

> Scheduling and userspace are all still fully enabled in this  
> scenario.  Once all your devices are turned off, the only remaining  
> running threads will be those which haven't done IO since the  
> beginning of the suspend.  We can then disable preemption, turn off  
> the timer interrupts, and tell the other CPUs to park all their  
> remaining threads in schedule() and sleep.  Then we put the IRQ  
> controller to sleep and go to sleep ourselves.  If our driver model  
> locking is sufficient to handle putting a parent device to sleep  
> while threads are sleeping on a child device then there are exactly 0  
> problems.
> 
> Resuming is basically running the whole process in reverse.  Runtime- 
> suspend is achieved by not setting the 'no_io' or 'no_bind' flags and  
> putting selective device-subtrees to sleep without doing anything to  
> the rest of the system.

Nobody doubts that suspend can be made to work without the freezer.  
The point is that doing it this way dumps a bunch of extra 
responsibility on drivers.

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [linux-pm] Re: [PATCH] Remove process freezer from suspend to RAM pathway

2007-07-06 Thread Alan Stern

On Fri, 6 Jul 2007, Benjamin Herrenschmidt wrote:

> Why are you guys working so hard and spending so much energy to try to
> avoid doing the right thing is beyond my understanding...
> 
> > It _does_ apply to kernel threads.  That's exactly why I wrote above 
> > that kernel threads which try to do I/O during a suspend will need 
> > extra attention.
> 
> Ok none at all if you don't have a freezer.

In answer to both questions: We need the freezer in order to implement 
hibernate.  Even if we take your advice and stop using the freezer 
during suspend, these issues would still remain and would need to be 
solved.

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 3 4 >

1 - 100 of 378 matches

Mail list logo