Re: [linux-pm] Re: Hibernation considerations

2007-08-02 Thread david

On Wed, 1 Aug 2007, Pavel Machek wrote:

> Hi!
> 
> > > Do we have to block module loading?
> > 
> > No.  Registering new drivers is okay, registering new devices is bad.
> > 
> > Of course, some modules do want to register a new device in their init
> > method.  I don't know what we should do about them.  Force the
> > registration to fail, I suppose.  How often will people suspend while a
> > module is loading?
> 
> Well... plug this pcmcia card into the slot so that I do not have to
> carry it separately, close the lid and go?
> 
> ...not that impossible to imagine...

I usually leave my broadband card in the slot, but not seated. I wouldn't 
bet against it getting pushed in enough to be detected while putting the 
laptop in the bag.


David Lang
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Re: Hibernation considerations

2007-08-02 Thread Rafael J. Wysocki
On Wednesday, 1 August 2007 11:22, Pavel Machek wrote:
> Hi!
> 
> > > Hmm, wonder why this isn't affecting people with VPNs?  Probably
> > > network mounts over VPN are rare, and even rarer to have fs activity
> > > on them during suspend.
> > > 
> > > Anyway, I think it's long overdue to stop thinking about how to "fix"
> > > fuse, and concentrate on fixing the underlying problem instead ;)
> > 
> > To conclude this branch of the thread, I have a patch in the works that may
> > help a bit with unfreezable FUSE filesystems and it only affects the freezer.
> > I'll post it when 2.6.23-rc1 is out, because it's on top of some other patches
> > that need to go first.
> 
> I'm interested... which one is that?

Appended, on top of this:
https://lists.linux-foundation.org/pipermail/linux-pm/2007-July/014521.html

Greetings,
Rafael


---
 kernel/power/process.c |   49 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 48 insertions(+), 1 deletion(-)

Index: linux-2.6.23-rc1/kernel/power/process.c
===================================================================
--- linux-2.6.23-rc1.orig/kernel/power/process.c	2007-07-24 00:14:07.0 +0200
+++ linux-2.6.23-rc1/kernel/power/process.c	2007-07-24 00:14:17.0 +0200
@@ -30,6 +30,14 @@
  */
 #define MAX_WAITS 5
 
+/*
+ * If the freezing of tasks fails, we attempt to thaw tasks that have already
+ * been frozen to give the other tasks a chance to freeze, in case one or more
+ * of them are blocked by the frozen ones.  If this fails MAX_ATTEMPTS times
+ * in a row, we give up.
+ */
+#define MAX_ATTEMPTS 10
+
 #define FREEZER_KERNEL_THREADS 0
 #define FREEZER_USER_SPACE 1
 
@@ -192,14 +200,21 @@ static void cancel_freezing(struct task_
 static int try_to_freeze_tasks(int freeze_user_space)
 {
 	struct task_struct *g, *p;
-	unsigned int todo, waits;
+	unsigned int todo, waits, attempts;
 	unsigned long ret;
 	struct timeval start, end;
 	s64 elapsed_csecs64;
 	unsigned int elapsed_csecs;
+	char *tick = "-\\|/";
+
+	printk(" ");
+	attempts = 0;
 
 	do_gettimeofday(&start);
 
+ Repeat:
+	printk("\b%c", tick[attempts++ % 4]);
+
 	refrigerator_called = 0;
 	waits = 0;
 	do {
@@ -235,11 +250,43 @@ static int try_to_freeze_tasks(int freez
 		}
 	} while (todo);
 
+	if (todo && attempts <= MAX_ATTEMPTS) {
+		/*
+		 * Some tasks have not been able to freeze.  They might be stuck
+		 * in TASK_UNINTERRUPTIBLE waiting for the frozen tasks.  Try to
+		 * thaw the tasks that have frozen without clearing the freeze
+		 * requests of the remaining tasks and repeat.
+		 */
+		read_lock(&tasklist_lock);
+		do_each_thread(g, p) {
+			if (frozen(p)) {
+				p->flags &= ~PF_FROZEN;
+				wake_up_process(p);
+			}
+		} while_each_thread(g, p);
+		read_unlock(&tasklist_lock);
+
+		ret = wait_event_timeout(refrigerator_waitq,
+				refrigerator_called, TIMEOUT);
+		if (!ret) {
+			/*
+			 * There is little hope that we will succeed, but at
+			 * least we want to know which tasks have not been
+			 * frozen.  Thus, we are going to repeat once.
+			 */
+			attempts = MAX_ATTEMPTS;
+		}
+
+		goto Repeat;
+	}
+
 	do_gettimeofday(&end);
 	elapsed_csecs64 = timeval_to_ns(&end) - timeval_to_ns(&start);
 	do_div(elapsed_csecs64, NSEC_PER_SEC / 100);
 	elapsed_csecs = elapsed_csecs64;
 
+	printk("\b");
+
 	if (todo) {
 		/* This does not unfreeze processes that are already frozen
 		 * (we have slightly ugly calling convention in that respect,
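
The thaw-and-retry idea in the patch above can be modelled in user space. What follows is only a sketch under stated assumptions, not kernel code: struct model_task, freeze_pass(), try_to_freeze_model() and the waits_for and unfreezable fields are all invented for illustration. Only MAX_ATTEMPTS and its value are taken from the patch. The model also assumes that a waiter is released as soon as the task it waits for has been thawed, which is the optimistic case the patch hopes for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ATTEMPTS 10	/* same bound as in the patch */

/* Hypothetical stand-in for a task; this is not struct task_struct. */
struct model_task {
	bool frozen;
	int waits_for;		/* index of the task it is blocked on, or -1 */
	bool unfreezable;	/* models a task that never reaches the refrigerator */
};

/* One freezing pass.  Returns the number of tasks that could not freeze. */
size_t freeze_pass(struct model_task *t, size_t n)
{
	size_t todo = 0;

	for (size_t i = 0; i < n; i++) {
		int w = t[i].waits_for;

		if (t[i].frozen)
			continue;
		if (t[i].unfreezable || (w >= 0 && t[w].frozen))
			todo++;		/* stuck, e.g. waiting for a frozen task */
		else
			t[i].frozen = true;
	}
	return todo;
}

/*
 * Returns the number of passes needed to freeze everything, or -1 if some
 * task is still not frozen after MAX_ATTEMPTS thaw-and-retry rounds.
 */
int try_to_freeze_model(struct model_task *t, size_t n)
{
	unsigned int attempts = 0;

	assert(t != NULL);
	for (;;) {
		size_t todo = freeze_pass(t, n);

		attempts++;
		if (!todo)
			return (int)attempts;
		if (attempts > MAX_ATTEMPTS)
			return -1;

		/* Thaw the frozen tasks so that they can serve pending requests. */
		for (size_t i = 0; i < n; i++)
			t[i].frozen = false;
		/* Model assumption: waiters are released once everybody runs again. */
		for (size_t i = 0; i < n; i++)
			t[i].waits_for = -1;
	}
}
```

With a "server" task at index 0 and a "client" at index 1 that waits for it, the first pass freezes the server and leaves the client stuck; after one thaw both freeze on the second pass. A task marked unfreezable makes the model give up, like the patch does, once the attempt bound is exceeded.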


Re: [linux-pm] Re: Hibernation considerations

2007-08-02 Thread Pavel Machek
Hi!

> > Hmm, wonder why this isn't affecting people with VPNs?  Probably
> > network mounts over VPN are rare, and even rarer to have fs activity
> > on them during suspend.
> > 
> > Anyway, I think it's long overdue to stop thinking about how to "fix"
> > fuse, and concentrate on fixing the underlying problem instead ;)
> 
> To conclude this branch of the thread, I have a patch in the works that may
> help a bit with unfreezable FUSE filesystems and it only affects the freezer.
> I'll post it when 2.6.23-rc1 is out, because it's on top of some other patches
> that need to go first.

I'm interested... which one is that?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [linux-pm] Re: Hibernation considerations

2007-08-02 Thread Pavel Machek
Hi!

> >  Do we have to block module loading?
> 
> No.  Registering new drivers is okay, registering new devices is bad.
> 
> Of course, some modules do want to register a new device in their init 
> method.  I don't know what we should do about them.  Force the 
> registration to fail, I suppose.  How often will people suspend while a 
> module is loading?

Well... plug this pcmcia card into the slot so that I do not have to
carry it separately, close the lid and go?

...not that impossible to imagine...
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [linux-pm] Re: Hibernation considerations

2007-08-02 Thread Pavel Machek
Hi!

> > > The problem with FUSE is related to the fact that the freezer can't
> > > freeze uninterruptible tasks and we said that perhaps we might avoid
> > > it if FUSE was made freezing-aware.  Still, no one has gone in this
> > > direction and I don't know of any plans to do that.
> > 
> > I thought we have fully explored this direction.  Lots of emails, and
> > an IRC session with Pavel.  Conclusion:
> 
> What am I missing in the following suggested solution?
> 
> 1) In the freezer code, we implement a new TIF_LATEFREEZE process flag, which,
> when set, causes a userspace process to be frozen with kernel threads
> instead of with userspace ones. When freezing, we freeze !TIF_LATEFREEZE
> processes, sync, and then freeze TIF_LATEFREEZE processes and freezable
> kernel threads.
> 
> 2) In the fuse code, the PID of the process that will do the work gets passed 

The list of necessary PIDs is not known to the kernel. FUSE servers
may depend on other parts of userland.



-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


RE: [linux-pm] Re: Hibernation considerations

2007-07-24 Thread Alan Stern
On Tue, 24 Jul 2007, Huang, Ying wrote:

> >From: Alan Stern [mailto:[EMAIL PROTECTED]
> >It can't.  Indeed, in the absence of a freezer, user threads will need
> >devices (more accurately, will submit I/O requests for devices) that
> >have to be kept quiescent or low-power.  Drivers will need to delay
> >those requests until the devices are returned to full operation.
> >
> >That's exactly what I've been saying all along: Drivers will need to
> >be changed to delay I/O requests, if there is no freezer.
> 
> If it is too much work to implement "delaying I/O requests" for every
> driver, would it be possible to implement it as follows:
> 
> 1. A suspend to RAM/disk is triggered.
> 2. The driver-related syscall entries (such as sys_read, sys_write,
> sys_ioctl, etc.) in sys_call_table are replaced with special wrapper
> entries provided by the "suspend to RAM/disk" subsystem, which delay
> I/O requests where appropriate.
> 3. Once the devices are quiesced, they are put into a "low power" state and
> the system is put into the suspend state; or the image is written to disk
> (through snapshot/uswsusp or a kexec'ed kernel).
> 4. After resuming from RAM/disk, the devices are put back into the "normal"
> state and the syscall entries replaced in step 2 are restored.

Ha!  I made exactly this same suggestion (URL lost in the mists of 
time), except that I proposed changing the syscall entries for every 
system call, not just the driver-related ones.

Nobody seemed to think it would work very well.

It leaves a few loose ends.  For example, suppose a user thread is 
already in the middle of a system call and is about to start doing some 
I/O (maybe it's waiting for a timer to expire).

In the end, this doesn't seem to be very different from freezing all 
user threads.

Alan Stern



RE: [linux-pm] Re: Hibernation considerations

2007-07-24 Thread Huang, Ying
>From: Alan Stern [mailto:[EMAIL PROTECTED]
>It can't.  Indeed, in the absence of a freezer, user threads will need
>devices (more accurately, will submit I/O requests for devices) that
>have to be kept quiescent or low-power.  Drivers will need to delay
>those requests until the devices are returned to full operation.
>
>That's exactly what I've been saying all along: Drivers will need to
>be changed to delay I/O requests, if there is no freezer.

If it is too much work to implement "delaying I/O requests" for every
driver, would it be possible to implement it as follows:

1. A suspend to RAM/disk is triggered.
2. The driver-related syscall entries (such as sys_read, sys_write,
sys_ioctl, etc.) in sys_call_table are replaced with special wrapper
entries provided by the "suspend to RAM/disk" subsystem, which delay
I/O requests where appropriate.
3. Once the devices are quiesced, they are put into a "low power" state and
the system is put into the suspend state; or the image is written to disk
(through snapshot/uswsusp or a kexec'ed kernel).
4. After resuming from RAM/disk, the devices are put back into the "normal"
state and the syscall entries replaced in step 2 are restored.
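
Steps 2 and 4 above amount to swapping entries in a table of function pointers and putting them back later. The following is a minimal user-space sketch of that mechanism, not a proposal for real kernel code. Every name in it (struct call_table, CALL_READ and friends, real_read(), delayed_call(), table_suspend(), table_resume(), deferred_calls) is hypothetical, and the wrapper merely counts the request and returns -1 where a real wrapper would presumably put the caller to sleep until resume.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical call numbers; the real sys_call_table is indexed differently. */
enum { CALL_READ, CALL_WRITE, CALL_IOCTL, NR_CALLS };

typedef long (*call_fn)(long arg);

struct call_table {
	call_fn entry[NR_CALLS];	/* what callers actually invoke */
	call_fn saved[NR_CALLS];	/* originals kept while suspended */
};

unsigned int deferred_calls;	/* requests that hit a wrapper */

/* Dummy "real" handlers so that the effect of the swap is observable. */
long real_read(long arg)  { return arg + 1; }
long real_write(long arg) { return arg + 2; }
long real_ioctl(long arg) { return arg + 3; }

/* Wrapper installed during suspend.  A real one would sleep until resume. */
long delayed_call(long arg)
{
	(void)arg;
	deferred_calls++;
	return -1;
}

void table_init(struct call_table *t)
{
	assert(t != NULL);
	t->entry[CALL_READ] = real_read;
	t->entry[CALL_WRITE] = real_write;
	t->entry[CALL_IOCTL] = real_ioctl;
}

/* Step 2: remember the original entries and install the wrappers. */
void table_suspend(struct call_table *t)
{
	for (int i = 0; i < NR_CALLS; i++) {
		t->saved[i] = t->entry[i];
		t->entry[i] = delayed_call;
	}
}

/* Step 4: restore the entries replaced in step 2. */
void table_resume(struct call_table *t)
{
	for (int i = 0; i < NR_CALLS; i++)
		t->entry[i] = t->saved[i];
}
```

Note that the swap only affects calls that enter through the table after table_suspend() has run. A caller that is already inside one of the real handlers at that moment is not held back by it.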

Best Regards,
Huang Ying


Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Rafael J. Wysocki
On Monday, 23 July 2007 23:55, Nigel Cunningham wrote:
> Hi.
> 
> On Tuesday 24 July 2007 01:23:15 Alan Stern wrote:
> > On Mon, 23 Jul 2007, Nigel Cunningham wrote:
> > 
> > > Take a step back for a second.
> > > 
> > > The problem we're facing now is that we're getting some userspace threads, 
> > > used in processing I/O, that are functioning as exceptions to the "freeze 
> > > userspace, then freezeable kernel threads" rule. They are only exceptions 
> > > because of that role in processing I/O - because they're de facto kernel 
> > > threads. So, if we orient our thinking more in terms of I/O processing and 
> > > less in terms of the userspace/kernelspace distinction, we'll have a 
> > > solution:
> > > 
> > > 1) Freeze processes that aren't fs related (ie stop them generating I/O).
> > 
> > The problem here is that with things like FUSE, _every_ process is 
> > potentially fs related.  Nothing prevents a FUSE thread from doing IPC 
> > with any other thread.
> 
> Yes, but the fuse thread is going to know what other thread it's doing IPC 
> with, so it can get that thread flagged too.

Yes, but that thread may do IPC with yet another one and so on.
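
The "and so on" is a transitive closure over whatever IPC relationships exist. A small sketch of what flagging would have to compute, assuming (and this is the hard part) that the full "who talks to whom" relation were known: NTASKS, the ipc matrix and flag_ipc_closure() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NTASKS 5	/* arbitrary size for the illustration */

/*
 * ipc[a][b] is true when task a does IPC with task b.  Starting from one
 * flagged task, flag everything reachable through such links.  Returns the
 * number of flagged tasks.
 */
size_t flag_ipc_closure(bool ipc[NTASKS][NTASKS], size_t start,
			bool flagged[NTASKS])
{
	size_t stack[NTASKS];	/* each task is pushed at most once */
	size_t top = 0, count = 1;

	assert(start < NTASKS);
	for (size_t i = 0; i < NTASKS; i++)
		flagged[i] = false;
	flagged[start] = true;
	stack[top++] = start;

	while (top > 0) {
		size_t cur = stack[--top];

		for (size_t next = 0; next < NTASKS; next++) {
			if (ipc[cur][next] && !flagged[next]) {
				flagged[next] = true;
				stack[top++] = next;
				count++;
			}
		}
	}
	return count;
}
```

If task 0 talks to task 1 and task 1 talks to task 2, flagging task 0 ends up flagging all three, even though task 0 never talks to task 2 directly.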

> > > 2) Flush pending I/O.
> > > 3) Freeze filesystems in reverse order of dependency, the primary purpose 
> > > being to stop them generating further I/O on their metadata.
> > > 
> > > Locks that are being held are only being held because work is being done. 
> > > If we progressively focus on threads in terms of their create/process work 
> > > dependencies, we'll see that the problem isn't at all intractable.
> > 
> > As has been mentioned before, keeping track of all that dependency 
> > information would be very fragile and time-consuming.
> 
> I disagree. It's at least going to be less fragile and time-consuming than 
> maintaining new/extra code for kexec.

Well, I think the issue is real, so we need to find a solution (the simpler,
the better) and that need not be related to kexec. ;-)

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Nigel Cunningham
Hi.

On Tuesday 24 July 2007 01:23:15 Alan Stern wrote:
> On Mon, 23 Jul 2007, Nigel Cunningham wrote:
> 
> > Take a step back for a second.
> > 
> > The problem we're facing now is that we're getting some userspace threads, 
> > used in processing I/O, that are functioning as exceptions to the "freeze 
> > userspace, then freezeable kernel threads" rule. They are only exceptions 
> > because of that role in processing I/O - because they're de facto kernel 
> > threads. So, if we orient our thinking more in terms of I/O processing and 
> > less in terms of the userspace/kernelspace distinction, we'll have a 
> > solution:
> > 
> > 1) Freeze processes that aren't fs related (ie stop them generating I/O).
> 
> The problem here is that with things like FUSE, _every_ process is 
> potentially fs related.  Nothing prevents a FUSE thread from doing IPC 
> with any other thread.

Yes, but the fuse thread is going to know what other thread it's doing IPC 
with, so it can get that thread flagged too.

> > 2) Flush pending I/O.
> > 3) Freeze filesystems in reverse order of dependency, the primary purpose 
> > being to stop them generating further I/O on their metadata.
> > 
> > Locks that are being held are only being held because work is being done. 
> > If we progressively focus on threads in terms of their create/process work 
> > dependencies, we'll see that the problem isn't at all intractable.
> 
> As has been mentioned before, keeping track of all that dependency 
> information would be very fragile and time-consuming.

I disagree. It's at least going to be less fragile and time-consuming than 
maintaining new/extra code for kexec.

Nigel





Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Alan Stern
On Mon, 23 Jul 2007 [EMAIL PROTECTED] wrote:

> > For one thing, checking for a suspend-in-progress at the beginning of
> > each and every system call would add overhead to a hot path in the
> > kernel, one which is already very heavily optimized.  People wouldn't
> > stand for it.
> 
> I thought that the suspend stuff did this easily,

It does not do it at all.  Do you know how the freezer works?

>  but the freezer really 
> starts running into trouble when it wants to freeze some things, but not 
> other things. this seems to be the biggest area of churn and problems.

No.  The freezer starts running into trouble when it wants to freeze a
thread but can't, because that thread is waiting for some event to
occur and the only thread which can cause the event is already frozen.  
Or is itself waiting for a third thread which is already frozen...
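
The chain described above can be written down as a walk along "waits for" links. This is a sketch under assumptions, not how the freezer is implemented: struct wait_task, its fields and blocked_by_frozen() are hypothetical, and the kernel does not record such a single "waits for" link per task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a task and the one task it is waiting for. */
struct wait_task {
	bool frozen;
	int waits_for;	/* index of the task that must act first, or -1 */
};

/*
 * Returns true if task i cannot reach a point where it could freeze,
 * because following its chain of waits leads to an already frozen task.
 */
bool blocked_by_frozen(const struct wait_task *t, size_t n, size_t i)
{
	size_t steps = 0;
	int cur;

	assert(i < n);
	cur = t[i].waits_for;
	/* The step bound stops the walk if the waits form a cycle. */
	while (cur >= 0 && steps++ < n) {
		if (t[cur].frozen)
			return true;
		cur = t[cur].waits_for;
	}
	return false;
}
```

With task 0 frozen, task 1 waiting for task 0 and task 2 waiting for task 1, both task 1 and task 2 are reported as blocked, which is the "third thread" case.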

> > You get similar problems from system calls that wait in kernel mode
> > until something has happened.  For example, a read() call for the
> > console device will wait until somebody types on the keyboard.  At any
> > point in time, many (or even most) user threads are blocked in a system
> > call.
> 
> but are locks held while they are blocked like this?

Sometimes they are, sometimes they aren't.

> > Let's let kernel K1 be the original kernel, the one which is going into
> > hibernation.  Kernel K2 is the one started by kexec to write out the
> > memory image.
> >
> > Your question becomes: Why should K2 jumping back to K1 cause K1
> > immediately to start running user tasks?  Answer: Because K1 has been
> > running user tasks all along (except while K2 was active) and nothing
> > has told it to stop.  In fact, about the only things which _can_ cause
> > K1 to stop running user threads are the freezer (which you want to
> > eliminate) and disabling interrupts (not possible since some drivers
> > require interrupts to be enabled when putting devices in low-power
> > mode).
> 
> when you jump to a body of code you jump to a specific point in the code, 
> not to some nebulous 'everything running' state.

How is that relevant?  When K2 jumps back to K1, it jumps to some 
designated location in K1.  It might be just after the place where K1 
called K2; I'm not familiar with the details of kexec.  In any event, 
K1 will still be in the same state as it was when it called K2.

> > So when K2 starts up, it will have a phase in which user threads don't
> > run.  That doesn't affect K1.  When K2 returns to K1, K1 does not go
> > through this sort of phase.  It simply picks up from where it left off.
> 
> then how can it restart drivers before the user threads need them?

It can't.  Indeed, in the absence of a freezer, user threads will need 
devices (more accurately, will submit I/O requests for devices) that 
have to be kept quiescent or low-power.  Drivers will need to delay 
those requests until the devices are returned to full operation.

That's exactly what I've been saying all along: Drivers will need to 
be changed to delay I/O requests, if there is no freezer.
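
What "delay I/O requests" means for a single driver can be sketched as follows. This is an illustration under assumptions only: struct model_dev, dev_submit(), dev_suspend(), dev_resume() and MAX_PENDING are invented names, and a real driver would also need locking and a policy for a full queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8	/* arbitrary queue depth for the illustration */

/* Hypothetical per-device state. */
struct model_dev {
	bool active;			/* false while quiescent or low-power */
	int pending[MAX_PENDING];	/* requests held back, in arrival order */
	size_t npending;
	int completed;			/* number of requests carried out */
};

/* Returns 0 if the request was carried out or queued, -1 if the queue is full. */
int dev_submit(struct model_dev *d, int req)
{
	assert(d != NULL);
	if (d->active) {
		d->completed++;		/* device is up: do the I/O now */
		return 0;
	}
	if (d->npending == MAX_PENDING)
		return -1;
	d->pending[d->npending++] = req;	/* delay until full operation */
	return 0;
}

void dev_suspend(struct model_dev *d)
{
	d->active = false;
}

/* Return to full operation, then carry out the delayed requests in order. */
void dev_resume(struct model_dev *d)
{
	d->active = true;
	for (size_t i = 0; i < d->npending; i++)
		d->completed++;
	d->npending = 0;
}
```

The point of the sketch is that submission always succeeds from the caller's side (short of a full queue), while the device itself is not touched between dev_suspend() and dev_resume().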

> > However there still remains the problem of user tasks running after
> > devices are supposed to be quiescent and before K1 starts.  There's
> > currently nothing to stop such tasks from making I/O requests and
> > thereby causing a quiescent device to become active again.
> 
> but if the devices are in low power mode then K1 needs to get them out of 
> low power mode before user tasks try to access them.

No -- which is good because it can't.  If a user task is running
there's no way to stop it from submitting I/O requests.  K1 needs to
delay these requests until after the device has returned to full 
operation.

> > We aren't talking about drivers initializing devices.  We are talking
> > about what happens during the time when drivers are trying to quiesce
> > devices (i.e., before K1 has started up K2) or power them down (after
> > K2 has returned to K1).
> 
> or if you are doing a resume instead of a suspend to ram the drivers need 
> to initialize or otherwise move to full power on K1 before user tasks hit 
> them.

Correct.  User tasks are allowed to submit requests, but the requests 
can't be carried out until the device returns to full operation.

Alan Stern



Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread david

On Mon, 23 Jul 2007, Oliver Neukum wrote:


> On Monday, 23 July 2007, Miklos Szeredi wrote:
> > > The reason is that we want them to "park" in safe places, ie. where there
> > > are no locks held etc.  Thus, these safe places need to be chosen somehow
> > > and since they are not marked throughout the code, we choose the obvious
> > > one. :-)
> >
> > Why shouldn't locks be held?
> >
> > No locks which are required for suspend must be held, sure.  But
> > otherwise holding locks doesn't matter at all.
>
> If you can provide a way to tell them apart, this would work.

can you just tell the driver to try to suspend and, if it reports back
that it failed, back out of the suspend? or will the driver deadlock instead
of reporting a failure if a lock is held?


David Lang

Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread david

On Mon, 23 Jul 2007, Alan Stern wrote:


> On Sun, 22 Jul 2007 [EMAIL PROTECTED] wrote:
>
> > Ok, I did misunderstand you. it sounds like all you need to do to make
> > sure that locks are not held is to allow system calls to return before
> > trying to do the suspend/kexec/etc. that sounds like not only a trivial
> > thing to do, but something that would probably be done anyway.
>
> If you could actually do it, it would work.  But you can't do it.  If
> it were feasible, the freezer would have used that approach in the
> first place.
>
> For one thing, checking for a suspend-in-progress at the beginning of
> each and every system call would add overhead to a hot path in the
> kernel, one which is already very heavily optimized.  People wouldn't
> stand for it.

I thought that the suspend stuff did this easily, but the freezer really
starts running into trouble when it wants to freeze some things, but not
other things. this seems to be the biggest area of churn and problems.

> > although syscalls that then call out to userspace tasks before they can
> > complete cause potential deadlocks (without that issue you can just wait
> > until all syscalls have returned, and not allow anything to issue new
> > syscalls) is this the issue that's killing FUSE+suspend?
>
> You get similar problems from system calls that wait in kernel mode
> until something has happened.  For example, a read() call for the
> console device will wait until somebody types on the keyboard.  At any
> point in time, many (or even most) user threads are blocked in a system
> call.

but are locks held while they are blocked like this?

> > > But it also means that tasks which otherwise would have been frozen are
> > > actually free to run before the kexec call is made (and after the call
> > > returns, if the kexec'd kernel returns back to the original kernel).
> > > Any driver which was written with the assumption that tasks would be
> > > frozen at those times will need to be changed.
> >
> > here is where you lose me.
> >
> > why should jumping back to the original kernel immediately start running
> > these processes?
>
> Let's let kernel K1 be the original kernel, the one which is going into
> hibernation.  Kernel K2 is the one started by kexec to write out the
> memory image.
>
> Your question becomes: Why should K2 jumping back to K1 cause K1
> immediately to start running user tasks?  Answer: Because K1 has been
> running user tasks all along (except while K2 was active) and nothing
> has told it to stop.  In fact, about the only things which _can_ cause
> K1 to stop running user threads are the freezer (which you want to
> eliminate) and disabling interrupts (not possible since some drivers
> require interrupts to be enabled when putting devices in low-power
> mode).

when you jump to a body of code you jump to a specific point in the code,
not to some nebulous 'everything running' state.

> >  the process of doing a kexec requires things to happen in
> > the drivers before normal activity can happen, so there is a phase in
> > there where the kernel being jumped to has drivers initializing, but still
> > does not allow anything else to run.
>
> So when K2 starts up, it will have a phase in which user threads don't
> run.  That doesn't affect K1.  When K2 returns to K1, K1 does not go
> through this sort of phase.  It simply picks up from where it left off.

then how can it restart drivers before the user threads need them?

> > why can't this phase be extended to
> > allow for the possibility of transitioning these drivers to a sleep mode
> > instead of to full operation?
>
> Indeed, Rafael has suggested that K2 be responsible for putting devices
> in low-power mode.  This has the disadvantage of requiring K2 to
> include drivers for every device used by K1, but otherwise it would
> work.
>
> However there still remains the problem of user tasks running after
> devices are supposed to be quiescent and before K1 starts.  There's
> currently nothing to stop such tasks from making I/O requests and
> thereby causing a quiescent device to become active again.

but if the devices are in low power mode then K1 needs to get them out of
low power mode before user tasks try to access them.



> > > The situation as regards locking is harder to discuss since I don't
> > > know of any code examples to use as a guide.  The fact remains that if
> > > user tasks aren't frozen then they can make system calls, and while
> > > running in kernel mode they can acquire locks, which might cause
> > > problems -- even though I can't identify any definite examples.
> >
> > yes, if userspace is running jobs and submitting I/O and system calls
> > while drivers are trying to initialize there is a big problem, but I am
> > missing the reason this must be the case.
>
> We aren't talking about drivers initializing devices.  We are talking
> about what happens during the time when drivers are trying to quiesce
> devices (i.e., before K1 has started up K2) or power them down (after
> K2 has returned to K1).

or if you are doing a resume instead of a suspend to ram the drivers need
to initialize or otherwise move to full power on K1 before user tasks hit
them.



the 

Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Alan Stern
On Mon, 23 Jul 2007, Nigel Cunningham wrote:

> Take a step back for a second.
> 
> The problem we're facing now is that we're getting some userspace threads, 
> used in processing I/O, that are functioning as exceptions to the "freeze 
> userspace, then freezeable kernel threads" rule. They are only exceptions 
> because of that role in processing I/O - because they're de facto kernel 
> threads. So, if we orient our thinking more in terms of I/O processing and 
> less in terms of the userspace/kernelspace distinction, we'll have a 
> solution:
> 
> 1) Freeze processes that aren't fs related (ie stop them generating I/O).

The problem here is that with things like FUSE, _every_ process is 
potentially fs related.  Nothing prevents a FUSE thread from doing IPC 
with any other thread.

> 2) Flush pending I/O.
> 3) Freeze filesystems in reverse order of dependency, the primary purpose 
> being to stop them generating further I/O on their metadata.
> 
> Locks that are being held are only being held because work is being done. If 
> we progressively focus on threads in terms of their create/process work 
> dependencies, we'll see that the problem isn't at all intractable.

As has been mentioned before, keeping track of all that dependency 
information would be very fragile and time-consuming.

Alan Stern



Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Alan Stern
On Sun, 22 Jul 2007 [EMAIL PROTECTED] wrote:

> > You are confusing "userspace" with "user tasks".  And not only that,
> > you often use the term "userspace" when you should say "user mode".
> >
> > If you want I can explain the differences.
> 
> please do, I have been treating all three as the same category.

Very briefly then: "User mode" and "kernel mode" refer to the CPU's
hardware privilege level.  A process makes the transition from user
mode to kernel mode by executing a system call.  Interrupt and
exception handlers also run in kernel mode, but they generally are not
considered to be part of any process.  The reverse transition occurs
when a process returns from a system call, or when an interrupt which
occurred while the CPU was in user mode completes.  (It's interesting
to note that system calls are somewhat similar to interrupts; in fact
sometimes they are implemented by a "software interrupt".)

"Kernel threads" are processes that run entirely in kernel mode.  They
usually don't have a memory mapping for any user-owned memory and they
never go into user mode.  All other processes are "user threads".

"Userspace" is a rather general term referring to things not in the
kernel.  It comprises both user tasks (while running in user mode) and
user memory.

> Ok, I did misunderstand you. it sounds like all you need to do to make 
> sure that locks are not held is to allow system calls to return before 
> trying to do the suspend/kexec/etc. that sounds like not only a trivial 
> thing to do, but something that would probably be done anyway.

If you could actually do it, it would work.  But you can't do it.  If 
it were feasible, the freezer would have used that approach in the 
first place.

For one thing, checking for a suspend-in-progress at the beginning of
each and every system call would add overhead to a hot path in the
kernel, one which is already very heavily optimized.  People wouldn't
stand for it.

> although syscalls that then call out to userspace tasks before they can 
> complete cause potential deadlocks (without that issue you can just wait 
> until all syscalls have returned, and not allow anything to issue new 
> syscalls) is this the issue that's killing FUSE+suspend?

You get similar problems from system calls that wait in kernel mode 
until something has happened.  For example, a read() call for the 
console device will wait until somebody types on the keyboard.  At any 
point in time, many (or even most) user threads are blocked in a system 
call.

> > Here's what you are missing:
> >
> > The new kexec approach eliminates the freezer and relies instead on the
> > fact that none of the tasks in the original kernel can execute while
> > the new kexec'd kernel is running.  This means the new kernel can write
> > out a memory image with no fear of interference or corruption.
> 
> correct
> 
> > But it also means that tasks which otherwise would have been frozen are
> > actually free to run before the kexec call is made (and after the call
> > returns, if the kexec'd kernel returns back to the original kernel).
> > Any driver which was written with the assumption that tasks would be
> > frozen at those times will need to be changed.
> 
> here is where you lose me.
> 
> why should jumping back to the original kernel immediately start running 
> these processes?

Let's let kernel K1 be the original kernel, the one which is going into
hibernation.  Kernel K2 is the one started by kexec to write out the
memory image.

Your question becomes: Why should K2 jumping back to K1 cause K1
immediately to start running user tasks?  Answer: Because K1 has been
running user tasks all along (except while K2 was active) and nothing
has told it to stop.  In fact, about the only things which _can_ cause
K1 to stop running user threads are the freezer (which you want to
eliminate) and disabling interrupts (not possible since some drivers
require interrupts to be enabled when putting devices in low-power 
mode).

>  the process of doing a kexec requires things to happen in 
> the drivers before normal activity can happen, so there is a phase in 
> there where the kernel being jumped to has drivers initializing, but still 
> does not allow anything else to run.

So when K2 starts up, it will have a phase in which user threads don't 
run.  That doesn't affect K1.  When K2 returns to K1, K1 does not go 
through this sort of phase.  It simply picks up from where it left off.

> why can't this phase be extended to 
> allow for the possibility of transitioning these drivers to a sleep mode 
> instead of to full operation?

Indeed, Rafael has suggested that K2 be responsible for putting devices
in low-power mode.  This has the disadvantage of requiring K2 to 
include drivers for every device used by K1, but otherwise it would 
work.

However there still remains the problem of user tasks running after 
devices are supposed to be quiescent and before K1 starts.  There's 
currently nothing to stop such tasks from 

Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Oliver Neukum
On Saturday, 21 July 2007, Alan Stern wrote:
> On Fri, 20 Jul 2007, Oliver Neukum wrote:
> 
> > > We already have a pre-suspend notification available for drivers that 
> > > need to allocate large amounts of memory.
> > 
> > Is that facility fine grained enough?
> 
> It's a notifier chain that gets called at several points during the 
> suspend transition.  One of those points is right at the start, while 
> userspace is still running and reasonably large amounts of memory can 
> be allocated.
> 
> Is it fine-grained enough?  I don't know -- hard to tell, since nothing 
> much is using it yet.
> 
> > > You are correct about the need to delay/stop device addition.  I don't
> > > know how this can be done in general; each code path calling
> > > device_add() may have to be treated individually.
> > 
> > What about the old API?
> 
> What old API do you mean?

The find_device() stuff.

> >  Do we have to block module loading?
> 
> No.  Registering new drivers is okay, registering new devices is bad.

What if it is a driver for virtual devices that don't need probe()
for actual hardware?

> Of course, some modules do want to register a new device in their init 
> method.  I don't know what we should do about them.  Force the 
> registration to fail, I suppose.  How often will people suspend while a 
> module is loading?
> 
> > What happens if a scsi error handler is woken? If it cannot be woken,
> > how are errors handled?
> 
> Why should the error handler wake up?  There isn't supposed to be any 
> I/O going on, hence no errors to handle.

What about shared busses? Firewire, FibreChannel? They can get external
resets, etc ...

Regards
Oliver


Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Miklos Szeredi
> Alan has recently proposed to introduce "suspend locks" to be acquired during
> a suspend/hibernation and such that we can leave uninterruptible tasks that
> don't hold any of them.

Sounds sane.  A global rwsem could be acquired for read by drivers,
and for write by suspend/hibernate.  Just need to add it to all
drivers that have PM, but that shouldn't need a heroic effort.
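
A rough userspace sketch of that scheme, using a POSIX rwlock as a stand-in
for the kernel rwsem.  All names here (suspend_lock, driver_io_begin() and
friends) are hypothetical, and the try-lock variants are used only so the
behaviour can be shown without blocking; a real driver would presumably
sleep on the lock instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Stand-in for the proposed global suspend lock. */
static pthread_rwlock_t suspend_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Driver side: take the lock for read around any I/O.
 * Returns 0 on success, non-zero if a suspend currently holds the lock. */
static int driver_io_begin(void)
{
    return pthread_rwlock_tryrdlock(&suspend_lock);
}

static void driver_io_end(void)
{
    int err = pthread_rwlock_unlock(&suspend_lock);
    assert(err == 0);
}

/* Suspend side: take the lock for write, which excludes every reader.
 * Returns 0 on success, non-zero if some driver is still doing I/O. */
static int suspend_begin(void)
{
    return pthread_rwlock_trywrlock(&suspend_lock);
}

static void suspend_end(void)
{
    int err = pthread_rwlock_unlock(&suspend_lock);
    assert(err == 0);
}
```

Readers share the lock, so drivers do not serialize against each other;
only the suspend path, as the single writer, has to wait for all of them to
finish.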

Miklos


Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Rafael J. Wysocki
On Monday, 23 July 2007 15:08, Miklos Szeredi wrote:
> > > > The reason is that we want them to "park" in safe places, ie. where 
> > > > there
> > > > are no locks held etc.  Thus, these safe places need to be chosen 
> > > > somehow
> > > > and since they are not marked throughout the code, we choose the obvious
> > > > one. :-)
> > > 
> > > Why shouldn't locks be held?
> > > 
> > > No locks which are required for suspend must be held, sure.  But
> > > otherwise holding locks doesn't matter at all.
> > 
> > If you can provide a way to tell them apart, this would work.
> 
> Without some marking we can't tell obviously.
> 
> Are there many such locks?  We can easily check by adding some
> debugging code to the lock primitives, to make them yell if they are
> used during suspend.

This way we can only obtain information from systems that use hibernation
quite often.

Alan has recently proposed to introduce "suspend locks" to be acquired during
a suspend/hibernation and such that we can leave uninterruptible tasks that
don't hold any of them.

Unfortunately, I have no link to his original message at hand.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Miklos Szeredi
> > > The reason is that we want them to "park" in safe places, ie. where there
> > > are no locks held etc.  Thus, these safe places need to be chosen somehow
> > > and since they are not marked throughout the code, we choose the obvious
> > > one. :-)
> > 
> > Why shouldn't locks be held?
> > 
> > No locks which are required for suspend must be held, sure.  But
> > otherwise holding locks doesn't matter at all.
> 
> If you can provide a way to tell them apart, this would work.

Without some marking we can't tell obviously.

Are there many such locks?  We can easily check by adding some
debugging code to the lock primitives, to make them yell if they are
used during suspend.
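
Something along these lines, as a userspace sketch only: the wrapper
dbg_mutex_lock(), the suspend_in_progress flag and the report counter are
all hypothetical names, and a pthread mutex stands in for the kernel lock
primitives.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Set by the (hypothetical) suspend path for the duration of a suspend. */
static bool suspend_in_progress;

/* How many times a lock was taken while a suspend was in progress. */
static int suspend_lock_reports;

/* Debugging wrapper: yell if the lock is used during suspend, then lock. */
static void dbg_mutex_lock(pthread_mutex_t *m, const char *name)
{
    assert(m != NULL);
    if (suspend_in_progress) {
        suspend_lock_reports++;
        fprintf(stderr, "lock '%s' taken during suspend\n", name);
    }
    pthread_mutex_lock(m);
}

static void dbg_mutex_unlock(pthread_mutex_t *m)
{
    assert(m != NULL);
    pthread_mutex_unlock(m);
}
```

The set of names that show up in the log after a few suspend cycles would
be the candidate list of locks that actually matter for suspend.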

Miklos


Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Oliver Neukum
On Monday, 23 July 2007, Miklos Szeredi wrote:
> > The reason is that we want them to "park" in safe places, ie. where there
> > are no locks held etc.  Thus, these safe places need to be chosen somehow
> > and since they are not marked throughout the code, we choose the obvious
> > one. :-)
> 
> Why shouldn't locks be held?
> 
> No locks which are required for suspend must be held, sure.  But
> otherwise holding locks doesn't matter at all.

If you can provide a way to tell them apart, this would work.

Regards
Oliver



Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Rafael J. Wysocki
On Monday, 23 July 2007 14:14, Miklos Szeredi wrote:
> > On Monday, 23 July 2007 12:24, Miklos Szeredi wrote:
> > > > > The only thing to do is what Rafael has been working on: unfreeze
> > > > > things, hope the tasks sort themselves out, and try again.
> > > > 
> > > > That's what I'm questioning. Is there a more reliable way and we've
> > > > just given up too quickly?
> > > 
> > > There obviously _are_ more reliable ways.  A trivial one seems to be
> > > to just not require user tasks to finish syscalls.
> > > 
> > > Yeah, stopping user processes outside the kernel is convenient, but
> > > there's no fundamental reason why it is the only place where those
> > > tasks can be stopped.
> > 
> > The reason is that we want them to "park" in safe places, ie. where there
> > are no locks held etc.  Thus, these safe places need to be chosen somehow
> > and since they are not marked throughout the code, we choose the obvious
> > one. :-)
> 
> Why shouldn't locks be held?
> 
> No locks which are required for suspend must be held, sure.  But
> otherwise holding locks doesn't matter at all.
> 
> And I'm not saying that is trivial to do, but it might not be too hard
> either.
> 
> Rafael, can you please tell, what happened to that patch, that did not
> wait for tasks in uninterruptible sleep to be frozen?
> 
> That seemed like a magnificent approach compared to anything that has
> been proposed since.

Well, the freezer has failed to freeze tasks a couple of times in my
test setup and I've had a couple of hangs.

I have an idea how to improve it, but that still requires some pending freezer
patches to go first.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Miklos Szeredi
> On Monday, 23 July 2007 12:24, Miklos Szeredi wrote:
> > > > The only thing to do is what Rafael has been working on: unfreeze
> > > > things, hope the tasks sort themselves out, and try again.
> > > 
> > > That's what I'm questioning. Is there a more reliable way and we've
> > > just given up too quickly?
> > 
> > There obviously _are_ more reliable ways.  A trivial one seems to be
> > to just not require user tasks to finish syscalls.
> > 
> > Yeah, stopping user processes outside the kernel is convenient, but
> > there's no fundamental reason why it is the only place where those
> > tasks can be stopped.
> 
> The reason is that we want them to "park" in safe places, ie. where there
> are no locks held etc.  Thus, these safe places need to be chosen somehow
> and since they are not marked throughout the code, we choose the obvious
> one. :-)

Why shouldn't locks be held?

No locks which are required for suspend must be held, sure.  But
otherwise holding locks doesn't matter at all.

And I'm not saying that is trivial to do, but it might not be too hard
either.

Rafael, can you please tell me what happened to that patch that did not
wait for tasks in uninterruptible sleep to be frozen?

That seemed like a magnificent approach compared to anything that has
been proposed since.

Miklos


Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Rafael J. Wysocki
On Monday, 23 July 2007 12:24, Miklos Szeredi wrote:
> > > The only thing to do is what Rafael has been working on: unfreeze
> > > things, hope the tasks sort themselves out, and try again.
> > 
> > That's what I'm questioning. Is there a more reliable way and we've
> > just given up too quickly?
> 
> There obviously _are_ more reliable ways.  A trivial one seems to be
> to just not require user tasks to finish syscalls.
> 
> Yeah, stopping user processes outside the kernel is convenient, but
> there's no fundamental reason why it is the only place where those
> tasks can be stopped.

The reason is that we want them to "park" in safe places, ie. where there
are no locks held etc.  Thus, these safe places need to be chosen somehow
and since they are not marked throughout the code, we choose the obvious
one. :-)

> And there are very fundamental reasons to _not_ require this.  Not
> just in the fuse case, but in any case where a syscall requires
> another user task to run before it can be finished (e.g. NFS over
> OpenVPN).

Yeah.  Mark the safe places for us and we'll use them.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Miklos Szeredi
> > The only thing to do is what Rafael has been working on: unfreeze
> > things, hope the tasks sort themselves out, and try again.
> 
> That's what I'm questioning. Is there a more reliable way and we've
> just given up too quickly?

There obviously _are_ more reliable ways.  A trivial one seems to be
to just not require user tasks to finish syscalls.

Yeah, stopping user processes outside the kernel is convenient, but
there's no fundamental reason why it is the only place where those
tasks can be stopped.

And there are very fundamental reasons to _not_ require this.  Not
just in the fuse case, but in any case where a syscall requires
another user task to run before it can be finished (e.g. NFS over
OpenVPN).

Miklos


Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Miklos Szeredi
  The only thing to do is what Rafael has been working on: unfreeze
  things, hope the tasks sort themselves out, and try again.
 
 That's what I'm questioning. Is there a more reliable way and we've
 just given up too quickly?

There obviously _are_ more reliable ways.  A trivial one seems to be
to just not require user tasks to finish syscalls.

Yeah, stopping user processes outside the kernel is convenient, but
there's no fundamental reason why it is the only place where those
tasks can be stopped.

And there are very fundamental reasons to _not_ require this.  Not
just in the fuse case, but in any case where a syscall requires
another user task to run before it can be finished (e.g. NFS over
OpenVPN).

Miklos
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Rafael J. Wysocki
On Monday, 23 July 2007 12:24, Miklos Szeredi wrote:
   The only thing to do is what Rafael has been working on: unfreeze
   things, hope the tasks sort themselves out, and try again.
  
  That's what I'm questioning. Is there a more reliable way and we've
  just given up too quickly?
 
 There obviously _are_ more reliable ways.  A trivial one seems to be
 to just not require user tasks to finish syscalls.
 
 Yeah, stopping user processes outside the kernel is convenient, but
 there's no fundamental reason why it is the only place where those
 tasks can be stopped.

The reason is that we want them to park in safe places, ie. where there
are no locks held etc.  Thus, these safe places need to be chosen somehow
and since they are not marked throughout the code, we choose the obvious
one. :-)

 And there are very fundamental reasons to _not_ require this.  Not
 just in the fuse case, but in any case where a syscall requires
 another user task to run before it can be finished (e.g. NFS over
 OpenVPN).

Yeah.  Mark the safe places for us and we'll use them.

Greetings,
Rafael


-- 
Premature optimization is the root of all evil. - Donald Knuth
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Miklos Szeredi
 On Monday, 23 July 2007 12:24, Miklos Szeredi wrote:
The only thing to do is what Rafael has been working on: unfreeze
things, hope the tasks sort themselves out, and try again.
   
   That's what I'm questioning. Is there a more reliable way and we've
   just given up too quickly?
  
  There obviously _are_ more reliable ways.  A trivial one seems to be
  to just not require user tasks to finish syscalls.
  
  Yeah, stopping user processes outside the kernel is convenient, but
  there's no fundamental reason why it is the only place where those
  tasks can be stopped.
 
 The reason is that we want them to park in safe places, ie. where there
 are no locks held etc.  Thus, these safe places need to be chosen somehow
 and since they are not marked throughout the code, we choose the obvious
 one. :-)

Why shouldn't locks be held?

No locks which are required for suspend must be held, sure.  But
otherwise holding locks doesn't matter at all.

And I'm not saying that is trivial to do, but it might not be too hard
either.

Rafael, can you please tell, what happened to that patch, that did not
wait for tasks in uninterruptible sleep to be frozen?

That seemed like a magnificent approach compared to anything that has
been proposed since.

Miklos
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Rafael J. Wysocki
On Monday, 23 July 2007 14:14, Miklos Szeredi wrote:
  On Monday, 23 July 2007 12:24, Miklos Szeredi wrote:
 The only thing to do is what Rafael has been working on: unfreeze
 things, hope the tasks sort themselves out, and try again.

That's what I'm questioning. Is there a more reliable way and we've
just given up too quickly?
   
   There obviously _are_ more reliable ways.  A trivial one seems to be
   to just not require user tasks to finish syscalls.
   
   Yeah, stopping user processes outside the kernel is convenient, but
   there's no fundamental reason why it is the only place where those
   tasks can be stopped.
  
  The reason is that we want them to park in safe places, ie. where there
  are no locks held etc.  Thus, these safe places need to be chosen somehow
  and since they are not marked throughout the code, we choose the obvious
  one. :-)
 
 Why shouldn't locks be held?
 
 No locks which are required for suspend must be held, sure.  But
 otherwise holding locks doesn't matter at all.
 
 And I'm not saying that is trivial to do, but it might not be too hard
 either.
 
 Rafael, can you please tell, what happened to that patch, that did not
 wait for tasks in uninterruptible sleep to be frozen?
 
 That seemed like a magnificent approach compared to anything that has
 been proposed since.

Well, the freezer have failed to freeze tasks for a couple of times in my
test setup and I've had a couple of hangs.

I have an idea how to improve it, but that still requires some pending freezer
patches to go first.

Greetings,
Rafael


-- 
Premature optimization is the root of all evil. - Donald Knuth
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Oliver Neukum
Am Montag 23 Juli 2007 schrieb Miklos Szeredi:
  The reason is that we want them to park in safe places, ie. where there
  are no locks held etc.  Thus, these safe places need to be chosen somehow
  and since they are not marked throughout the code, we choose the obvious
  one. :-)
 
 Why shouldn't locks be held?
 
 No locks which are required for suspend must be held, sure.  But
 otherwise holding locks doesn't matter at all.

If you can provide a way to tell them apart, this would work.

Regards
Oliver

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Miklos Szeredi
   The reason is that we want them to park in safe places, ie. where there
   are no locks held etc.  Thus, these safe places need to be chosen somehow
   and since they are not marked throughout the code, we choose the obvious
   one. :-)
  
  Why shouldn't locks be held?
  
  No locks which are required for suspend must be held, sure.  But
  otherwise holding locks doesn't matter at all.
 
 If you can provide a way to tell them apart, this would work.

Without some marking we can't tell obviously.

Are there many such locks?  We can easily check by adding some
debugging code to the lock primitives, to make them yell if they are
used during suspend.
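
As a rough illustration of the debugging idea above, here is a minimal user-space Python model, not kernel code. The names `SuspendTracker` and `DebugLock` are hypothetical and exist only for this sketch; it assumes a single flag that the suspend path would set, and it records the lock name instead of printing a warning.

```python
import threading


class SuspendTracker:
    """Hypothetical shared state: a suspend flag plus a log of offending locks."""

    def __init__(self):
        self.in_progress = False   # would be set by the suspend path
        self.offenders = []        # names of locks taken while suspending


class DebugLock:
    """Wraps threading.Lock and records acquisitions made during suspend."""

    def __init__(self, name, tracker):
        self.name = name
        self.tracker = tracker
        self._lock = threading.Lock()

    def acquire(self):
        if self.tracker.in_progress:
            # The "yell": remember which lock was used during suspend.
            self.tracker.offenders.append(self.name)
        self._lock.acquire()

    def release(self):
        self._lock.release()
```

Collecting names rather than failing keeps the sketch passive: it only gathers the list of locks that would need marking, which is the information the question above asks for.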

Miklos


Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Rafael J. Wysocki
On Monday, 23 July 2007 15:08, Miklos Szeredi wrote:
    The reason is that we want them to park in safe places, ie. where there
    are no locks held etc.  Thus, these safe places need to be chosen somehow
    and since they are not marked throughout the code, we choose the obvious
    one. :-)
   
   Why shouldn't locks be held?
   
   No locks which are required for suspend must be held, sure.  But
   otherwise holding locks doesn't matter at all.
  
  If you can provide a way to tell them apart, this would work.
 
 Without some marking we can't tell obviously.
 
 Are there many such locks?  We can easily check by adding some
 debugging code to the lock primitives, to make them yell if they are
 used during suspend.

This way we can only obtain information from systems that use hibernation
quite often.

Alan has recently proposed to introduce suspend locks to be acquired during
a suspend/hibernation and such that we can leave uninterruptible tasks that
don't hold any of them.

Unfortunately, I have no link to his original message at hand.

Greetings,
Rafael


-- 
Premature optimization is the root of all evil. - Donald Knuth


Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Miklos Szeredi
 Alan has recently proposed to introduce suspend locks to be acquired during
 a suspend/hibernation and such that we can leave uninterruptible tasks that
 don't hold any of them.

Sounds sane.  A global rwsem could be acquired for read by drivers,
and for write by suspend/hibernate.  Just need to add it to all
drivers that have PM, but that shouldn't need a heroic effort.
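
The locking discipline described above can be sketched as a toy reader/writer semaphore. This is a user-space Python model under stated assumptions, not the kernel's rwsem: the class name `SuspendRWSem` is hypothetical, and the methods return False instead of sleeping so that the behaviour is easy to follow (the kernel's down_read()/down_write() would block).

```python
class SuspendRWSem:
    """Toy reader/writer semaphore: many readers or exactly one writer."""

    def __init__(self):
        self.readers = 0
        self.writer = False

    def try_read(self):
        # Drivers take the read side around I/O; refused once suspend holds it.
        if self.writer:
            return False
        self.readers += 1
        return True

    def read_unlock(self):
        self.readers -= 1

    def try_write(self):
        # Suspend takes the write side; it needs every reader to be gone.
        if self.writer or self.readers:
            return False
        self.writer = True
        return True

    def write_unlock(self):
        self.writer = False
```

In this model any number of drivers can be inside the read side at once, suspend gets in only when none are, and no driver can enter again until suspend drops the write side.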

Miklos


Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Oliver Neukum
On Saturday, 21 July 2007, Alan Stern wrote:
 On Fri, 20 Jul 2007, Oliver Neukum wrote:
 
   We already have a pre-suspend notification available for drivers that 
   need to allocate large amounts of memory.
  
  Is that facility fine grained enough?
 
 It's a notifier chain that gets called at several points during the 
 suspend transition.  One of those points is right at the start, while 
 userspace is still running and reasonably large amounts of memory can 
 be allocated.
 
 Is it fine-grained enough?  I don't know -- hard to tell, since nothing 
 much is using it yet.
 
   You are correct about the need to delay/stop device addition.  I don't
   know how this can be done in general; each code path calling
   device_add() may have to be treated individually.
  
  What about the old API?
 
 What old API do you mean?

The find_device() stuff.

   Do we have to block module loading?
 
 No.  Registering new drivers is okay, registering new devices is bad.

What if it is a driver for virtual devices that don't need probe()
for actual hardware?

 Of course, some modules do want to register a new device in their init 
 method.  I don't know what we should do about them.  Force the 
 registration to fail, I suppose.  How often will people suspend while a 
 module is loading?
 
  What happens if a scsi error handler is woken? If it cannot be woken,
  how are errors handled?
 
 Why should the error handler wake up?  There isn't supposed to be any 
 I/O going on, hence no errors to handle.

What about shared busses? Firewire, FibreChannel? They can get external
resets, etc ...

Regards
Oliver


Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Alan Stern
On Sun, 22 Jul 2007 [EMAIL PROTECTED] wrote:

  You are confusing userspace with user tasks.  And not only that,
  you often use the term userspace when you should say user mode.
 
  If you want I can explain the differences.
 
 please do, I have been treating all three as the same category.

Very briefly then: User mode and kernel mode refer to the CPU's
hardware privilege level.  A process makes the transition from user
mode to kernel mode by executing a system call.  Interrupt and
exception handlers also run in kernel mode, but they generally are not
considered to be part of any process.  The reverse transition occurs
when a process returns from a system call, or when an interrupt which
occurred while the CPU was in user mode completes.  (It's interesting
to note that system calls are somewhat similar to interrupts; in fact
sometimes they are implemented by a software interrupt.)

Kernel threads are processes that run entirely in kernel mode.  They
usually don't have a memory mapping for any user-owned memory and they
never go into user mode.  All other processes are user threads.

Userspace is a rather general term referring to things not in the
kernel.  It comprises both user tasks (while running in user mode) and
user memory.

 Ok, I did misunderstand you. it sounds like all you need to do to make 
 sure that locks are not held is to allow system calls to return before 
 trying to do the suspend/kexec/etc. that sounds like not only a trivial 
 thing to do, but something that would probably be done anyway.

If you could actually do it, it would work.  But you can't do it.  If 
it were feasible, the freezer would have used that approach in the 
first place.

For one thing, checking for a suspend-in-progress at the beginning of
each and every system call would add overhead to a hot path in the
kernel, one which is already very heavily optimized.  People wouldn't
stand for it.

 although syscalls that then call out to userspace tasks before they can 
 complete cause potential deadlocks (without that issue you can just wait 
 until all syscalls have returned, and not allow anything to issue new 
 syscalls) is this the issue that's killing FUSE+suspend?

You get similar problems from system calls that wait in kernel mode 
until something has happened.  For example, a read() call for the 
console device will wait until somebody types on the keyboard.  At any 
point in time, many (or even most) user threads are blocked in a system 
call.

  Here's what you are missing:
 
  The new kexec approach eliminates the freezer and relies instead on the
  fact that none of the tasks in the original kernel can execute while
  the new kexec'd kernel is running.  This means the new kernel can write
  out a memory image with no fear of interference or corruption.
 
 correct
 
  But it also means that tasks which otherwise would have been frozen are
  actually free to run before the kexec call is made (and after the call
  returns, if the kexec'd kernel returns back to the original kernel).
  Any driver which was written with the assumption that tasks would be
  frozen at those times will need to be changed.
 
 here is where you lose me.
 
 why should jumping back to the original kernel immediately start running 
 these processes?

Let's let kernel K1 be the original kernel, the one which is going into
hibernation.  Kernel K2 is the one started by kexec to write out the
memory image.

Your question becomes: Why should K2 jumping back to K1 cause K1
immediately to start running user tasks?  Answer: Because K1 has been
running user tasks all along (except while K2 was active) and nothing
has told it to stop.  In fact, about the only things which _can_ cause
K1 to stop running user threads are the freezer (which you want to
eliminate) and disabling interrupts (not possible since some drivers
require interrupts to be enabled when putting devices in low-power 
mode).

  the process of doing a kexec requires things to happen in 
 the drivers before normal activity can happen, so there is a phase in 
 there where the kernel being jumped to has drivers initializing, but still 
 does not allow anything else to run.

So when K2 starts up, it will have a phase in which user threads don't 
run.  That doesn't affect K1.  When K2 returns to K1, K1 does not go 
through this sort of phase.  It simply picks up from where it left off.

 why can't this phase be extended to 
 allow for the possibility of transitioning these drivers to a sleep mode 
 instead of to full operation?

Indeed, Rafael has suggested that K2 be responsible for putting devices
in low-power mode.  This has the disadvantage of requiring K2 to 
include drivers for every device used by K1, but otherwise it would 
work.

However there still remains the problem of user tasks running after 
devices are supposed to be quiescent and before K1 starts.  There's 
currently nothing to stop such tasks from making I/O requests and 
thereby causing a quiescent device to become active again.

Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Alan Stern
On Mon, 23 Jul 2007, Nigel Cunningham wrote:

 Take a step back for a second.
 
 The problem we're facing now is that we're getting some userspace threads, 
 used in processing I/O, that are functioning as exceptions to the "freeze 
 userspace, then freezeable kernel threads" rule. They are only exceptions 
 because of that role in processing I/O - because they're de facto kernel 
 threads. So, if we orient our thinking more in terms of I/O processing and 
 less in terms of the userspace/kernelspace distinction, we'll have a 
 solution:
 
 1) Freeze processes that aren't fs related (ie stop them generating I/O).

The problem here is that with things like FUSE, _every_ process is 
potentially fs related.  Nothing prevents a FUSE thread from doing IPC 
with any other thread.

 2) Flush pending I/O.
 3) Freeze filesystems in reverse order of dependency, the primary purpose 
 being to stop them generating further I/O on their metadata.
 
 Locks that are being held are only being held because work is being done. If 
 we progressively focus on threads in terms of their create/process work 
 dependencies, we'll see that the problem isn't at all intractable.

As has been mentioned before, keeping track of all that dependency 
information would be very fragile and time-consuming.

Alan Stern



Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread david

On Mon, 23 Jul 2007, Alan Stern wrote:

> On Sun, 22 Jul 2007 [EMAIL PROTECTED] wrote:
>
> > Ok, I did misunderstand you. it sounds like all you need to do to make
> > sure that locks are not held is to allow system calls to return before
> > trying to do the suspend/kexec/etc. that sounds like not only a trivial
> > thing to do, but something that would probably be done anyway.
>
> If you could actually do it, it would work.  But you can't do it.  If
> it were feasible, the freezer would have used that approach in the
> first place.
>
> For one thing, checking for a suspend-in-progress at the beginning of
> each and every system call would add overhead to a hot path in the
> kernel, one which is already very heavily optimized.  People wouldn't
> stand for it.

I thought that the suspend stuff did this easily, but the freezer really
starts running into trouble when it wants to freeze some things, but not
other things. this seems to be the biggest area of churn and problems.

> > although syscalls that then call out to userspace tasks before they can
> > complete cause potential deadlocks (without that issue you can just wait
> > until all syscalls have returned, and not allow anything to issue new
> > syscalls) is this the issue that's killing FUSE+suspend?
>
> You get similar problems from system calls that wait in kernel mode
> until something has happened.  For example, a read() call for the
> console device will wait until somebody types on the keyboard.  At any
> point in time, many (or even most) user threads are blocked in a system
> call.

but are locks held while they are blocked like this?

> > > But it also means that tasks which otherwise would have been frozen are
> > > actually free to run before the kexec call is made (and after the call
> > > returns, if the kexec'd kernel returns back to the original kernel).
> > > Any driver which was written with the assumption that tasks would be
> > > frozen at those times will need to be changed.
> >
> > here is where you lose me.
> >
> > why should jumping back to the original kernel immediately start running
> > these processes?
>
> Let's let kernel K1 be the original kernel, the one which is going into
> hibernation.  Kernel K2 is the one started by kexec to write out the
> memory image.
>
> Your question becomes: Why should K2 jumping back to K1 cause K1
> immediately to start running user tasks?  Answer: Because K1 has been
> running user tasks all along (except while K2 was active) and nothing
> has told it to stop.  In fact, about the only things which _can_ cause
> K1 to stop running user threads are the freezer (which you want to
> eliminate) and disabling interrupts (not possible since some drivers
> require interrupts to be enabled when putting devices in low-power
> mode).

when you jump to a body of code you jump to a specific point in the code,
not to some nebulous 'everything running' state.

> > the process of doing a kexec requires things to happen in
> > the drivers before normal activity can happen, so there is a phase in
> > there where the kernel being jumped to has drivers initializing, but still
> > does not allow anything else to run.
>
> So when K2 starts up, it will have a phase in which user threads don't
> run.  That doesn't affect K1.  When K2 returns to K1, K1 does not go
> through this sort of phase.  It simply picks up from where it left off.

then how can it restart drivers before the user threads need them?

> > why can't this phase be extended to
> > allow for the possibility of transitioning these drivers to a sleep mode
> > instead of to full operation?
>
> Indeed, Rafael has suggested that K2 be responsible for putting devices
> in low-power mode.  This has the disadvantage of requiring K2 to
> include drivers for every device used by K1, but otherwise it would
> work.
>
> However there still remains the problem of user tasks running after
> devices are supposed to be quiescent and before K1 starts.  There's
> currently nothing to stop such tasks from making I/O requests and
> thereby causing a quiescent device to become active again.

but if the devices are in low power mode then K1 needs to get them out of
low power mode before user tasks try to access them.

> > > The situation as regards locking is harder to discuss since I don't
> > > know of any code examples to use as a guide.  The fact remains that if
> > > user tasks aren't frozen then they can make system calls, and while
> > > running in kernel mode they can acquire locks, which might cause
> > > problems -- even though I can't identify any definite examples.
> >
> > yes, if userspace is running jobs and submitting I/O and system calls
> > while drivers are trying to initialize there is a big problem, but I am
> > missing the reason this must be the case.
>
> We aren't talking about drivers initializing devices.  We are talking
> about what happens during the time when drivers are trying to quiesce
> devices (i.e., before K1 has started up K2) or power them down (after
> K2 has returned to K1).

or if you are doing a resume instead of a suspend to ram the drivers need
to initialize or otherwise move to full power on K1 before user tasks hit
them.

the 

Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread david

On Mon, 23 Jul 2007, Oliver Neukum wrote:

> On Monday, 23 July 2007, Miklos Szeredi wrote:
> > > The reason is that we want them to park in safe places, ie. where there
> > > are no locks held etc.  Thus, these safe places need to be chosen somehow
> > > and since they are not marked throughout the code, we choose the obvious
> > > one. :-)
> >
> > Why shouldn't locks be held?
> >
> > No locks which are required for suspend must be held, sure.  But
> > otherwise holding locks doesn't matter at all.
>
> If you can provide a way to tell them apart, this would work.

can you just tell the driver to try to suspend and, if it reports back
that it fails, back out of the suspend? or will the driver deadlock instead
of reporting a failure if a lock is held?


David Lang

Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Alan Stern
On Mon, 23 Jul 2007 [EMAIL PROTECTED] wrote:

  For one thing, checking for a suspend-in-progress at the beginning of
  each and every system call would add overhead to a hot path in the
  kernel, one which is already very heavily optimized.  People wouldn't
  stand for it.
 
 I thought that the suspend stuff did this easily,

It does not do it at all.  Do you know how the freezer works?

  but the freezer really 
 starts running into trouble when it wants to freeze some things, but not 
 other things. this seems to be the biggest area of churn and problems.

No.  The freezer starts running into trouble when it wants to freeze a
thread but can't, because that thread is waiting for some event to
occur and the only thread which can cause the event is already frozen.  
Or is itself waiting for a third thread which is already frozen...
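
The failure mode described above can be made concrete with a small user-space Python model; it is a sketch under simplifying assumptions, not the freezer's actual algorithm. The function name `try_freeze` and the `waits_on` mapping are hypothetical. The model assumes each task waits on at most one other task, and that a waiting task can reach a freezing point only if nothing in its wait chain has already been frozen.

```python
def try_freeze(order, waits_on):
    """Freeze tasks in the given order; return the tasks that could not freeze.

    waits_on maps a task to the single task it is blocked on (if any).
    """
    frozen, stuck = set(), []
    for task in order:
        seen = set()
        dep = waits_on.get(task)
        blocked = False
        # Follow the wait chain: a frozen task anywhere in it means the
        # event this task is waiting for can no longer be delivered.
        while dep is not None and dep not in seen:
            if dep in frozen:
                blocked = True
                break
            seen.add(dep)
            dep = waits_on.get(dep)
        if blocked:
            stuck.append(task)
        else:
            frozen.add(task)
    return stuck
```

With A waiting on B and B waiting on C, freezing in the order C, B, A leaves both B and A stuck, while the order A, B, C freezes everything. The freezer has no such `waits_on` table available, which is the heart of the difficulty.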

  You get similar problems from system calls that wait in kernel mode
  until something has happened.  For example, a read() call for the
  console device will wait until somebody types on the keyboard.  At any
  point in time, many (or even most) user threads are blocked in a system
  call.
 
 but are locks held while they are blocked like this?

Sometimes they are, sometimes they aren't.

  Let's let kernel K1 be the original kernel, the one which is going into
  hibernation.  Kernel K2 is the one started by kexec to write out the
  memory image.
 
  Your question becomes: Why should K2 jumping back to K1 cause K1
  immediately to start running user tasks?  Answer: Because K1 has been
  running user tasks all along (except while K2 was active) and nothing
  has told it to stop.  In fact, about the only things which _can_ cause
  K1 to stop running user threads are the freezer (which you want to
  eliminate) and disabling interrupts (not possible since some drivers
  require interrupts to be enabled when putting devices in low-power
  mode).
 
 when you jump to a body of code you jump to a specific point in the code, 
 not to some nebulous 'everything running' state.

How is that relevant?  When K2 jumps back to K1, it jumps to some 
designated location in K1.  It might be just after the place where K1 
called K2; I'm not familiar with the details of kexec.  In any event, 
K1 will still be in the same state as it was when it called K2.

  So when K2 starts up, it will have a phase in which user threads don't
  run.  That doesn't affect K1.  When K2 returns to K1, K1 does not go
  through this sort of phase.  It simply picks up from where it left off.
 
 then how can it restart drivers before the user threads need them?

It can't.  Indeed, in the absence of a freezer, user threads will need 
devices (more accurately, will submit I/O requests for devices) that 
have to be kept quiescent or low-power.  Drivers will need to delay 
those requests until the devices are returned to full operation.

That's exactly what I've been saying all along: Drivers will need to 
be changed to delay I/O requests, if there is no freezer.

  However there still remains the problem of user tasks running after
  devices are supposed to be quiescent and before K1 starts.  There's
  currently nothing to stop such tasks from making I/O requests and
  thereby causing a quiescent device to become active again.
 
 but if the devices are in low power mode then K1 needs to get them out of 
 low power mode before user tasks try to access them.

No -- which is good because it can't.  If a user task is running
there's no way to stop it from submitting I/O requests.  K1 needs to
delay these requests until after the device has returned to full 
operation.

  We aren't talking about drivers initializing devices.  We are talking
  about what happens during the time when drivers are trying to quiesce
  devices (i.e., before K1 has started up K2) or power them down (after
  K2 has returned to K1).
 
 or if you are doing a resume instead of a suspend to ram the drivers need 
 to initialize or otherwise move to full power on K1 before user tasks hit 
 them.

Correct.  User tasks are allowed to submit requests, but the requests 
can't be carried out until the device returns to full operation.
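
To illustrate what "delaying I/O requests" could look like from a driver's point of view, here is a minimal user-space Python model. It is only a sketch: the `Device` class and its method names are hypothetical, requests are plain strings, and "carrying out" a request just means appending it to a list.

```python
from collections import deque


class Device:
    """Toy driver state: requests submitted while suspended are held back."""

    def __init__(self):
        self.suspended = False
        self.pending = deque()   # requests accepted but not yet carried out
        self.completed = []      # requests actually carried out, in order

    def submit(self, request):
        if self.suspended:
            # Accept the request but do not touch the (quiescent) hardware.
            self.pending.append(request)
        else:
            self.completed.append(request)

    def suspend(self):
        self.suspended = True

    def resume(self):
        self.suspended = False
        # Replay delayed requests in submission order.
        while self.pending:
            self.completed.append(self.pending.popleft())
```

The user task is never stopped in this model; its request simply sits in the queue until the device is back at full operation.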

Alan Stern



Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Nigel Cunningham
Hi.

On Tuesday 24 July 2007 01:23:15 Alan Stern wrote:
 On Mon, 23 Jul 2007, Nigel Cunningham wrote:
 
  Take a step back for a second.
  
  The problem we're facing now is that we're getting some userspace threads, 
  used in processing I/O, that are functioning as exceptions to the "freeze 
  userspace, then freezeable kernel threads" rule. They are only exceptions 
  because of that role in processing I/O - because they're de facto kernel 
  threads. So, if we orient our thinking more in terms of I/O processing and 
  less in terms of the userspace/kernelspace distinction, we'll have a 
  solution:
  
  1) Freeze processes that aren't fs related (ie stop them generating I/O).
 
 The problem here is that with things like FUSE, _every_ process is 
 potentially fs related.  Nothing prevents a FUSE thread from doing IPC 
 with any other thread.

Yes, but the fuse thread is going to know what other thread it's doing IPC 
with, so it can get that thread flagged too.

  2) Flush pending I/O.
  3) Freeze filesystems in reverse order of dependency, the primary purpose 
  being to stop them generating further I/O on their metadata.
  
  Locks that are being held are only being held because work is being done. If 
  we progressively focus on threads in terms of their create/process work 
  dependencies, we'll see that the problem isn't at all intractable.
 
 As has been mentioned before, keeping track of all that dependency 
 information would be very fragile and time-consuming.

I disagree. It's at least going to be less fragile and time-consuming than 
maintaining new/extra code for kexec.

Nigel





Re: [linux-pm] Re: Hibernation considerations

2007-07-23 Thread Rafael J. Wysocki
On Monday, 23 July 2007 23:55, Nigel Cunningham wrote:
 Hi.
 
 On Tuesday 24 July 2007 01:23:15 Alan Stern wrote:
  On Mon, 23 Jul 2007, Nigel Cunningham wrote:
  
   Take a step back for a second.
   
   The problem we're facing now is that we're getting some userspace threads, 
   used in processing I/O, that are functioning as exceptions to the "freeze 
   userspace, then freezeable kernel threads" rule. They are only exceptions 
   because of that role in processing I/O - because they're de facto kernel 
   threads. So, if we orient our thinking more in terms of I/O processing and 
   less in terms of the userspace/kernelspace distinction, we'll have a 
   solution:
   
   1) Freeze processes that aren't fs related (ie stop them generating I/O).
  
  The problem here is that with things like FUSE, _every_ process is 
  potentially fs related.  Nothing prevents a FUSE thread from doing IPC 
  with any other thread.
 
 Yes, but the fuse thread is going to know what other thread it's doing IPC 
 with, so it can get that thread flagged too.

Yes, but that thread may do IPC with yet another one and so on.
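
A small user-space Python sketch shows how quickly the flagged set can grow once IPC peers are followed transitively. The function name `threads_to_flag` and the `ipc_peers` table are hypothetical; the sketch assumes the IPC relationships are known in advance, which is exactly the information that is hard to obtain in practice.

```python
from collections import deque


def threads_to_flag(start, ipc_peers):
    """Return every thread reachable from `start` through IPC links."""
    flagged = {start}
    queue = deque([start])
    while queue:
        thread = queue.popleft()
        for peer in ipc_peers.get(thread, ()):
            if peer not in flagged:
                flagged.add(peer)
                queue.append(peer)
    return flagged
```

Starting from one FUSE daemon, every thread it talks to, and every thread those talk to, ends up flagged; only threads with no IPC path from the daemon stay out of the set.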

   2) Flush pending I/O.
   3) Freeze filesystems in reverse order of dependency, the primary purpose 
   being to stop them generating further I/O on their metadata.
   
   Locks that are being held are only being held because work is being done. If 
   we progressively focus on threads in terms of their create/process work 
   dependencies, we'll see that the problem isn't at all intractable.
  
  As has been mentioned before, keeping track of all that dependency 
  information would be very fragile and time-consuming.
 
 I disagree. It's at least going to be less fragile and time-consuming than 
 maintaining new/extra code for kexec.

Well, I think the issue is real, so we need to find a solution (the simpler,
the better) and that need not be related to kexec. ;-)

Greetings,
Rafael


-- 
Premature optimization is the root of all evil. - Donald Knuth


Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread david

On Mon, 23 Jul 2007, Nigel Cunningham wrote:

> Hi Alan.
>
> On Monday 23 July 2007 01:26:23 Alan Stern wrote:
> > On Sun, 22 Jul 2007, Nigel Cunningham wrote:
> >
> > > Hi.
> > >
> > > On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote:
> > > > It seems that you could still potentially get a failure to freeze if one
> > > > FUSE process depends on another, and the one that is frozen second just
> > > > happens to be waiting on the one that is frozen first when it is frozen.
> > > > I admit that this situation is unlikely, and perhaps acceptable.
> > > >
> > > > A larger concern is that it seems that freezing FUSE processes at all
> > > > _will_ generate deadlocks if a non-synchronous or memory-map-supporting
> > > > filesystem is loopback mounted from a FUSE filesystem.  In that case, if
> > > > you attempt to sync or free memory once FUSE is frozen, you are sure to
> > > > get a deadlock.
> > >
> > > Ok. So then (in response to Alan too), how about keeping a tree of mounts,
> > > akin to the device tree, and working from the deepest nodes up? (In
> > > conjunction with what I already suggested)?
> >
> > Face it, Nigel, this is a losing battle.  You can try to come up with
> > ever-more complex schemes to try and force FUSE into the freezer's
> > framework, but it just won't fit.  Or if it does, the next filesystem
> > to come along will require an even more baroque type of special-case
> > handling.
>
> It does seem to be a losing battle, but I'm wondering whether that's really
> because it's an intractable problem, or because people have given up on it
> before its time. We are talking about a computer system, so things should be
> predictable.
>
> > The general problem is that task A may be in an unfreezable state,
> > waiting for task B to do something, while task B is already frozen.
> > Since there's no reasonable way to determine that A really is waiting
> > for B, you're just stuck.  (To make matters worse, A may not even
> > realize which task it is waiting for; it may know only that it's
> > waiting for somebody to do something!)  A and B could be user tasks,
> > kernel threads, or one of each.
>
> I guess I want to persist because all of these issues aren't utterly
> unsolvable. It's just that we don't have the infrastructure yet to figure out
> the solutions to these issues trivially. Take, for example, the locking
> issue. If we could call some function to say "What process holds this lock?",
> then task A could know that it's waiting on task B and put that information
> somewhere. We could then use the information to freeze task B before task A.

this sounds like the standard priority inversion problem taken to
extremes. Ingo has been working this issue, but IIRC the problem is that
tracking what owns the lock so that you can get that thing to run ends up
being enough overhead that it's not acceptable in the general case.

David Lang

> > The only thing to do is what Rafael has been working on: unfreeze
> > things, hope the tasks sort themselves out, and try again.
>
> That's what I'm questioning. Is there a more reliable way and we've just given
> up too quickly?
>
> Regards,
>
> Nigel




Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread Nigel Cunningham
Hi.

On Monday 23 July 2007 10:04:43 Paul Mackerras wrote:
> Nigel Cunningham writes:
> 
> > I guess I want to persist because all of these issues aren't utterly
> > unsolvable. It's just that we don't have the infrastructure yet to
> > figure out the solutions to these issues trivially. Take, for example,
> 
> Ever heard of the halting problem? :)  It's not just a matter of
> infrastructure.  You very quickly get into questions that are
> mathematically undecidable.

Is this the halting problem, though?

> > the locking issue. If we could call some function to say "What process
> > holds this lock?", then task A could know that it's waiting on task B
> > and put that information somewhere. We could then use the information
> > to freeze task B before task A.
> 
> But how would that help?  If task B holds the lock, then we can't
> freeze it until it's released the lock.  Then the question is, what
> does task B need in order to get to the point where it releases the
> lock?  And so on.  It rapidly gets not just extremely messy, but
> actually impossible to compute in general.

Take a step back for a second.

The problem we're facing now is that we're getting some userspace threads, 
used in processing I/O, that are functioning as exceptions to the "freeze 
userspace, then freezeable kernel threads" rule. They are only exceptions 
because of that role in processing I/O - because they're de facto kernel 
threads. So, if we orient our thinking more in terms of I/O processing and 
less in terms of the userspace/kernelspace distinction, we'll have a 
solution:

1) Freeze processes that aren't fs related (ie stop them generating I/O).
2) Flush pending I/O.
3) Freeze filesystems in reverse order of dependency, the primary purpose 
being to stop them generating further I/O on their metadata.

Locks that are being held are only being held because work is being done. If 
we progressively focus on threads in terms of their create/process work 
dependencies, we'll see that the problem isn't at all intractable.
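
Step 3 above, freezing filesystems in reverse order of dependency, could be sketched as a post-order walk over a tree of mounts. This is a user-space Python illustration only: the `freeze_order` function and the `mount_tree` layout (each mount point mapped to the mounts stacked on it) are assumptions made for the example, and it presumes the dependencies really do form a tree.

```python
def freeze_order(mount_tree, root="/"):
    """Return mounts in an order where each mount follows everything on top of it."""
    order = []

    def visit(mount):
        for child in mount_tree.get(mount, []):
            visit(child)
        # Post-order: a mount is listed only after all mounts that depend on it.
        order.append(mount)

    visit(root)
    return order
```

For a FUSE mount with a loopback filesystem on top of it, the loopback mount would be frozen first, then the FUSE mount, and the root filesystem last.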

Regards,

Nigel
-- 
See http://www.tuxonice.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.




Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread Paul Mackerras
Nigel Cunningham writes:

> I guess I want to persist because all of these issues aren't utterly
> unsolvable. It's just that we don't have the infrastructure yet to
> figure out the solutions to these issues trivially. Take, for example,

Ever heard of the halting problem? :)  It's not just a matter of
infrastructure.  You very quickly get into questions that are
mathematically undecidable.

> the locking issue. If we could call some function to say "What process
> holds this lock?", then task A could know that it's waiting on task B
> and put that information somewhere. We could then use the information
> to freeze task B before task A.

But how would that help?  If task B holds the lock, then we can't
freeze it until it's released the lock.  Then the question is, what
does task B need in order to get to the point where it releases the
lock?  And so on.  It rapidly gets not just extremely messy, but
actually impossible to compute in general.
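
To make the static half of the suggestion concrete, here is a sketch in
Python. It assumes a hypothetical `waits_on` snapshot (no such interface
exists in the kernel) and only checks whether that snapshot admits any order
at all, given that a lock holder can't be frozen while somebody is still
blocked on it. It says nothing about what a task will need next, which is the
part that can't be computed in general.

```python
def freeze_order(waits_on):
    """waits_on: dict of task -> set of tasks it is currently blocked on.

    Returns a list in which every waiter comes before the tasks it waits
    on, or None if the snapshot contains a cycle (no order exists).
    """
    tasks = set(waits_on)
    for holders in waits_on.values():
        tasks |= set(holders)
    # Count how many tasks are still blocked on each task.
    waiters = {t: 0 for t in tasks}
    for holders in waits_on.values():
        for h in holders:
            waiters[h] += 1
    # Tasks nobody is blocked on can be handled first.
    ready = sorted(t for t in tasks if waiters[t] == 0)
    order = []
    while ready:
        t = ready.pop(0)
        order.append(t)
        for h in sorted(waits_on.get(t, ())):
            waiters[h] -= 1
            if waiters[h] == 0:
                ready.append(h)
    return order if len(order) == len(tasks) else None
```

For `{"A": {"B"}}` this returns `["A", "B"]`; for two tasks blocked on each
other it returns `None`.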

Paul.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread Nigel Cunningham
On Monday 23 July 2007 09:09:21 Rafael J. Wysocki wrote:
> Hi,
> 
> On Monday, 23 July 2007 00:42, Nigel Cunningham wrote:
> > Hi Alan.
> > 
> > On Monday 23 July 2007 01:26:23 Alan Stern wrote:
> > > On Sun, 22 Jul 2007, Nigel Cunningham wrote:
> > > 
> > > > Hi.
> > > > 
> > > > On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote:
> > > > > > It seems that you could still potentially get a failure to freeze if one
> > > > > > FUSE process depends on another, and the one that is frozen second just
> > > > > > happens to be waiting on the one that is frozen first when it is frozen.
> > > > > > I admit that this situation is unlikely, and perhaps acceptable.
> > > > > >
> > > > > > A larger concern is that it seems that freezing FUSE processes at all
> > > > > > _will_ generate deadlocks if a non-synchronous or memory-map-supporting
> > > > > > filesystem is loopback mounted from a FUSE filesystem.  In that case, if
> > > > > > you attempt to sync or free memory once FUSE is frozen, you are sure to
> > > > > > get a deadlock.
> > > > >
> > > > > Ok. So then (in response to Alan too), how about keeping a tree of mounts,
> > > > > akin to the device tree, and working from the deepest nodes up? (In
> > > > > conjunction with what I already suggested)?
> > > >
> > > > Face it, Nigel, this is a losing battle.  You can try to come up with
> > > > ever-more complex schemes to try and force FUSE into the freezer's
> > > > framework, but it just won't fit.  Or if it does, the next filesystem
> > > > to come along will require an even more baroque type of special-case
> > > > handling.
> > >
> > > It does seem to be a losing battle, but I'm wondering whether that's really
> > > because it's an intractable problem, or because people have given up on it
> > > before its time. We are talking about a computer system, so things should be
> > > predictable.
> > >
> > > > The general problem is that task A may be in an unfreezable state,
> > > > waiting for task B to do something, while task B is already frozen.
> > > > Since there's no reasonable way to determine that A really is waiting
> > > > for B, you're just stuck.  (To make matters worse, A may not even
> > > > realize which task it is waiting for; it may know only that it's
> > > > waiting for somebody to do something!)  A and B could be user tasks,
> > > > kernel threads, or one of each.
> > >
> > > I guess I want to persist because all of these issues aren't utterly
> > > unsolvable. It's just that we don't have the infrastructure yet to figure out
> > > the solutions to these issues trivially. Take, for example, the locking
> > > issue. If we could call some function to say "What process holds this lock?",
> > > then task A could know that it's waiting on task B and put that information
> > > somewhere. We could then use the information to freeze task B before task A.
> > >
> > > > The only thing to do is what Rafael has been working on: unfreeze
> > > > things, hope the tasks sort themselves out, and try again.
> > >
> > > That's what I'm questioning. Is there a more reliable way and we've just given
> > > up too quickly?
> 
> Well, there probably is one, but it likely would require us to make changes
> that wouldn't be accepted by some people and thus would never be merged.

Well, doesn't that imply that we should at least look into what changes would 
be needed? If they wouldn't be accepted by some people, then either the 
objections would be reasonable or they wouldn't (and would hopefully be 
overridden). But we can't know if we don't try.

Regards,

Nigel

-- 
See http://www.tuxonice.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.


Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread Rafael J. Wysocki
Hi,

On Monday, 23 July 2007 00:42, Nigel Cunningham wrote:
> Hi Alan.
> 
> On Monday 23 July 2007 01:26:23 Alan Stern wrote:
> > On Sun, 22 Jul 2007, Nigel Cunningham wrote:
> > 
> > > Hi.
> > > 
> > > On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote:
> > > > It seems that you could still potentially get a failure to freeze if one
> > > > FUSE process depends on another, and the one that is frozen second just
> > > > happens to be waiting on the one that is frozen first when it is frozen.
> > > > I admit that this situation is unlikely, and perhaps acceptable.
> > > > 
> > > > A larger concern is that it seems that freezing FUSE processes at all
> > > > _will_ generate deadlocks if a non-synchronous or memory-map-supporting
> > > > filesystem is loopback mounted from a FUSE filesystem.  In that case, if
> > > > you attempt to sync or free memory once FUSE is frozen, you are sure to
> > > > get a deadlock.
> > > 
> > > Ok. So then (in response to Alan too), how about keeping a tree of mounts,
> > > akin to the device tree, and working from the deepest nodes up? (In
> > > conjunction with what I already suggested)?
> > 
> > Face it, Nigel, this is a losing battle.  You can try to come up with
> > ever-more complex schemes to try and force FUSE into the freezer's
> > framework, but it just won't fit.  Or if it does, the next filesystem
> > to come along will require an even more baroque type of special-case 
> > handling.
> 
> It does seem to be a losing battle, but I'm wondering whether that's really 
> because it's an intractable problem, or because people have given up on it 
> before its time. We are talking about a computer system, so things should be 
> predictable.
>  
> > The general problem is that task A may be in an unfreezable state,
> > waiting for task B to do something, while task B is already frozen.  
> > Since there's no reasonable way to determine that A really is waiting
> > for B, you're just stuck.  (To make matters worse, A may not even
> > realize which task it is waiting for; it may know only that it's
> > waiting for somebody to do something!)  A and B could be user tasks, 
> > kernel threads, or one of each.
> 
> I guess I want to persist because all of these issues aren't utterly 
> unsolvable. It's just that we don't have the infrastructure yet to figure out 
> the solutions to these issues trivially. Take, for example, the locking 
> issue. If we could call some function to say "What process holds this lock?", 
> then task A could know that it's waiting on task B and put that information 
> somewhere. We could then use the information to freeze task B before task A.
> 
>  
> > The only thing to do is what Rafael has been working on: unfreeze
> > things, hope the tasks sort themselves out, and try again.
> 
> That's what I'm questioning. Is there a more reliable way and we've just given
> up too quickly?

Well, there probably is one, but it likely would require us to make changes
that wouldn't be accepted by some people and thus would never be merged.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread Nigel Cunningham
Hi Alan.

On Monday 23 July 2007 01:26:23 Alan Stern wrote:
> On Sun, 22 Jul 2007, Nigel Cunningham wrote:
> 
> > Hi.
> > 
> > On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote:
> > > It seems that you could still potentially get a failure to freeze if one
> > > FUSE process depends on another, and the one that is frozen second just
> > > happens to be waiting on the one that is frozen first when it is frozen.
> > > I admit that this situation is unlikely, and perhaps acceptable.
> > > 
> > > A larger concern is that it seems that freezing FUSE processes at all
> > > _will_ generate deadlocks if a non-synchronous or memory-map-supporting
> > > filesystem is loopback mounted from a FUSE filesystem.  In that case, if
> > > you attempt to sync or free memory once FUSE is frozen, you are sure to
> > > get a deadlock.
> > 
> > Ok. So then (in response to Alan too), how about keeping a tree of mounts, 
> > akin to the device tree, and working from the deepest nodes up? (In 
> > conjunction with what I already suggested)?
> 
> Face it, Nigel, this is a losing battle.  You can try to come up with
> ever-more complex schemes to try and force FUSE into the freezer's
> framework, but it just won't fit.  Or if it does, the next filesystem
> to come along will require an even more baroque type of special-case 
> handling.

It does seem to be a losing battle, but I'm wondering whether that's really 
because it's an intractable problem, or because people have given up on it 
before its time. We are talking about a computer system, so things should be 
predictable.
 
> The general problem is that task A may be in an unfreezable state,
> waiting for task B to do something, while task B is already frozen.  
> Since there's no reasonable way to determine that A really is waiting
> for B, you're just stuck.  (To make matters worse, A may not even
> realize which task it is waiting for; it may know only that it's
> waiting for somebody to do something!)  A and B could be user tasks, 
> kernel threads, or one of each.

I guess I want to persist because all of these issues aren't utterly 
unsolvable. It's just that we don't have the infrastructure yet to figure out 
the solutions to these issues trivially. Take, for example, the locking 
issue. If we could call some function to say "What process holds this lock?", 
then task A could know that it's waiting on task B and put that information 
somewhere. We could then use the information to freeze task B before task A.

 
> The only thing to do is what Rafael has been working on: unfreeze
> things, hope the tasks sort themselves out, and try again.

That's what I'm questioning. Is there a more reliable way and we've just given 
up too quickly?

Regards,

Nigel
-- 
Nigel Cunningham
Christian Reformed Church of Cobden
103 Curdie Street, Cobden 3266, Victoria, Australia
Ph. +61 3 5595 1185 / +61 417 100 574
Communal Worship: 11 am Sunday.


Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread david

On Sun, 22 Jul 2007, Alan Stern wrote:

> On Sun, 22 Jul 2007, Miklos Szeredi wrote:
>
> > > The only thing to do is what Rafael has been working on: unfreeze
> > > things, hope the tasks sort themselves out, and try again.
> >
> > Have we some proof, that this will untangle the freezing tasks in a
> > limited time?  Or will it just make the problem harder to trigger?
>
> Of course there's no proof.  Just the opposite -- if things get hung up
> the first time, they might get hung up the second time.  And the
> third...
>
> But it ought to make the problem harder to trigger.  For the present
> that's a worthwhile improvement.

it gives the system more tries to find a spot in time where the deadlock 
doesn't happen, if you find one you can continue.

but even if things keep getting hung up, at least you are backing out of 
each try safely and can eventually tell the user "I give up, try shutting 
some things down and suspending again"
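
roughly, as a sketch in Python (the `try_freeze` and `thaw` callbacks and the 
attempt limit are made up for illustration, this is not Rafael's patch):

```python
import time


def freeze_with_retries(try_freeze, thaw, attempts=3, delay=0.0):
    """Try to freeze; on failure back out and try again.

    try_freeze() returns True on success.
    thaw() undoes whatever a failed attempt managed to freeze.
    Returns True if some attempt succeeded, False if we gave up.
    """
    for _ in range(attempts):
        if try_freeze():
            return True
        thaw()             # back out safely before the next try
        time.sleep(delay)  # give the tasks a chance to sort themselves out
    return False
```

the loop is bounded because nothing guarantees that a later try succeeds; a 
False return is the point where the user gets the "I give up" message.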


David Lang


Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread david

On Sun, 22 Jul 2007, Alan Stern wrote:

> On Sat, 21 Jul 2007 [EMAIL PROTECTED] wrote:
>
> > wait a min here, it's possible we are misunderstanding each other.
>
> I'd describe it as: You are misunderstanding me.  :-)

very possibly :-)

> > as I see it.
> >
> > if userspace can acquire locks that prevent the kernel from shutting off
> > (or doing anything else in particular) then it's possible for misbehaving
> > userspace code to stop the kernel by simply choosing to never release the
> > lock.
> >
> > this would be a trivial DOS from userspace.
>
> You are confusing "userspace" with "user tasks".  And not only that,
> you often use the term "userspace" when you should say "user mode".
>
> If you want I can explain the differences.

please do, I have been treating all three as the same category.

> > now, if you are talking instead about the fact that when userspace makes a
> > system call, the execution of that system call involves acquiring locks
> > that are released before the system call completes you have a very
> > different situation.
>
> That is exactly what I have been talking about.  It may be different
> from what you _thought_, but it's not different from what I actually
> _said_.

Ok, I did misunderstand you. it sounds like all you need to do to make 
sure that locks are not held is to allow system calls to return before 
trying to do the suspend/kexec/etc. that sounds like not only a trivial 
thing to do, but something that would probably be done anyway.

although syscalls that then call out to userspace tasks before they can 
complete cause potential deadlocks (without that issue you can just wait 
until all syscalls have returned, and not allow anything to issue new 
syscalls). is this the issue that's killing FUSE+suspend?

> > if you have locks that are held across system calls then you should
> > already have problems. because you can't count on userspace ever taking
> > whatever action is appropriate to release the lock.
> >
> > what am I missing that concerns you so much?
>
> Here's what you are missing:
>
> The new kexec approach eliminates the freezer and relies instead on the
> fact that none of the tasks in the original kernel can execute while
> the new kexec'd kernel is running.  This means the new kernel can write
> out a memory image with no fear of interference or corruption.

correct

> But it also means that tasks which otherwise would have been frozen are
> actually free to run before the kexec call is made (and after the call
> returns, if the kexec'd kernel returns back to the original kernel).
> Any driver which was written with the assumption that tasks would be
> frozen at those times will need to be changed.

here is where you lose me.

why should jumping back to the original kernel immediately start running 
these processes? the process of doing a kexec requires things to happen in 
the drivers before normal activity can happen, so there is a phase in 
there where the kernel being jumped to has drivers initializing, but still 
does not allow anything else to run. why can't this phase be extended to 
allow for the possibility of transitioning these drivers to a sleep mode 
instead of to full operation?

> For example, drivers know that they have to quiesce their device in
> preparation for creating the memory snapshot.  But they assume that no
> I/O requests will be made while the device is quiesced (because no user
> task is capable of generating an I/O request if they are all frozen),
> so the driver doesn't try to prevent such requests from reactivating
> the device.
>
> The situation as regards locking is harder to discuss since I don't
> know of any code examples to use as a guide.  The fact remains that if
> user tasks aren't frozen then they can make system calls, and while
> running in kernel mode they can acquire locks, which might cause
> problems -- even though I can't identify any definite examples.

yes, if userspace is running jobs and submitting I/O and system calls 
while drivers are trying to initialize there is a big problem, but I am 
missing the reason this must be the case.

> Because of these problems, it's too early to start trying to use kexec
> to avoid the need for the freezer.
>
> Of course, exactly the same possible problems exist when one tries to
> remove the freezer from suspend-to-RAM.  It has nothing to do with
> kexec in particular (and certainly nothing to do with ACPI).

the part of the freezer that everyone is trying to eliminate is the 
exceptions (freeze everything except X,Y,Z because we will need to use 
those later for A)

> > having read through Documentation/power/devices.txt I remain convinced
> > that you are making a fundamental mistake.
> >
> > you are designing a system
>
> I'm not designing anything!  _You_ are.  I'm merely pointing out
> problems in your design which you haven't considered.

a better way of phrasing what I meant goes more along the lines of 'the 
current design of the system...'

> >  that will only work if everything (every
> > driver, every state transition) participates fully in the process at all
> > times. You started with the facts 'this is 

Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread Alan Stern
On Sun, 22 Jul 2007, Miklos Szeredi wrote:

> > The only thing to do is what Rafael has been working on: unfreeze
> > things, hope the tasks sort themselves out, and try again.
> 
> Have we some proof, that this will untangle the freezing tasks in a
> limited time?  Or will it just make the problem harder to trigger?

Of course there's no proof.  Just the opposite -- if things get hung up
the first time, they might get hung up the second time.  And the
third...

But it ought to make the problem harder to trigger.  For the present 
that's a worthwhile improvement.

Alan Stern



Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread Miklos Szeredi
> The only thing to do is what Rafael has been working on: unfreeze
> things, hope the tasks sort themselves out, and try again.

Have we some proof, that this will untangle the freezing tasks in a
limited time?  Or will it just make the problem harder to trigger?

Miklos


Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread Alan Stern
On Sat, 21 Jul 2007 [EMAIL PROTECTED] wrote:

> wait a min here, it's possible we are misunderstanding each other.

I'd describe it as: You are misunderstanding me.  :-)

> as I see it.
> 
> if userspace can acquire locks that prevent the kernel from shutting off 
> (or doing anything else in particular) then it's possible for misbehaving 
> userspace code to stop the kernel by simply choosing to never release the 
> lock.
> 
> this would be a trivial DOS from userspace.

You are confusing "userspace" with "user tasks".  And not only that,
you often use the term "userspace" when you should say "user mode".

If you want I can explain the differences.

> now, if you are talking instead about the fact that when userspace makes a 
> system call, the execution of that system call involves acquiring locks 
> that are released before the system call completes you have a very 
> different situation.

That is exactly what I have been talking about.  It may be different
from what you _thought_, but it's not different from what I actually
_said_.

> if you have locks that are held across system calls then you should 
> already have problems. because you can't count on userspace ever taking 
> whatever action is appropriate to release the lock.
> 
> what am I missing that concerns you so much?

Here's what you are missing:

The new kexec approach eliminates the freezer and relies instead on the
fact that none of the tasks in the original kernel can execute while
the new kexec'd kernel is running.  This means the new kernel can write
out a memory image with no fear of interference or corruption.

But it also means that tasks which otherwise would have been frozen are 
actually free to run before the kexec call is made (and after the call 
returns, if the kexec'd kernel returns back to the original kernel).  
Any driver which was written with the assumption that tasks would be 
frozen at those times will need to be changed.

For example, drivers know that they have to quiesce their device in
preparation for creating the memory snapshot.  But they assume that no
I/O requests will be made while the device is quiesced (because no user
task is capable of generating an I/O request if they are all frozen),
so the driver doesn't try to prevent such requests from reactivating
the device.
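
A toy model of the change such a driver would need, sketched in Python.  The
class and method names are invented for illustration and do not correspond
to any real driver interface.  The point is only that submit() has to check
the quiesced state itself instead of relying on nobody being able to call it:

```python
class Device:
    """Toy driver that does not rely on user tasks being frozen."""

    def __init__(self):
        self.quiesced = False
        self.active = True
        self.pending = []    # requests deferred while quiesced
        self.completed = []  # requests that reached the hardware

    def quiesce(self):
        self.quiesced = True
        self.active = False

    def submit(self, request):
        if self.quiesced:
            # Do not reactivate the device; hold the request instead.
            self.pending.append(request)
            return False
        self.active = True
        self.completed.append(request)
        return True

    def resume(self):
        self.quiesced = False
        self.active = True
        # Replay whatever arrived while the device was quiesced.
        queued, self.pending = self.pending, []
        for request in queued:
            self.submit(request)
```

A driver written under the old assumption would lack the check at the top of
submit(), so a request arriving after quiesce() would wake the device.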

The situation as regards locking is harder to discuss since I don't 
know of any code examples to use as a guide.  The fact remains that if 
user tasks aren't frozen then they can make system calls, and while 
running in kernel mode they can acquire locks, which might cause 
problems -- even though I can't identify any definite examples.

Because of these problems, it's too early to start trying to use kexec
to avoid the need for the freezer.

Of course, exactly the same possible problems exist when one tries to
remove the freezer from suspend-to-RAM.  It has nothing to do with 
kexec in particular (and certainly nothing to do with ACPI).

> having read through Documentation/power/devices.txt I remain convinced 
> that you are making a fundamental mistake.
> 
> you are designing a system

I'm not designing anything!  _You_ are.  I'm merely pointing out
problems in your design which you haven't considered.

>  that will only work if everything (every 
> driver, every state transition) participates fully in the process at all 
> times. You started with the facts 'this is the info that ACPI provides

Look again; I wasn't talking about ACPI.  You have mixed up the issues
in this email thread.  (Not hard to do, since it has been a very long
and complicated thread.)

> and 
> this is how it is designed to be used' and worked from there instead of 
> looking to see what the kernel really needed and figuring how to provide a 
> good interface for that that happens to be implemented (today) with ACPI. 
> (a proper power management framework shouldn't care if you have ACPI, APM, 
> or some other method of controlling the devices)

This and the rest of your email have no bearing on what I was talking
about, so I have snipped out the remainder.

Alan Stern



Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread Alan Stern
On Sun, 22 Jul 2007, Nigel Cunningham wrote:

> Hi.
> 
> On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote:
> > It seems that you could still potentially get a failure to freeze if one
> > FUSE process depends on another, and the one that is frozen second just
> > happens to be waiting on the one that is frozen first when it is frozen.
> > I admit that this situation is unlikely, and perhaps acceptable.
> > 
> > A larger concern is that it seems that freezing FUSE processes at all
> > _will_ generate deadlocks if a non-synchronous or memory-map-supporting
> > filesystem is loopback mounted from a FUSE filesystem.  In that case, if
> > you attempt to sync or free memory once FUSE is frozen, you are sure to
> > get a deadlock.
> 
> Ok. So then (in response to Alan too), how about keeping a tree of mounts, 
> akin to the device tree, and working from the deepest nodes up? (In 
> conjunction with what I already suggested)?

Face it, Nigel, this is a losing battle.  You can try to come up with
ever-more complex schemes to try and force FUSE into the freezer's
framework, but it just won't fit.  Or if it does, the next filesystem
to come along will require an even more baroque type of special-case 
handling.

The general problem is that task A may be in an unfreezable state,
waiting for task B to do something, while task B is already frozen.  
Since there's no reasonable way to determine that A really is waiting
for B, you're just stuck.  (To make matters worse, A may not even
realize which task it is waiting for; it may know only that it's
waiting for somebody to do something!)  A and B could be user tasks, 
kernel threads, or one of each.

The only thing to do is what Rafael has been working on: unfreeze
things, hope the tasks sort themselves out, and try again.

Alan Stern



Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread Alan Stern
On Sun, 22 Jul 2007, Nigel Cunningham wrote:

 Hi.
 
 On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote:
  It seems that you could still potentially get a failure to freeze if one
  FUSE process depends on another, and the one that is frozen second just
  happens to be waiting on the one that is frozen first when it is frozen.
  I admit that this situation is unlikely, and perhaps acceptable.
  
  A larger concern is that it seems that freezing FUSE processes at all
  _will_ generate deadlocks if a non-synchronous or memory-map-supporting
  filesystem is loopback mounted from a FUSE filesystem.  In that case, if
  you attempt to sync or free memory once FUSE is frozen, you are sure to
  get a deadlock.
 
 Ok. So then (in response to Alan too), how about keeping a tree of mounts, 
 akin to the device tree, and working from the deepest nodes up? (In 
 conjunction with what I already suggested)?

Face it, Nigel, this is a losing battle.  You can try to come up with
ever-more complex schemes to try and force FUSE into the freezer's
framework, but it just won't fit.  Or if it does, the next filesystem
to come along will require an even more baroque type of special-case 
handling.

The general problem is that task A may be in an unfreezable state,
waiting for task B to do something, while task B is already frozen.  
Since there's no reasonable way to determine that A really is waiting
for B, you're just stuck.  (To make matters worse, A may not even
realize which task it is waiting for; it may know only that it's
waiting for somebody to do something!)  A and B could be user tasks, 
kernel threads, or one of each.

The only thing to do is what Rafael has been working on: unfreeze
things, hope the tasks sort themselves out, and try again.

Alan Stern

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread Alan Stern
On Sat, 21 Jul 2007 [EMAIL PROTECTED] wrote:

 wait a min her, it's possible we are misunderstanding each other.

I'd describe it as: You are misunderstanding me.  :-)

 as I see it.
 
 if userspace can aquire locks that prevent the kernel from shutting off 
 (or doing anything else in particular) then it's possible for misbehaving 
 userspace code to stop the kernel by simply choosing to never release the 
 lock.
 
 this would be a trivial DOS from userspace.

You are confusing userspace with user tasks.  And not only that,
you often use the term userspace when you should say user mode.

If you want I can explain the differences.

 now, if you are talking instead about the fact that when userspace makes a 
 system call, the execution of that system call involves aquiring locks 
 that are released before the system call completes you have a very 
 different situation.

That is exactly what I have been talking about.  It may be different
from what you _thought_, but it's not different from what I actually
_said_.

 if you have locks that are held across system calls then you should 
 already have problems. becouse you can't count on userspace ever taking 
 whatever action is appropriate to release the lock.
 
 what am I missing that concerns you so much?

Here's what you are missing:

The new kexec approach eliminates the freezer and relies instead on the
fact that none of the tasks in the original kernel can execute while
the new kexec'd kernel is running.  This means the new kernel can write
out a memory image with no fear of interference or corruption.

But it also means that tasks which otherwise would have been frozen are 
actually free to run before the kexec call is made (and after the call 
returns, if the kexec'd kernel returns back to the original kernel).  
Any driver which was written with the assumption that tasks would be 
frozen at those times will need to be changed.

For example, drivers know that they have to quiesce their device in
preparation for creating the memory snapshot.  But they assume that no
I/O requests will be made while the device is quiesced (because no user
task is capable of generating an I/O request if they are all frozen),
so the driver doesn't try to prevent such requests from reactivating
the device.

The situation as regards locking is harder to discuss since I don't 
know of any code examples to use as a guide.  The fact remains that if 
user tasks aren't frozen then they can make system calls, and while 
running in kernel mode they can acquire locks, which might cause 
problems -- even though I can't identify any definite examples.

Because of these problems, it's too early to start trying to use kexec
to avoid the need for the freezer.

Of course, exactly the same possible problems exist when one tries to
remove the freezer from suspend-to-RAM.  It has nothing to do with 
kexec in particular (and certainly nothing to do with ACPI).

 having read through Documentation/power/devices.txt I remain convinced 
 that you are making a fundamental mistake.
 
 you are designing a system

I'm not designing anything!  _You_ are.  I'm merely pointing out
problems in your design which you haven't considered.

  that will only work if everything (every 
 driver, every state transition) participates fully in the process at all 
 times. You started with the facts 'this is the info that ACPI provides

Look again; I wasn't talking about ACPI.  You have mixed up the issues
in this email thread.  (Not hard to do, since it has been a very long
and complicated thread.)

 and 
 this is how it is designed to be used' and worked from there instead of 
 looking to see what the kernel really needed and figuring how to provide a 
 good interface for that that happens to be implemented (today) with ACPI. 
 (a proper power management framework shouldn't care if you have ACPI, APM, 
 or some other method of controlling the devices)

This and the rest of your email have no bearing on what I was talking
about, so I have snipped out the remainder.

Alan Stern

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread Miklos Szeredi
 The only thing to do is what Rafael has been working on: unfreeze
 things, hope the tasks sort themselves out, and try again.

Have we some proof, that this will untangle the freezing tasks in a
limited time?  Or will it just make the problem harder to trigger?

Miklos


Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread Alan Stern
On Sun, 22 Jul 2007, Miklos Szeredi wrote:

  The only thing to do is what Rafael has been working on: unfreeze
  things, hope the tasks sort themselves out, and try again.
 
 Have we some proof, that this will untangle the freezing tasks in a
 limited time?  Or will it just make the problem harder to trigger?

Of course there's no proof.  Just the opposite -- if things get hung up
the first time, they might get hung up the second time.  And the
third...

But it ought to make the problem harder to trigger.  For the present 
that's a worthwhile improvement.

Alan Stern



Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread david

On Sun, 22 Jul 2007, Alan Stern wrote:


On Sat, 21 Jul 2007 [EMAIL PROTECTED] wrote:


wait a min here, it's possible we are misunderstanding each other.


I'd describe it as: You are misunderstanding me.  :-)


very possibly :-)


as I see it.

if userspace can acquire locks that prevent the kernel from shutting off
(or doing anything else in particular) then it's possible for misbehaving
userspace code to stop the kernel by simply choosing to never release the
lock.

this would be a trivial DOS from userspace.


You are confusing userspace with user tasks.  And not only that,
you often use the term userspace when you should say user mode.

If you want I can explain the differences.


please do, I have been treating all three as the same category.


now, if you are talking instead about the fact that when userspace makes a
system call, the execution of that system call involves acquiring locks
that are released before the system call completes you have a very
different situation.


That is exactly what I have been talking about.  It may be different
from what you _thought_, but it's not different from what I actually
_said_.


Ok, I did misunderstand you. it sounds like all you need to do to make 
sure that locks are not held is to allow system calls to return before 
trying to do the suspend/kexec/etc. that sounds like not only a trivial 
thing to do, but something that would probably be done anyway.


although syscalls that then call out to userspace tasks before they can 
complete cause potential deadlocks (without that issue you can just wait 
until all syscalls have returned, and not allow anything to issue new 
syscalls). is this the issue that's killing FUSE+suspend?



if you have locks that are held across system calls then you should
already have problems, because you can't count on userspace ever taking
whatever action is appropriate to release the lock.

what am I missing that concerns you so much?


Here's what you are missing:

The new kexec approach eliminates the freezer and relies instead on the
fact that none of the tasks in the original kernel can execute while
the new kexec'd kernel is running.  This means the new kernel can write
out a memory image with no fear of interference or corruption.


correct


But it also means that tasks which otherwise would have been frozen are
actually free to run before the kexec call is made (and after the call
returns, if the kexec'd kernel returns back to the original kernel).
Any driver which was written with the assumption that tasks would be
frozen at those times will need to be changed.


here is where you lose me.

why should jumping back to the original kernel immediately start running 
these processes? the process of doing a kexec requires things to happen in 
the drivers before normal activity can happen, so there is a phase in 
there where the kernel being jumped to has drivers initializing, but still 
does not allow anything else to run. why can't this phase be extended to 
allow for the possibility of transitioning these drivers to a sleep mode 
instead of to full operation?



For example, drivers know that they have to quiesce their device in
preparation for creating the memory snapshot.  But they assume that no
I/O requests will be made while the device is quiesced (because no user
task is capable of generating an I/O request if they are all frozen),
so the driver doesn't try to prevent such requests from reactivating
the device.
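The assumption described above can be illustrated with a toy user-space model. This is only a sketch in Python, not kernel code; ToyDevice, submit_unguarded, submit_guarded and DeviceQuiescedError are names invented for the illustration. The unguarded path mimics a driver that trusts the freezer and lets a request reactivate the device, while the guarded path refuses requests while the device is quiesced.

```python
class DeviceQuiescedError(Exception):
    """Raised when I/O is submitted to a quiesced device (hypothetical name)."""


class ToyDevice:
    """Toy model of a device that can be quiesced before a snapshot."""

    def __init__(self):
        self.quiesced = False
        self.completed = []

    def quiesce(self):
        self.quiesced = True

    def submit_unguarded(self, request):
        # Written under the assumption that no task can run while the
        # device is quiesced, so nothing stops the request from silently
        # reactivating the device.
        self.quiesced = False
        self.completed.append(request)

    def submit_guarded(self, request):
        # Does not rely on the freezer: the request is refused instead
        # of being allowed to wake the device up.
        if self.quiesced:
            raise DeviceQuiescedError(request)
        self.completed.append(request)
```

Without a freezer, only the guarded variant keeps the device quiesced when a user task issues a request at the wrong moment. A real driver would more likely queue or block the request than raise an error.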

The situation as regards locking is harder to discuss since I don't
know of any code examples to use as a guide.  The fact remains that if
user tasks aren't frozen then they can make system calls, and while
running in kernel mode they can acquire locks, which might cause
problems -- even though I can't identify any definite examples.


yes, if userspace is running jobs and submitting I/O and system calls 
while drivers are trying to initialize there is a big problem, but I am 
missing the reason this must be the case.



Because of these problems, it's too early to start trying to use kexec
to avoid the need for the freezer.

Of course, exactly the same possible problems exist when one tries to
remove the freezer from suspend-to-RAM.  It has nothing to do with
kexec in particular (and certainly nothing to do with ACPI).


the part of the freezer that everyone is trying to eliminate is the 
exceptions (freeze everything except X,Y,Z because we will need to use 
those later for A)



having read through Documentation/power/devices.txt I remain convinced
that you are making a fundamental mistake.

you are designing a system


I'm not designing anything!  _You_ are.  I'm merely pointing out
problems in your design which you haven't considered.


a better way of phrasing what I meant goes more along the lines of 'the 
current design of the system...'



 that will only work if everything (every
driver, every state transition) participates fully in the process at all
times. You started with the facts 'this is the info 

Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread david

On Sun, 22 Jul 2007, Alan Stern wrote:


On Sun, 22 Jul 2007, Miklos Szeredi wrote:


The only thing to do is what Rafael has been working on: unfreeze
things, hope the tasks sort themselves out, and try again.


Have we some proof, that this will untangle the freezing tasks in a
limited time?  Or will it just make the problem harder to trigger?


Of course there's no proof.  Just the opposite -- if things get hung up
the first time, they might get hung up the second time.  And the
third...

But it ought to make the problem harder to trigger.  For the present
that's a worthwhile improvement.


it gives the system more tries to find a spot in time where the deadlock 
doesn't happen, if you find one you can continue.


but even if things keep getting hung up, at least you are backing out of 
each try safely and can eventually tell the user "I give up, try shutting 
some things down and suspending again"
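A rough sketch of that retry-and-back-out behaviour, as a toy Python model rather than the actual patch: the function names, the per-task "probability of being stuck" and the attempt limit are all assumptions made for illustration.

```python
import random


def attempt_freeze(tasks, rng):
    """Return the names of tasks that failed to freeze on this attempt.

    `tasks` maps a task name to the (made-up) probability that the task
    is stuck waiting on something when the freezer reaches it.
    """
    return [name for name, p_stuck in tasks.items() if rng.random() < p_stuck]


def suspend_with_retries(tasks, max_attempts=3, seed=0):
    """Try to freeze everything; thaw and retry on failure, then give up."""
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        stuck = attempt_freeze(tasks, rng)
        if not stuck:
            return ("suspended", attempt)
        # Back out safely: in a real system everything would be thawed
        # here so the stuck tasks get a chance to make progress.
    return ("gave up", max_attempts)
```

The bounded loop is the point of the sketch: each extra attempt makes the failure less likely but does not make it impossible, so the caller still needs a "gave up" result to report to the user.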


David Lang


Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread Nigel Cunningham
Hi Alan.

On Monday 23 July 2007 01:26:23 Alan Stern wrote:
 On Sun, 22 Jul 2007, Nigel Cunningham wrote:
 
  Hi.
  
  On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote:
   It seems that you could still potentially get a failure to freeze if one
   FUSE process depends on another, and the one that is frozen second just
   happens to be waiting on the one that is frozen first when it is frozen.
   I admit that this situation is unlikely, and perhaps acceptable.
   
   A larger concern is that it seems that freezing FUSE processes at all
   _will_ generate deadlocks if a non-synchronous or memory-map-supporting
   filesystem is loopback mounted from a FUSE filesystem.  In that case, if
   you attempt to sync or free memory once FUSE is frozen, you are sure to
   get a deadlock.
  
  Ok. So then (in response to Alan too), how about keeping a tree of mounts, 
  akin to the device tree, and working from the deepest nodes up? (In 
  conjunction with what I already suggested)?
 
 Face it, Nigel, this is a losing battle.  You can try to come up with
 ever-more complex schemes to try and force FUSE into the freezer's
 framework, but it just won't fit.  Or if it does, the next filesystem
 to come along will require an even more baroque type of special-case 
 handling.

It does seem to be a losing battle, but I'm wondering whether that's really 
because it's an intractable problem, or because people have given up on it 
before its time. We are talking about a computer system, so things should be 
predictable.
 
 The general problem is that task A may be in an unfreezable state,
 waiting for task B to do something, while task B is already frozen.  
 Since there's no reasonable way to determine that A really is waiting
 for B, you're just stuck.  (To make matters worse, A may not even
 realize which task it is waiting for; it may know only that it's
 waiting for somebody to do something!)  A and B could be user tasks, 
 kernel threads, or one of each.

I guess I want to persist because all of these issues aren't utterly 
unsolvable. It's just that we don't have the infrastructure yet to figure out 
the solutions to these issues trivially. Take, for example, the locking 
issue. If we could call some function to say "What process holds this lock?", 
then task A could know that it's waiting on task B and put that information 
somewhere. We could then use the information to freeze task B before task A.

 
 The only thing to do is what Rafael has been working on: unfreeze
 things, hope the tasks sort themselves out, and try again.

That's what I'm questioning. Is there a more reliable way and we've just given 
up too quickly?

Regards,

Nigel
-- 
Nigel Cunningham
Christian Reformed Church of Cobden
103 Curdie Street, Cobden 3266, Victoria, Australia
Ph. +61 3 5595 1185 / +61 417 100 574
Communal Worship: 11 am Sunday.




Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread Rafael J. Wysocki
Hi,

On Monday, 23 July 2007 00:42, Nigel Cunningham wrote:
 Hi Alan.
 
 On Monday 23 July 2007 01:26:23 Alan Stern wrote:
  On Sun, 22 Jul 2007, Nigel Cunningham wrote:
  
   Hi.
   
   On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote:
It seems that you could still potentially get a failure to freeze if one
FUSE process depends on another, and the one that is frozen second just
happens to be waiting on the one that is frozen first when it is frozen.
I admit that this situation is unlikely, and perhaps acceptable.

A larger concern is that it seems that freezing FUSE processes at all
_will_ generate deadlocks if a non-synchronous or memory-map-supporting
filesystem is loopback mounted from a FUSE filesystem.  In that case, if
you attempt to sync or free memory once FUSE is frozen, you are sure to
get a deadlock.
   
   Ok. So then (in response to Alan too), how about keeping a tree of mounts, 
   akin to the device tree, and working from the deepest nodes up? (In 
   conjunction with what I already suggested)?
  
  Face it, Nigel, this is a losing battle.  You can try to come up with
  ever-more complex schemes to try and force FUSE into the freezer's
  framework, but it just won't fit.  Or if it does, the next filesystem
  to come along will require an even more baroque type of special-case 
  handling.
 
 It does seem to be a losing battle, but I'm wondering whether that's really 
 because it's an intractable problem, or because people have given up on it 
 before its time. We are talking about a computer system, so things should be 
 predictable.
  
  The general problem is that task A may be in an unfreezable state,
  waiting for task B to do something, while task B is already frozen.  
  Since there's no reasonable way to determine that A really is waiting
  for B, you're just stuck.  (To make matters worse, A may not even
  realize which task it is waiting for; it may know only that it's
  waiting for somebody to do something!)  A and B could be user tasks, 
  kernel threads, or one of each.
 
 I guess I want to persist because all of these issues aren't utterly 
 unsolvable. It's just that we don't have the infrastructure yet to figure out 
 the solutions to these issues trivially. Take, for example, the locking 
 issue. If we could call some function to say "What process holds this lock?", 
 then task A could know that it's waiting on task B and put that information 
 somewhere. We could then use the information to freeze task B before task A.
 
  
  The only thing to do is what Rafael has been working on: unfreeze
  things, hope the tasks sort themselves out, and try again.
 
 That's what I'm questioning. Is there a more reliable way and we've just given 
 up too quickly?

Well, there probably is one, but it likely would require us to make changes
that wouldn't be accepted by some people and thus would never be merged.

Greetings,
Rafael


-- 
Premature optimization is the root of all evil. - Donald Knuth


Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread Nigel Cunningham
On Monday 23 July 2007 09:09:21 Rafael J. Wysocki wrote:
 Hi,
 
 On Monday, 23 July 2007 00:42, Nigel Cunningham wrote:
  Hi Alan.
  
  On Monday 23 July 2007 01:26:23 Alan Stern wrote:
   On Sun, 22 Jul 2007, Nigel Cunningham wrote:
   
Hi.

On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote:
 It seems that you could still potentially get a failure to freeze if one
 FUSE process depends on another, and the one that is frozen second just
 happens to be waiting on the one that is frozen first when it is frozen.
 I admit that this situation is unlikely, and perhaps acceptable.
 
 A larger concern is that it seems that freezing FUSE processes at all
 _will_ generate deadlocks if a non-synchronous or memory-map-supporting
 filesystem is loopback mounted from a FUSE filesystem.  In that case, if
 you attempt to sync or free memory once FUSE is frozen, you are sure to
 get a deadlock.

Ok. So then (in response to Alan too), how about keeping a tree of mounts, 
akin to the device tree, and working from the deepest nodes up? (In 
conjunction with what I already suggested)?
   
   Face it, Nigel, this is a losing battle.  You can try to come up with
   ever-more complex schemes to try and force FUSE into the freezer's
   framework, but it just won't fit.  Or if it does, the next filesystem
   to come along will require an even more baroque type of special-case 
   handling.
  
  It does seem to be a losing battle, but I'm wondering whether that's really 
  because it's an intractable problem, or because people have given up on it 
  before its time. We are talking about a computer system, so things should be 
  predictable.
   
   The general problem is that task A may be in an unfreezable state,
   waiting for task B to do something, while task B is already frozen.  
   Since there's no reasonable way to determine that A really is waiting
   for B, you're just stuck.  (To make matters worse, A may not even
   realize which task it is waiting for; it may know only that it's
   waiting for somebody to do something!)  A and B could be user tasks, 
   kernel threads, or one of each.
  
  I guess I want to persist because all of these issues aren't utterly 
  unsolvable. It's just that we don't have the infrastructure yet to figure out 
  the solutions to these issues trivially. Take, for example, the locking 
  issue. If we could call some function to say "What process holds this lock?", 
  then task A could know that it's waiting on task B and put that information 
  somewhere. We could then use the information to freeze task B before task A.
  
   
   The only thing to do is what Rafael has been working on: unfreeze
   things, hope the tasks sort themselves out, and try again.
  
  That's what I'm questioning. Is there a more reliable way and we've just given 
  up too quickly?
 
 Well, there probably is one, but it likely would require us to make changes
 that wouldn't be accepted by some people and thus would never be merged.

Well, doesn't that imply that we should at least look into what changes would 
be needed? If they wouldn't be accepted by some people, then either the 
objections would be reasonable or they wouldn't (and would hopefully be 
overridden). But we can't know if we don't try.

Regards,

Nigel

-- 
See http://www.tuxonice.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.




Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread Paul Mackerras
Nigel Cunningham writes:

 I guess I want to persist because all of these issues aren't utterly
 unsolvable. It's just that we don't have the infrastructure yet to
 figure out the solutions to these issues trivially. Take, for example,

Ever heard of the halting problem? :)  It's not just a matter of
infrastructure.  You very quickly get into questions that are
mathematically undecidable.

 the locking issue. If we could call some function to say "What process
 holds this lock?", then task A could know that it's waiting on task B
 and put that information somewhere. We could then use the information
 to freeze task B before task A.

But how would that help?  If task B holds the lock, then we can't
freeze it until it's released the lock.  Then the question is, what
does task B need in order to get to the point where it releases the
lock?  And so on.  It rapidly gets not just extremely messy, but
actually impossible to compute in general.
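The "and so on" can be made concrete with a small sketch. It assumes a snapshot of who-waits-for-whom is somehow available as a Python dict, which is purely hypothetical (the thread points out that the kernel has no reasonable way to produce one in general); wait_chain is an invented helper name.

```python
def wait_chain(waits_for, task):
    """Follow 'task X is waiting for task Y' edges starting at `task`.

    `waits_for` is a hypothetical snapshot mapping each blocked task to
    the single task it is currently waiting on.  Returns the chain of
    tasks visited and a flag saying whether the chain loops back on
    itself (a deadlock in this snapshot).
    """
    chain = [task]
    seen = {task}
    while chain[-1] in waits_for:
        nxt = waits_for[chain[-1]]
        if nxt in seen:
            return chain, True
        chain.append(nxt)
        seen.add(nxt)
    return chain, False
```

Even with such a snapshot, the chain only describes what each task is blocked on right now. It says nothing about what task B will need next before it releases the lock, which is the part that cannot be computed in general.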

Paul.


Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread Nigel Cunningham
Hi.

On Monday 23 July 2007 10:04:43 Paul Mackerras wrote:
 Nigel Cunningham writes:
 
  I guess I want to persist because all of these issues aren't utterly
  unsolvable. It's just that we don't have the infrastructure yet to
  figure out the solutions to these issues trivially. Take, for example,
 
 Ever heard of the halting problem? :)  It's not just a matter of
 infrastructure.  You very quickly get into questions that are
 mathematically undecidable.

Is this the halting problem, though?

  the locking issue. If we could call some function to say "What process
  holds this lock?", then task A could know that it's waiting on task B
  and put that information somewhere. We could then use the information
  to freeze task B before task A.
 
 But how would that help?  If task B holds the lock, then we can't
 freeze it until it's released the lock.  Then the question is, what
 does task B need in order to get to the point where it releases the
 lock?  And so on.  It rapidly gets not just extremely messy, but
 actually impossible to compute in general.

Take a step back for a second.

The problem we're facing now is that we're getting some userspace threads, 
used in processing I/O, that are functioning as exceptions to the "freeze 
userspace, then freezable kernel threads" rule. They are only exceptions 
because of that role in processing I/O - because they're de facto kernel 
threads. So, if we orient our thinking more in terms of I/O processing and 
less in terms of the userspace/kernelspace distinction, we'll have a 
solution:

1) Freeze processes that aren't fs related (ie stop them generating I/O).
2) Flush pending I/O.
3) Freeze filesystems in reverse order of dependency, the primary purpose 
being to stop them generating further I/O on their metadata.

Locks that are being held are only being held because work is being done. If 
we progressively focus on threads in terms of their create/process work 
dependencies, we'll see that the problem isn't at all intractable.
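Step 3 above amounts to a topological ordering of filesystems by what they are stacked on. A minimal sketch, assuming the dependencies are known and given as a Python dict (the filesystem names and the freeze_order helper are made up for illustration; nothing here reflects how the kernel tracks mounts):

```python
from graphlib import CycleError, TopologicalSorter


def freeze_order(depends_on):
    """Return filesystems in the order they would be frozen.

    `depends_on` maps each filesystem to the set of filesystems it is
    built on top of.  The most dependent filesystem comes first, so a
    filesystem stops generating I/O before the thing underneath it is
    frozen.  Returns None if the dependencies form a cycle.
    """
    try:
        bottom_up = list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None
    return list(reversed(bottom_up))
```

For a loopback mount on top of a FUSE filesystem on top of the root filesystem, this gives the loopback mount first and the root filesystem last. The None case matters: if two FUSE daemons each depend on the other's filesystem, no valid order exists.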

Regards,

Nigel
-- 
See http://www.tuxonice.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.




Re: [linux-pm] Re: Hibernation considerations

2007-07-22 Thread david

On Mon, 23 Jul 2007, Nigel Cunningham wrote:


Hi Alan.

On Monday 23 July 2007 01:26:23 Alan Stern wrote:

On Sun, 22 Jul 2007, Nigel Cunningham wrote:


Hi.

On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote:

It seems that you could still potentially get a failure to freeze if one
FUSE process depends on another, and the one that is frozen second just
happens to be waiting on the one that is frozen first when it is frozen.
I admit that this situation is unlikely, and perhaps acceptable.

A larger concern is that it seems that freezing FUSE processes at all
_will_ generate deadlocks if a non-synchronous or memory-map-supporting
filesystem is loopback mounted from a FUSE filesystem.  In that case, if
you attempt to sync or free memory once FUSE is frozen, you are sure to
get a deadlock.


Ok. So then (in response to Alan too), how about keeping a tree of mounts,
akin to the device tree, and working from the deepest nodes up? (In
conjunction with what I already suggested)?


Face it, Nigel, this is a losing battle.  You can try to come up with
ever-more complex schemes to try and force FUSE into the freezer's
framework, but it just won't fit.  Or if it does, the next filesystem
to come along will require an even more baroque type of special-case
handling.


It does seem to be a losing battle, but I'm wondering whether that's really
because it's an intractable problem, or because people have given up on it
before its time. We are talking about a computer system, so things should be
predictable.


The general problem is that task A may be in an unfreezable state,
waiting for task B to do something, while task B is already frozen.
Since there's no reasonable way to determine that A really is waiting
for B, you're just stuck.  (To make matters worse, A may not even
realize which task it is waiting for; it may know only that it's
waiting for somebody to do something!)  A and B could be user tasks,
kernel threads, or one of each.


I guess I want to persist because all of these issues aren't utterly
unsolvable. It's just that we don't have the infrastructure yet to figure out
the solutions to these issues trivially. Take, for example, the locking
issue. If we could call some function to say "What process holds this lock?",
then task A could know that it's waiting on task B and put that information
somewhere. We could then use the information to freeze task B before task A.



this sounds like the standard priority inversion problem taken to 
extremes. Ingo has been working this issue, but IIRC the problem is that 
tracking what owns the lock so that you can get that thing to run ends up 
being enough overhead that it's not acceptable in the general case.


David Lang


The only thing to do is what Rafael has been working on: unfreeze
things, hope the tasks sort themselves out, and try again.


That's what I'm questioning. Is there a more reliable way and we've just given
up too quickly?

Regards,

Nigel




Re: [linux-pm] Re: Hibernation considerations

2007-07-21 Thread david

On Sat, 21 Jul 2007, Alan Stern wrote:


On Fri, 20 Jul 2007 [EMAIL PROTECTED] wrote:


How would you prevent tasks from being scheduled?  How would you
prevent drivers from deadlocking because in order to put their device
in a low-power state they need to acquire a lock which is held by a
user task?


you give up on the suspend because you have no way of getting the user
task to give up the lock.


Once the deadlock has occurred it's too late.  You can't give up; in
fact you can't do anything at all.  The system has hung.


however, kernel locks should not be held by user tasks, user tasks are not
expected to behave in rational ways, allowing them to compete with kernel
tasks for locks is a sure way to get a deadlock or indefinite stall.


What on Earth are you talking about?  "Kernel locks should not be held
by user tasks"?  Then who _should_ hold them?  You are aware, I hope,
that down() and mutex_lock() can be called only in process context?


what locks are accessed this way?


Lots of them.  For example, most drivers won't want a suspend to occur
right in the middle of an I/O transfer.  To prevent this, the driver
might use a mutex.  The task doing the I/O (which will be a user task)
acquires the mutex during a transfer and the suspend routine acquires
the mutex while quiescing the device.
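The pattern can be modelled in user space with two threads sharing one lock. This is a toy Python sketch, not driver code: ToyDriver, its methods and the demo() helper are invented names, and threading.Lock stands in for the kernel mutex.

```python
import threading


class ToyDriver:
    """Toy model: one mutex shared by the I/O path and the suspend path."""

    def __init__(self):
        self.io_mutex = threading.Lock()
        self.events = []

    def transfer(self, started, may_finish):
        # Runs in the context of the (user) task doing the I/O.
        with self.io_mutex:
            self.events.append("transfer start")
            started.set()
            may_finish.wait()
            self.events.append("transfer end")

    def suspend(self):
        # Runs in the suspend path; must wait for any transfer to finish.
        with self.io_mutex:
            self.events.append("device quiesced")


def demo():
    drv = ToyDriver()
    started, may_finish = threading.Event(), threading.Event()
    io_task = threading.Thread(target=drv.transfer, args=(started, may_finish))
    io_task.start()
    started.wait()                      # the transfer now holds the mutex
    pm_task = threading.Thread(target=drv.suspend)
    pm_task.start()                     # blocks until the transfer is done
    may_finish.set()
    io_task.join()
    pm_task.join()
    return drv.events
```

demo() returns ["transfer start", "transfer end", "device quiesced"]: the suspend path cannot quiesce the device until the task doing the I/O releases the mutex. If that task were stopped while still holding it, suspend() would block forever, which is the deadlock under discussion.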


wait a min here, it's possible we are misunderstanding each other.

as I see it.

if userspace can acquire locks that prevent the kernel from shutting off 
(or doing anything else in particular) then it's possible for misbehaving 
userspace code to stop the kernel by simply choosing to never release the 
lock.


this would be a trivial DOS from userspace.

now, if you are talking instead about the fact that when userspace makes a 
system call, the execution of that system call involves acquiring locks 
that are released before the system call completes you have a very 
different situation.


if you have locks that are held across system calls then you should 
already have problems, because you can't count on userspace ever taking 
whatever action is appropriate to release the lock.


what am I missing that concerns you so much?


Does it really (fundamentally) require scheduling tasks, particularly in
the case that the devices have already been put in the "quiesced" state?


I can't say for sure.  That's the way we have been doing it.  It
wouldn't be easy to change, because the driver would have to busy-wait
during delays -- which would mean it would need to use different code
for system-wide suspend and runtime suspend.


please define terms so that we are all on the same page


Please read Documentation/power/devices.txt.


I have done so.


what do you mean by
system-wide suspend


That's what you would call standby, suspend-to-RAM, or hibernate.  The
entire system goes to sleep.


runtime suspend


That's when an individual device is placed in a low-power state to
save energy while it isn't being used.  The system as a whole remains
awake and the device will be resumed the next time it is needed for
anything.


thanks for the definitions.

having read through Documentation/power/devices.txt I remain convinced 
that you are making a fundamental mistake.


you are designing a system that will only work if everything (every 
driver, every state transition) participates fully in the process at all 
times. You started with the facts 'this is the info that ACPI provides and 
this is how it is designed to be used' and worked from there instead of 
looking to see what the kernel really needed and figuring how to provide a 
good interface for that that happens to be implemented (today) with ACPI. 
(a proper power management framework shouldn't care if you have ACPI, APM, 
or some other method of controlling the devices)


this leads to resume functions that can only work if the proper suspend 
function was called rather than making 'resume' just mean 'go to full 
operation', which is the same thing that gets called when the device is 
first initialized. internally it can examine the hardware and follow 
different paths depending on what it finds the current state of the 
hardware is, but the outside world (including the rest of the kernel) 
should not care. the fact that the rest of the kernel needs to know if it 
should call 'resume' or 'initialize' is a failure in the abstraction.


in fact, a better abstraction would be something like

report_power_modes
  which would return a series of modes (sorted only by modeID)
  modeID, %power_used_in_this_mode, %capability_in_this_mode
  (I would make mode 0 always be complete power off, and mode 1 always be
  full capacity)


report_power_mode_speed
  which would return a matrix giving how long it takes to transition from
  any mode to any other mode. this should be a relative number, not an
  absolute number since it will be different at different clock speeds.


set_operational_mode(modeID)
  which would take you from whatever mode you are in now to the requested
  mode.
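As a rough illustration of how that three-call abstraction might look from the caller's side, here is a toy Python sketch. Only the three operation names come from the proposal above; the PowerMode fields, the example modes and the cost formula are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerMode:
    mode_id: int
    power_pct: int        # % of full power used in this mode
    capability_pct: int   # % of full capability available in this mode


class ToyDevice:
    def __init__(self):
        # Mode 0 is always complete power off, mode 1 always full capacity.
        self._modes = {
            0: PowerMode(0, 0, 0),
            1: PowerMode(1, 100, 100),
            2: PowerMode(2, 40, 60),   # invented intermediate mode
        }
        # Relative (unitless) transition costs; the formula is arbitrary.
        self._cost = {
            (a, b): 0 if a == b else 1 + abs(a - b)
            for a in self._modes
            for b in self._modes
        }
        self.current = 1

    def report_power_modes(self):
        return [self._modes[k] for k in sorted(self._modes)]

    def report_power_mode_speed(self):
        return dict(self._cost)

    def set_operational_mode(self, mode_id):
        # The caller never says "resume" or "initialize"; it only names
        # the mode it wants, whatever mode the device is in now.
        if mode_id not in self._modes:
            raise ValueError(f"unknown mode {mode_id}")
        self.current = mode_id
```

The caller-facing surface has no separate suspend, resume or initialize entry points, which is the property the proposal is after.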



Re: [linux-pm] Re: Hibernation considerations

2007-07-21 Thread david

On Sun, 22 Jul 2007, Huang, Ying wrote:


On Fri, 2007-07-20 at 08:48 -0700, [EMAIL PROTECTED] wrote:

Backing up target memory before kexec and restoring it after kexec is a
planned feature for kexec jump. But I will work on image writing/reading
first.


if we can get a list of what memory is safe to backup/restore then the
reading/writing of the image should be able to be done in userspace.


The backup/restore here has nothing to do with the read/write of the
image. It means instead of preserving memory for a new kernel like that
of crash-dump, the memory for a new kernel is backed up before kexec and
restored after kexec by the kexec kernel.


Ok, I see the miscommunication here. you are talking about freeing up 
memory for the second kernel instead of reserving it from boot time.


I'm talking about getting the second kernel a list of what memory pages it 
should write to the image


if we can get the info for the list I'm looking for we should be able to 
demonstrate the kexec based hibernate.


the change you are talking about is an enhancement that is useful after 
that point to save some memory.



If the "scatter copy" is replaced by "scatter swap", we do not need the
inverse list, and the state of the kexeced kernel can be backed up too. There
is "scatter copy" support in the normal kexec implementation in
"relocate_kernel".


what do you mean by "scatter swap"


copy:   dest=src
swap:   tmp=dest; dest=src; src=tmp

If memory is swapped, no information is lost, both that of kexec kernel
and kexeced kernel.


I'm missing why you need to preserve this memory

if you are talking about memory that will be used by the second kernel 
when you kexec to it then you don't need to preserve it (since it will be 
overwritten by the second kernel). if you aren't talking about memory that 
will be used by the second kernel why do you need to move it?


David Lang


Re: [linux-pm] Re: Hibernation considerations

2007-07-21 Thread Huang, Ying
On Fri, 2007-07-20 at 08:48 -0700, [EMAIL PROTECTED] wrote:
> > Backing up target memory before kexec and restoring it after kexec is a
> > planned feature for kexec jump. But I will work on image writing/reading
> > first.
> 
> if we can get a list of what memory is safe to backup/restore then the 
> reading/writing of the image should be able to be done in userspace.

The backup/restore here has nothing to do with the read/write of the
image. It means instead of preserving memory for a new kernel like that
of crash-dump, the memory for a new kernel is backed up before kexec and
restored after kexec by the kexec kernel.

> > If the "scatter copy" is replaced by "scatter swap", we do not need the
> > inverse list, and the state of the kexeced kernel can be backed up too. There
> > is "scatter copy" support in the normal kexec implementation in
> > "relocate_kernel".
> 
> what do you mean by "scatter swap"

copy:   dest=src
swap:   tmp=dest; dest=src; src=tmp

If memory is swapped, no information is lost, both that of kexec kernel
and kexeced kernel.
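
To make the difference concrete, here is a minimal userspace sketch in Python of the two operations applied over a list of (dest, src) page pairs. The page contents, the pair layout and the function names are assumptions for illustration; this is not the relocate_kernel code, only a model of copy versus swap.

```python
def scatter_copy(memory, pairs):
    """copy: dest = src. Whatever dest held before is lost."""
    for dest, src in pairs:
        memory[dest] = memory[src]


def scatter_swap(memory, pairs):
    """swap: tmp = dest; dest = src; src = tmp. Nothing is lost."""
    for dest, src in pairs:
        memory[dest], memory[src] = memory[src], memory[dest]


# Hypothetical layout: pages 0-1 are where the new kernel must run,
# pages 2-3 are where its image was staged.
pairs = [(0, 2), (1, 3)]
original = ["old-A", "old-B", "new-A", "new-B"]

copied = list(original)
scatter_copy(copied, pairs)        # old-A and old-B are overwritten

swapped = list(original)
scatter_swap(swapped, pairs)       # old-A and old-B now sit in pages 2-3

# Undoing the swaps in reverse order brings back the original layout,
# which is why no separate inverse list is needed in this model.
restored = list(swapped)
scatter_swap(restored, list(reversed(pairs)))
```

In this model the copy variant needs a backup made elsewhere if the old contents matter, while the swap variant keeps both sets of pages and can be reversed with the same list.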

Best Regards,
Huang, Ying


Re: [linux-pm] Re: Hibernation considerations

2007-07-21 Thread Nigel Cunningham
Hi.

On Sunday 22 July 2007 02:13:56 Jeremy Maitin-Shepard wrote:
> It seems that you could still potentially get a failure to freeze if one
> FUSE process depends on another, and the one that is frozen second just
> happens to be waiting on the one that is frozen first when it is frozen.
> I admit that this situation is unlikely, and perhaps acceptable.
> 
> A larger concern is that it seems that freezing FUSE processes at all
> _will_ generate deadlocks if a non-synchronous or memory-map-supporting
> filesystem is loopback mounted from a FUSE filesystem.  In that case, if
> you attempt to sync or free memory once FUSE is frozen, you are sure to
> get a deadlock.

Ok. So then (in response to Alan too), how about keeping a tree of mounts, 
akin to the device tree, and working from the deepest nodes up? (In 
conjunction with what I already suggested)?
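
A rough Python sketch of the ordering this suggestion implies, assuming each mount records the mount it sits on top of. The dictionary layout, the mount names and the function name are hypothetical; the thread does not say how such a tree would be stored in the kernel.

```python
def deepest_first(parent_of):
    """Return mounts ordered from the deepest nodes up to the root."""
    def depth(mount):
        d = 0
        while parent_of[mount] is not None:
            mount = parent_of[mount]
            d += 1
        return d

    return sorted(parent_of, key=depth, reverse=True)


# Hypothetical stack: a loopback filesystem whose backing file lives on a
# FUSE mount, which in turn lives on the root filesystem.
mounts = {
    "/": None,
    "/mnt/fuse": "/",
    "/mnt/fuse/loop": "/mnt/fuse",
}

order = deepest_first(mounts)
```

With this ordering, whatever depends on the FUSE mount is handled before the FUSE mount itself.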

Regards,

Nigel
-- 
See http://www.tuxonice.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.




Re: [linux-pm] Re: Hibernation considerations

2007-07-21 Thread Nigel Cunningham
Hi.

On Sunday 22 July 2007 04:12:22 Miklos Szeredi wrote:
> > It seems that you could still potentially get a failure to freeze if one
> > FUSE process depends on another, and the one that is frozen second just
> > happens to be waiting on the one that is frozen first when it is frozen.
> > I admit that this situation is unlikely, and perhaps acceptable.
> 
> It isn't all that unlikely.  There's sshfs for example, that depends
> on a separate ssh process for transport.
> 
> Oh, there are also userspace network transports, like tun/tap,
> nfqueue, etc.  They could block any network filesystem (not just fuse)
> if frozen first, making the freezer fail.
> 
> Hmm, wonder why this isn't affecting people with VPNs?  Probably
> network mounts over VPN are rare, and ever rarer to have fs activity
> on them during suspend.
> 
> Anyway, I think it's long overdue to stop thinking about how to "fix"
> fuse, and concentrate on fixing the underlying problem instead ;)

That's what I'm seeking to do :)

> > A larger concern is that it seems that freezing FUSE processes at all
> > _will_ generate deadlocks if a non-synchronous or memory-map-supporting
> > filesystem is loopback mounted from a FUSE filesystem.  In that case, if
> > you attempt to sync or free memory once FUSE is frozen, you are sure to
> > get a deadlock.
> 
> Well, it would deadlock, if
> 
>  a) memory reclaim was synchronous, or
>  b) large part of the memory was used for dirty file data

These are problems in normal operation, aren't they?
 
> I can't remember if (a) was ever true.  And now the dirty ratio is 10%
> by default, so if we go OOM because that 10% can't be reclaimed, there
> is a more serious problem.
> 
> Swap over loop over fuse would be problematic, but that won't work for
> some time yet ;)

Hopefully people will wake up to the problems with Fuse and get rid of it 
before then :|. Of course I don't really expect that to happen.

Nigel
-- 
See http://www.tuxonice.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.




Re: [linux-pm] Re: Hibernation considerations

2007-07-21 Thread Rafael J. Wysocki
On Saturday, 21 July 2007 20:12, Miklos Szeredi wrote:
> > It seems that you could still potentially get a failure to freeze if one
> > FUSE process depends on another, and the one that is frozen second just
> > happens to be waiting on the one that is frozen first when it is frozen.
> > I admit that this situation is unlikely, and perhaps acceptable.
> 
> It isn't all that unlikely.  There's sshfs for example, that depends
> on a separate ssh process for transport.
> 
> Oh, there are also userspace network transports, like tun/tap,
> nfqueue, etc.  They could block any network filesystem (not just fuse)
> if frozen first, making the freezer fail.
> 
> Hmm, wonder why this isn't affecting people with VPNs?  Probably
> network mounts over VPN are rare, and ever rarer to have fs activity
> on them during suspend.
> 
> Anyway, I think it's long overdue to stop thinking about how to "fix"
> fuse, and concentrate on fixing the underlying problem instead ;)

To conclude this branch of the thread, I have a patch in the works that may
help a bit with unfreezable FUSE filesystems and it only affects the freezer.
I'll post it when 2.6.23-rc1 is out, because it's on top of some other patches
that need to go first.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


Re: [linux-pm] Re: Hibernation considerations

2007-07-21 Thread Miklos Szeredi
> It seems that you could still potentially get a failure to freeze if one
> FUSE process depends on another, and the one that is frozen second just
> happens to be waiting on the one that is frozen first when it is frozen.
> I admit that this situation is unlikely, and perhaps acceptable.

It isn't all that unlikely.  There's sshfs for example, that depends
on a separate ssh process for transport.

Oh, there are also userspace network transports, like tun/tap,
nfqueue, etc.  They could block any network filesystem (not just fuse)
if frozen first, making the freezer fail.
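
The ordering problem can be shown with a toy model in Python. It assumes the freezer walks tasks in a fixed order and that a task blocked on an already-frozen provider cannot be frozen; the task names and the waits_on table are made up for illustration and this is not how the real freezer is implemented.

```python
def try_freeze(order, waits_on):
    """Freeze tasks in the given order.

    Returns (True, None) on success, or (False, task) for the first task
    that is stuck waiting on a provider that has already been frozen.
    """
    frozen = set()
    for task in order:
        provider = waits_on.get(task)
        if provider in frozen:
            return False, task
        frozen.add(task)
    return True, None


# sshfs relies on a separate ssh process for its transport.
waits_on = {"sshfs": "ssh"}

transport_first = try_freeze(["ssh", "sshfs"], waits_on)
filesystem_first = try_freeze(["sshfs", "ssh"], waits_on)
```

The same shape applies to tun/tap or nfqueue daemons sitting underneath a network filesystem: the outcome depends only on which side happens to be frozen first.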

Hmm, wonder why this isn't affecting people with VPNs?  Probably
network mounts over VPN are rare, and it is even rarer to have fs
activity on them during suspend.

Anyway, I think it's long overdue to stop thinking about how to "fix"
fuse, and concentrate on fixing the underlying problem instead ;)

> A larger concern is that it seems that freezing FUSE processes at all
> _will_ generate deadlocks if a non-synchronous or memory-map-supporting
> filesystem is loopback mounted from a FUSE filesystem.  In that case, if
> you attempt to sync or free memory once FUSE is frozen, you are sure to
> get a deadlock.

Well, it would deadlock, if

 a) memory reclaim was synchronous, or
 b) large part of the memory was used for dirty file data

I can't remember if (a) was ever true.  And now the dirty ratio is 10%
by default, so if we go OOM because that 10% can't be reclaimed, there
is a more serious problem.

Swap over loop over fuse would be problematic, but that won't work for
some time yet ;)

Miklos


Re: [linux-pm] Re: Hibernation considerations

2007-07-21 Thread Jeremy Maitin-Shepard
It seems that you could still potentially get a failure to freeze if one
FUSE process depends on another, and the one that is frozen second just
happens to be waiting on the one that is frozen first when it is frozen.
I admit that this situation is unlikely, and perhaps acceptable.

A larger concern is that it seems that freezing FUSE processes at all
_will_ generate deadlocks if a non-synchronous or memory-map-supporting
filesystem is loopback mounted from a FUSE filesystem.  In that case, if
you attempt to sync or free memory once FUSE is frozen, you are sure to
get a deadlock.

-- 
Jeremy Maitin-Shepard


Re: [linux-pm] Re: Hibernation considerations

2007-07-21 Thread Alan Stern
On Fri, 20 Jul 2007 [EMAIL PROTECTED] wrote:

> > How would you prevent tasks from being scheduled?  How would you
> > prevent drivers from deadlocking because in order to put their device
> > in a low-power state they need to acquire a lock which is held by a
> > user task?
> 
> you give up on the suspend becouse you have no way of getting the user 
> task to give up the lock.

Once the deadlock has occurred it's too late.  You can't give up; in 
fact you can't do anything at all.  The system has hung.

> however, kernel locks should not be held by user tasks, user tasks are not 
> expected to behave in rational ways, allowing them to compete with kernel 
> tasks for locks is a sure way to get a deadlock or indefinate stall.

What on Earth are you talking about?  "Kernel locks should not be held 
by user tasks"?  Then who _should_ hold them?  You are aware, I hope, 
that down() and mutex_lock() can be called only in process context?

> what locks are accessed this way?

Lots of them.  For example, most drivers won't want a suspend to occur
right in the middle of an I/O transfer.  To prevent this, the driver
might use a mutex.  The task doing the I/O (which will be a user task)
acquires the mutex during a transfer and the suspend routine acquires
the mutex while quiescing the device.
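
A small userspace analogy in Python of the pattern described here, using threading.Lock in place of a kernel mutex. The ToyDriver class and its method names are hypothetical. A real suspend routine would block on the mutex rather than give up; the non-blocking acquire is used only so the sketch can run in a single thread.

```python
import threading


class ToyDriver:
    """Toy model: one mutex shared by the I/O path and the suspend path."""

    def __init__(self):
        self.io_mutex = threading.Lock()
        self.suspended = False

    def begin_transfer(self):
        # The task doing the I/O (a user task) takes the mutex.
        self.io_mutex.acquire()

    def end_transfer(self):
        self.io_mutex.release()

    def try_suspend(self):
        # The suspend routine needs the same mutex to quiesce the device.
        if not self.io_mutex.acquire(blocking=False):
            return False  # a transfer is in flight
        try:
            self.suspended = True
            return True
        finally:
            self.io_mutex.release()


driver = ToyDriver()
driver.begin_transfer()
during_io = driver.try_suspend()   # the mutex is held by the transfer
driver.end_transfer()
after_io = driver.try_suspend()    # the mutex is free again
```

If the task holding the mutex were stopped before releasing it, a blocking suspend routine would wait forever, which is the deadlock discussed earlier in this message.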

> >> Does it really (fundamentally) require scheduling tasks, particularly in
> >> the case that the devices have already been put in the "quiesced" state?
> >
> > I can't say for sure.  That's the way we have been doing it.  It
> > wouldn't be easy to change, because the driver would have to busy-wait
> > during delays -- which would mean it would need to use different code
> > for system-wide suspend and runtime suspend.
> 
> please define terms so that we are all on the same page

Please read Documentation/power/devices.txt.

> what do you mean by
> system-wide suspend

That's what you would call standby, suspend-to-RAM, or hibernate.  The
entire system goes to sleep.

> runtime suspend

That's when an individual device is placed in a low-power state to 
save energy while it isn't being used.  The system as a whole remains 
awake and the device will be resumed the next time it is needed for 
anything.

Alan Stern



Re: [linux-pm] Re: Hibernation considerations

2007-07-21 Thread Alan Stern
On Sat, 21 Jul 2007, Nigel Cunningham wrote:

> What am I missing in the following suggested solution?
> 
> 1) In the freezer code, we implement a new TIF_LATEFREEZE process flag, 
> which, 
> when set, causes a  userspace process to be frozen with kernel threads 
> instead of with userspace ones. When freezing, we freezing !TIF_LATEFREEZE, 
> sync and then freeze TIF_LATEFREEZE and freezable kernel threads.
> 
> 2) In the fuse code, the PID of the process that will do the work gets passed 
> to the fuse kernel code when the mount is done. The kernel code sets the 
> TIF_LATEFREEZE flag, and resets it on umount.

What happens when one FUSE filesystem makes use of another?  You'll 
still end up with unfreezable processes, except that now you won't 
detect them until the LATEFREEZE stage.

Alan Stern



Re: [linux-pm] Re: Hibernation considerations

2007-07-21 Thread Nigel Cunningham
Hi.

On Saturday 21 July 2007 21:44:32 Miklos Szeredi wrote:
> > The problem with FUSE is related to the fact that the freezer can't
> > freeze uninterruptible tasks and we said that perhaps we might avoid
> > it if FUSE was made freezing-aware.  Still, no one has gone in this
> > direction and I don't know of any plans to do that.
> 
> I thought we have fully explored this direction.  Lots of emails, and
> an IRC session with Pavel.  Conclusion:

What am I missing in the following suggested solution?

1) In the freezer code, we implement a new TIF_LATEFREEZE process flag, which,
when set, causes a userspace process to be frozen with kernel threads
instead of with userspace ones. When freezing, we freeze !TIF_LATEFREEZE
processes, sync, and then freeze TIF_LATEFREEZE processes and freezable
kernel threads.

2) In the fuse code, the PID of the process that will do the work gets passed 
to the fuse kernel code when the mount is done. The kernel code sets the 
TIF_LATEFREEZE flag, and resets it on umount.
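
The two steps above can be sketched as a plain ordering function in Python. The task names and the set standing in for the TIF_LATEFREEZE flag are assumptions for illustration; no such flag exists in the kernel sources being discussed, it is only proposed here.

```python
def freeze_sequence(user_tasks, late_freeze):
    """Return the order of operations for the proposed two-stage freeze.

    user_tasks:  userspace task names, in the order the freezer sees them
    late_freeze: names of tasks that would carry the proposed flag
    """
    early = [t for t in user_tasks if t not in late_freeze]
    late = [t for t in user_tasks if t in late_freeze]
    return early + ["sync"] + late + ["freezable kernel threads"]


# Hypothetical system: sshfs is the FUSE daemon registered at mount time.
sequence = freeze_sequence(["bash", "sshfs", "editor"], {"sshfs"})
```

The sync step runs while the flagged FUSE daemon is still able to service requests, which is the point of deferring it.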

Sorry, but this is a hit-and-run email - I'm off to bed now.

Regards,

Nigel




Re: [linux-pm] Re: Hibernation considerations

2007-07-21 Thread Miklos Szeredi
> The problem with FUSE is related to the fact that the freezer can't
> freeze uninterruptible tasks and we said that perhaps we might avoid
> it if FUSE was made freezing-aware.  Still, no one has gone in this
> direction and I don't know of any plans to do that.

I thought we have fully explored this direction.  Lots of emails, and
an IRC session with Pavel.  Conclusion:

 - It can't be done without VFS surgery + adding various hacks to fuse

 - VFS surgery for the sake of a working suspend is not realistic

Although removing the freezer seems the cleanest solution, I'm not
saying the freezer can't be fixed up in the meantime.

Allowing tasks to remain in uninterruptible sleep seemed a nice way to
get around the fuse issues.  What was the problem with that patch?  It
was something that was supposed to have been tested in suspend2,
wasn't it?

The other one (trying to wake up tasks, so that other tasks may become
freezable) didn't seem such a good approach to me.

The theory is quite simple: while and after suspending devices, no
tasks must be touching said devices.

The very cleanest way to do this is in the drivers.  The very simplest
way is the current freezer.  But maybe there are possibilities
between these two extremes.

But I can almost guarantee you that any attempt at fixing the issues
through fuse will just result in an even bigger mess than what we
currently have.

Miklos


Re: [linux-pm] Re: Hibernation considerations

2007-07-21 Thread david

On Sun, 22 Jul 2007, Huang, Ying wrote:


> On Fri, 2007-07-20 at 08:48 -0700, [EMAIL PROTECTED] wrote:
> > > Backuping target memory before kexec and restoring it after kexec is
> > > planed feature for kexec jump. But I will work on image writing/reading
> > > first.
> >
> > if we can get a list of what memory is safe to backup/restore then the
> > reading/writing of the image should be able to be done in userspace.
>
> The backup/restore here has nothing to do with the read/write of the
> image. It means instead of preserving memory for a new kernel like that
> of crash-dump, the memory for a new kernel is backupped before kexec and
> restored after kexec by the kexec kernel.


Ok, I see the miscommunication here. you are talking about freeing up 
memory for the second kernel instead of reserving it from boot time.


I'm talking about getting the second kernel a list of what memory pages it 
should write to the image


if we can get the info for the list I'm looking for we should be able to 
demonstrate the kexec based hibernate.


the change you are talking about is an enhancement that is useful after 
that point to save some memory.



> > > If the "scatter copy" is replaced by "scatter swap", we need not the
> > > inverse list, and the state of kexeced kernel can be backuped too. There
> > > are "scatter copy" support in normal kexec implementation in
> > > "relocate_kernel".
> >
> > what do you mean by "scatter swap"
>
> copy:   dest=src
> swap:   tmp=dest; dest=src; src=tmp
>
> If memory is swapped, no information is lost, both that of kexec kernel
> and kexeced kernel.


I'm missing why you need to preserve this memory

if you are talking about memory that will be used by the second kernel 
when you kexec to it then you don't need to preserve it (since it will be 
overwritten by the second kernel). if you aren't talking about memory that 
will be used by the second kernel why do you need to move it?


David Lang


Re: [linux-pm] Re: Hibernation considerations

2007-07-21 Thread david

On Sat, 21 Jul 2007, Alan Stern wrote:


On Fri, 20 Jul 2007 [EMAIL PROTECTED] wrote:


How would you prevent tasks from being scheduled?  How would you
prevent drivers from deadlocking because in order to put their device
in a low-power state they need to acquire a lock which is held by a
user task?


you give up on the suspend because you have no way of getting the user
task to give up the lock.


Once the deadlock has occurred it's too late.  You can't give up; in
fact you can't do anything at all.  The system has hung.


however, kernel locks should not be held by user tasks; user tasks are not
expected to behave in rational ways, and allowing them to compete with
kernel tasks for locks is a sure way to get a deadlock or indefinite stall.


What on Earth are you talking about?  Kernel locks should not be held
by user tasks?  Then who _should_ hold them?  You are aware, I hope,
that down() and mutex_lock() can be called only in process context?


what locks are accessed this way?


Lots of them.  For example, most drivers won't want a suspend to occur
right in the middle of an I/O transfer.  To prevent this, the driver
might use a mutex.  The task doing the I/O (which will be a user task)
acquires the mutex during a transfer and the suspend routine acquires
the mutex while quiescing the device.
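
The pattern described above can be sketched roughly as follows. This is a
Python model, not kernel code: the Device class, its fields and the timeout
on the suspend side are all assumptions made for illustration (the kernel's
mutex_lock() does not time out).

```python
import threading


class Device:
    """Hypothetical driver state; not a real kernel structure."""

    def __init__(self):
        self.io_mutex = threading.Lock()
        self.suspended = False
        self.log = []

    def do_transfer(self, data):
        # Runs in the context of the user task that issued the I/O.
        with self.io_mutex:
            if self.suspended:
                raise RuntimeError("device is suspended")
            self.log.append(("transfer", data))

    def suspend(self, timeout=1.0):
        # Waits for any in-flight transfer, then quiesces the device.
        if not self.io_mutex.acquire(timeout=timeout):
            return False          # could not get the lock: abort the suspend
        try:
            self.suspended = True
            self.log.append(("suspend",))
            return True
        finally:
            self.io_mutex.release()
```

If the task holding io_mutex can no longer run when suspend() is called,
the lock is never released. Without the assumed timeout, that is the
deadlock being discussed.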


wait a minute here, it's possible we are misunderstanding each other.

as I see it.

if userspace can acquire locks that prevent the kernel from shutting off 
(or doing anything else in particular) then it's possible for misbehaving 
userspace code to stop the kernel by simply choosing to never release the 
lock.


this would be a trivial DoS from userspace.

now, if you are talking instead about the fact that when userspace makes a 
system call, the execution of that system call involves acquiring locks 
that are released before the system call completes, you have a very 
different situation.


if you have locks that are held across system calls then you should 
already have problems, because you can't count on userspace ever taking 
whatever action is appropriate to release the lock.


what am I missing that concerns you so much?


Does it really (fundamentally) require scheduling tasks, particularly in
the case that the devices have already been put in the quiesced state?


I can't say for sure.  That's the way we have been doing it.  It
wouldn't be easy to change, because the driver would have to busy-wait
during delays -- which would mean it would need to use different code
for system-wide suspend and runtime suspend.


please define terms so that we are all on the same page


Please read Documentation/power/devices.txt.


I have done so.


what do you mean by
system-wide suspend


That's what you would call standby, suspend-to-RAM, or hibernate.  The
entire system goes to sleep.


runtime suspend


That's when an individual device is placed in a low-power state to
save energy while it isn't being used.  The system as a whole remains
awake and the device will be resumed the next time it is needed for
anything.


thanks for the definitions.

having read through Documentation/power/devices.txt I remain convinced 
that you are making a fundamental mistake.


you are designing a system that will only work if everything (every 
driver, every state transition) participates fully in the process at all 
times. You started with the facts 'this is the info that ACPI provides and 
this is how it is designed to be used' and worked from there instead of 
looking to see what the kernel really needed and figuring how to provide a 
good interface for that that happens to be implemented (today) with ACPI. 
(a proper power management framework shouldn't care if you have ACPI, APM, 
or some other method of controlling the devices)


this leads to resume functions that can only work if the proper suspend 
function was called rather than making 'resume' just mean 'go to full 
operation', which is the same thing that gets called when the device is 
first initialized. internally it can examine the hardware and follow 
different paths depending on what it finds the current state of the 
hardware is, but the outside world (including the rest of the kernel) 
should not care. the fact that the rest of the kernel needs to know if it 
should call 'resume' or 'initialize' is a failure in the abstraction.


in fact, a better abstraction would be something like

report_power_modes
  which would return a series of modes (sorted only by modeID)
  modeID, %power_used_in_this_mode, %capability_in_this_mode
  (I would make mode 0 always be complete power off, and mode 1 always be
  full capacity)


report_power_mode_speed
  which would return a matrix giving how long it takes to transition from
  any mode to any other mode. this should be a relative number, not an
  absolute number since it will be different at different clock speeds.


set_operational_mode(modeID)
  which would take you from whatever mode you are in now to the requested
  mode.
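
As a rough illustration of the three calls proposed above, here is a Python
sketch. The class name, the dictionary layout and the example numbers are
assumptions; only the three method names come from the proposal.

```python
class PowerManagedDevice:
    """Hypothetical device exposing the three calls proposed above."""

    def __init__(self, modes, transition_cost):
        # modes: {mode_id: (percent_power, percent_capability)}
        # transition_cost: {(from_id, to_id): relative_cost}
        self._modes = dict(modes)
        self._cost = dict(transition_cost)
        self._current = 0   # mode 0: complete power off

    def report_power_modes(self):
        # Sorted only by mode ID, as in the proposal.
        return [(mid, *self._modes[mid]) for mid in sorted(self._modes)]

    def report_power_mode_speed(self):
        # Relative, unitless costs for each (from, to) pair.
        return dict(self._cost)

    def set_operational_mode(self, mode_id):
        # One entry point, whether the device is being initialised or resumed.
        if mode_id not in self._modes:
            raise ValueError(f"unknown mode {mode_id}")
        self._current = mode_id
        return self._current
```

A caller would ask for mode 1 (full capacity) both at first initialisation
and after a suspend, without needing to know which of the two it is doing.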


most 

Re: [linux-pm] Re: Hibernation considerations

2007-07-20 Thread Nigel Cunningham
Hi.

On Saturday 21 July 2007 08:43:20 [EMAIL PROTECTED] wrote:
> On Fri, 20 Jul 2007, Alan Stern wrote:
> 
> > On Fri, 20 Jul 2007, Jeremy Maitin-Shepard wrote:
> >
> >>>> when doing a suspend-to-ram you get to a point where you just don't use
> >>>> any userspace.
> >>
> >>> What do you mean?  How can you prevent user tasks from running?  That's
> >>> basically what the freezer does, and the whole point of this approach
> >>> is to eliminate the freezer.  Right?
> >>
> >> Presumably no tasks at all would be scheduled.
> >
> > How would you prevent tasks from being scheduled?  How would you
> > prevent drivers from deadlocking because in order to put their device
> > in a low-power state they need to acquire a lock which is held by a
> > user task?
> 
> you give up on the suspend because you have no way of getting the user 
> task to give up the lock.
> 
> however, kernel locks should not be held by user tasks; user tasks are not 
> expected to behave in rational ways, and allowing them to compete with
> kernel tasks for locks is a sure way to get a deadlock or indefinite stall.
> 
> what locks are accessed this way?

Any userspace process can do a syscall. In the process of the syscall, it can 
take kernel locks, and it can schedule (eg, while seeking to take a second 
lock).

Regards,

Nigel




Re: [linux-pm] Re: Hibernation considerations

2007-07-20 Thread david

On Fri, 20 Jul 2007, Alan Stern wrote:


On Fri, 20 Jul 2007, Jeremy Maitin-Shepard wrote:


when doing a suspend-to-ram you get to a point where you just don't use
any userspace.



What do you mean?  How can you prevent user tasks from running?  That's
basically what the freezer does, and the whole point of this approach
is to eliminate the freezer.  Right?


Presumably no tasks at all would be scheduled.


How would you prevent tasks from being scheduled?  How would you
prevent drivers from deadlocking because in order to put their device
in a low-power state they need to acquire a lock which is held by a
user task?


you give up on the suspend because you have no way of getting the user 
task to give up the lock.


however, kernel locks should not be held by user tasks; user tasks are not 
expected to behave in rational ways, and allowing them to compete with
kernel tasks for locks is a sure way to get a deadlock or indefinite stall.


what locks are accessed this way?


from that point on you are just walking the device tree
putting things into low-power mode. This is the point where we are talking
about jumping to.



Yes.  And putting things into low-power mode requires the ability to
run the scheduler, which means that user tasks can be scheduled, which
means that they can run.


Does it really (fundamentally) require scheduling tasks, particularly in
the case that the devices have already been put in the "quiesced" state?


I can't say for sure.  That's the way we have been doing it.  It
wouldn't be easy to change, because the driver would have to busy-wait
during delays -- which would mean it would need to use different code
for system-wide suspend and runtime suspend.


please define terms so that we are all on the same page

what do you mean by
system-wide suspend
runtime suspend

David Lang


Re: [linux-pm] Re: Hibernation considerations

2007-07-20 Thread Jeremy Maitin-Shepard
Alan Stern <[EMAIL PROTECTED]> writes:

> On Fri, 20 Jul 2007, Jeremy Maitin-Shepard wrote:
>> >> when doing a suspend-to-ram you get to a point where you just don't use 
>> >> any userspace.
>> 
>> > What do you mean?  How can you prevent user tasks from running?  That's 
>> > basically what the freezer does, and the whole point of this approach 
>> > is to eliminate the freezer.  Right?
>> 
>> Presumably no tasks at all would be scheduled.

> How would you prevent tasks from being scheduled?  How would you
> prevent drivers from deadlocking because in order to put their device
> in a low-power state they need to acquire a lock which is held by a
> user task?

Perhaps this isn't an issue once the device is already quiesced.  I'm
just conjecturing.

[snip]

-- 
Jeremy Maitin-Shepard


Re: [linux-pm] Re: Hibernation considerations

2007-07-20 Thread david

On Sat, 21 Jul 2007, Rafael J. Wysocki wrote:


On Friday, 20 July 2007 23:39, [EMAIL PROTECTED] wrote:

On Fri, 20 Jul 2007, Rafael J. Wysocki wrote:


On Friday, 20 July 2007 17:36, [EMAIL PROTECTED] wrote:

On Fri, 20 Jul 2007, Jim Crilly wrote:


has requested the image to be not greater than 50% of RAM.  In that case
you have to free some memory _before_ identifying memory to save and you
must not race with applications that attempt to allocate memory while
you're doing it.


I disagree a little bit.

first off, only the suspending kernel can know what can be freed and what
is needed to do so (remember this is kernel internals, it can change from
patch to patch, let alone version to version)

second, if you have a lot of memory to free, and you can't just throw away
caches to do so, you don't know what is going to be involved in freeing
the memory. it's very possible that it is going to involve userspace, so
you can't freeze any significant portion of the system, so you can't
eliminate all chance of races

what you can do is

1. try to free stuff
2. stop the system and account for memory: is enough free?
   if not, goto 1

if userspace is dirtying memory fast enough, or is just using enough
memory that you can't meet your limit, you just won't be able to suspend.

but under any other conditions you will eventually get enough memory free.

so try several times and if you still fail tell the user they have too
much stuff running and they need to kill something.
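
The loop described above could look roughly like this. It is a Python
sketch under stated assumptions: try_to_free and stop_and_count_used are
hypothetical callbacks, and the retry limit is an arbitrary choice.

```python
def shrink_until_fits(try_to_free, stop_and_count_used, limit, max_attempts=5):
    """Return True once used memory is within limit, False if we give up.

    try_to_free and stop_and_count_used are hypothetical callbacks.
    """
    for _ in range(max_attempts):
        try_to_free()                        # step 1: try to free stuff
        if stop_and_count_used() <= limit:   # step 2: stop and account
            return True
        # Not enough free yet: let the system run again and retry.
    return False
```

A False result corresponds to telling the user that too much is running
and something has to be killed before the suspend can go ahead.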


Which would be a pretty big regression from what we have now. With the
current implementation I can hibernate under virtually any workload because
the freezer stops everything and there's no competition for resources.


as long as what you are trying to save is <=50% of ram (at least with some
implementations). if you are trying to save more than 50% of ram with some
current implementations you just can't


With some, you can't, with the others, you can. :-)

The argument given was about the freezer and IMO it was valid.

Why didn't you address it directly?


I thought it had been covered in other messages (with as big as this
thread is I'm trying to avoid repeating the same thing more than a couple
times a day :-)

there was another message talking about ways that you could reduce the
image size without it being racy (allocate pinned memory until the
remainder is small enough, then don't backup the pinned memory)

that's a much cleaner answer than what I was thinking, so I'll go with it
instead ;-)


Wouldn't that cause the OOM killer to act, in some cases?


only in the case where the image absolutely cannot be made small enough.

and this should be detectable by the process that's pinning memory (this 
can be a kernel process) so that it stops before the OOM killer is 
triggered, even if that means that it returns 'unable to fit'
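
A very simplified model of that check, in Python. All of the page counts
and the safety margin are assumptions made for illustration; a real
implementation would have to discover what is reclaimable as it goes.

```python
def pages_to_pin(total, unreclaimable, max_image, safety_margin=0):
    """Return how many pages to pin, or None for 'unable to fit'.

    Pinned pages are left out of the image, so the image size is the
    total minus whatever has been pinned. All arguments are hypothetical
    page counts.
    """
    needed = max(0, total - max_image)        # pages that must leave the image
    available = total - unreclaimable - safety_margin
    if needed > available:
        return None    # stop here, before the OOM killer would be triggered
    return needed
```

For example, with 1000 pages in total, 300 that cannot be reclaimed and an
image limit of 500, the sketch asks for 500 pinned pages. With 600 pages
that cannot be reclaimed it reports 'unable to fit' instead.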


David Lang



Re: [linux-pm] Re: Hibernation considerations

2007-07-20 Thread Alan Stern
On Fri, 20 Jul 2007, Jeremy Maitin-Shepard wrote:

> >> when doing a suspend-to-ram you get to a point where you just don't use 
> >> any userspace.
> 
> > What do you mean?  How can you prevent user tasks from running?  That's 
> > basically what the freezer does, and the whole point of this approach 
> > is to eliminate the freezer.  Right?
> 
> Presumably no tasks at all would be scheduled.

How would you prevent tasks from being scheduled?  How would you
prevent drivers from deadlocking because in order to put their device
in a low-power state they need to acquire a lock which is held by a
user task?

> >> from that point on you are just walking the device tree 
> >> putting things into low-power mode. This is the point where we are talking 
> >> about jumping to.
> 
> > Yes.  And putting things into low-power mode requires the ability to 
> > run the scheduler, which means that user tasks can be scheduled, which 
> > means that they can run.
> 
> Does it really (fundamentally) require scheduling tasks, particularly in
> the case that the devices have already been put in the "quiesced" state?

I can't say for sure.  That's the way we have been doing it.  It
wouldn't be easy to change, because the driver would have to busy-wait
during delays -- which would mean it would need to use different code
for system-wide suspend and runtime suspend.
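
To illustrate why that would mean two code paths, here is a small Python
sketch. The function names and the settle time are hypothetical; the point
is only that the same delay is implemented differently depending on whether
the scheduler can be used.

```python
import time


def wait_sleeping(seconds):
    # Runtime suspend: yield the CPU so that other tasks can be scheduled.
    time.sleep(seconds)


def wait_busy(seconds):
    # No scheduler available: spin until the deadline passes.
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass


def power_down(device_log, scheduler_available, settle_time=0.001):
    # Hypothetical driver routine that must wait for the hardware to settle.
    device_log.append("request low-power state")
    (wait_sleeping if scheduler_available else wait_busy)(settle_time)
    device_log.append("low-power state reached")
```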

Alan Stern



Re: [linux-pm] Re: Hibernation considerations

2007-07-20 Thread Alan Stern
On Fri, 20 Jul 2007, Oliver Neukum wrote:

> > We already have a pre-suspend notification available for drivers that 
> > need to allocate large amounts of memory.
> 
> Is that facility fine grained enough?

It's a notifier chain that gets called at several points during the 
suspend transition.  One of those points is right at the start, while 
userspace is still running and reasonably large amounts of memory can 
be allocated.
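
As a generic illustration of the idea (not the kernel's notifier API), a
notifier chain can be modelled in Python like this. The class, the event
names and the buffer size are assumptions made for the example.

```python
class NotifierChain:
    """Minimal model of a chain of callbacks invoked at fixed points."""

    def __init__(self):
        self._callbacks = []

    def register(self, callback):
        self._callbacks.append(callback)

    def call(self, event):
        # Invoke every registered callback, in registration order.
        for callback in self._callbacks:
            callback(event)


reserved = {}


def driver_notifier(event):
    # Hypothetical driver callback; the event names are illustrative only.
    if event == "suspend_prepare":
        # Userspace is still running, so a large allocation is still possible.
        reserved["buffer"] = bytearray(4096)
    elif event == "post_suspend":
        reserved.pop("buffer", None)
```

A driver registers driver_notifier once, and the suspend code then calls
the chain with "suspend_prepare" at the very start of the transition and
with "post_suspend" once the system is running again.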

Is it fine-grained enough?  I don't know -- hard to tell, since nothing 
much is using it yet.

> > You are correct about the need to delay/stop device addition.  I don't
> > know how this can be done in general; each code path calling
> > device_add() may have to be treated individually.
> 
> What about the old API?

What old API do you mean?

> Do we have to block module loading?

No.  Registering new drivers is okay, registering new devices is bad.

Of course, some modules do want to register a new device in their init 
method.  I don't know what we should do about them.  Force the 
registration to fail, I suppose.  How often will people suspend while a 
module is loading?

> What happens if a scsi error handler is woken? If it cannot be woken,
> how are errors handled?

Why should the error handler wake up?  There isn't supposed to be any 
I/O going on, hence no errors to handle.

Alan Stern


