Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Nigel Cunningham

Hi.

On Wed, 2007-05-30 at 16:04 +0200, Rafael J. Wysocki wrote:
> On Wednesday, 30 May 2007 15:17, Nigel Cunningham wrote:
> > On Wed, 2007-05-30 at 13:40 +0100, Matthew Garrett wrote:
> > > On Wed, May 30, 2007 at 01:49:21PM +0200, Pavel Machek wrote:
> > > 
> > > (Trimmed the Cc:s quite heavily - I think this has gone somewhere beyond 
> > > the original point)
> > > 
> > > > Notice that we want to be able to suspend while hibernating -- for
> > > > suspend to both behaviour. So drivers may _not_ rely on system being
> > > > runnable.
> > > 
> > > So keep the driver layers read-only and unfreeze the processes after 
> > > doing the atomic copy.
> > 
> > I know you probably won't care, but that's not an option for Suspend2 -
> > I get the possibility of a full image by overwriting LRU pages that were
> > saved prior to the atomic copy.
> 
> This generally is a problem, not only for suspend2. :-)
> 
> Once you've unfrozen the user land, we can't rely on the hibernation image any
> more, because some tasks may cause the on-disk filesystems' state to change.

True. I understood, perhaps wrongly, that when Matthew spoke of keeping
the driver layers read-only, he meant stopping filesystem changes
by some other means.

Regards,

Nigel


Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Linus Torvalds

On Wed, 30 May 2007, Rafael J. Wysocki wrote:
> 
> Very true.  And I think the right order should be to make the midlayers do
> this and then remove the freezer from the STR code path, not the other way
> around. :-)

Yes. After all, STR simply shouldn't _care_.

The rule should be that in a well-written setup, STR "just works" whether 
user processes are suspended or not. In other words, the whole freezing 
part isn't about STR. It should be totally immaterial.

(Of course, that assumes that the freezing is _sane_: ie the 
core kernel threads shouldn't all be frozen. I think Rafael's patch to 
turn the defaults around is a big step in the right direction).

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Pavel Machek

Hi!

> (Trimmed the Cc:s quite heavily - I think this has gone somewhere beyond 
> the original point)
> 
> > Notice that we want to be able to suspend while hibernating -- for
> > suspend to both behaviour. So drivers may _not_ rely on system being
> > runnable.
> 
> So keep the driver layers read-only and unfreeze the processes after 
> doing the atomic copy.

To read firmware you probably need to _write_ atimes.

Anyway, make-disks-read-only patch would be welcome. I just think it
is going to be more complex than freezer.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Randy Dunlap

On Wed, 30 May 2007 12:26:57 +0200 Romano Giannetti wrote:

> 
> On Tue, 2007-05-29 at 07:55 -0700, Linus Torvalds wrote:
> > 
> > On Tue, 29 May 2007, Romano Giannetti wrote:
> > >
> > > - The good (?) news. I have made 7 suspend/resume cycles (to ram, I
> > > haven't tested hibernation) with a 2.6.21.2 with that patch, applied
> > > manually. The system did suspend and resume nicely even compiling a
> > > kernel and opening openoffice. Normally (let me stress _normally_) no
> > > delay was apparent on resume. I do not know how dangerous this is... :-)
> > > 
> > > - The bad (?) news. One time out of 7 I had the 60 seconds delay.
> > 
> > Interesting. If you can re-create it, please do the sysrq-T thing again, 
> > to see what's up. (Also, you might do "sysrq-p", which gives the current 
> > process data, which sysrq-T does not).
> 
> 
> I've got it, but I had a problem: I filled the dmesg buffer. I will try
> to find where to enlarge it. I have posted the partial result to: 

use 'dmesg -s 10' if it's just dmesg(8) that needs help.
If it's the kernel buffer filling up, you can rebuild the kernel
after changing CONFIG_LOG_BUF_SHIFT, but it's easier just to boot
using this:
log_buf_len=n   Sets the size of the printk ring buffer, in bytes.
                Format: { n | nk | nM }
                n must be a power of two.  The default size
                is set in the kernel config file.
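
As a rough illustration of the sizing arithmetic: the compiled-in buffer is a power of two derived from CONFIG_LOG_BUF_SHIFT (1 << shift bytes), and log_buf_len= likewise wants a power of two. The helper names below are made up for this sketch and are not kernel functions.

```c
#include <stddef.h>

/* Bytes in a ring buffer configured with the given shift,
 * following the 1 << CONFIG_LOG_BUF_SHIFT relationship. */
static unsigned long log_buf_bytes(unsigned int shift)
{
	return 1UL << shift;
}

/* Nonzero if n is a nonzero power of two, the constraint
 * stated above for log_buf_len. */
static int is_power_of_two(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Smallest power of two >= n (hypothetical helper for picking a
 * valid log_buf_len; assumes n is far below the type's maximum). */
static unsigned long round_up_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}
```

So a shift of 17 gives a 128 KiB buffer, and a wish for "about 100000 bytes" would be rounded up to 131072 before being passed as log_buf_len=131072 (or 128k).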

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Matthew Garrett

On Wed, May 30, 2007 at 04:04:22PM +0200, Rafael J. Wysocki wrote:
> On Wednesday, 30 May 2007 15:17, Nigel Cunningham wrote:
> > On Wed, 2007-05-30 at 13:40 +0100, Matthew Garrett wrote:
> > > So keep the driver layers read-only and unfreeze the processes after 
> > > doing the atomic copy.
> > 
> > I know you probably won't care, but that's not an option for Suspend2 -
> > I get the possibility of a full image by overwriting LRU pages that were
> > saved prior to the atomic copy.
> 
> This generally is a problem, not only for suspend2. :-)
> 
> Once you've unfrozen the user land, we can't rely on the hibernation image any
> more, because some tasks may cause the on-disk filesystems' state to change.

Hence "keep the driver layers read-only" :) 

-- 
Matthew Garrett | [EMAIL PROTECTED]

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Rafael J. Wysocki

On Wednesday, 30 May 2007 15:29, Matthew Garrett wrote:
> On Wed, May 30, 2007 at 11:17:47PM +1000, Nigel Cunningham wrote:
> 
> > That aside, keeping the driver layers read-only sounds more complicated
> > than just freezing processes.
> 
> It's a problem that effectively has to be solved for STR anyway if 
> we're going to suspend without freezing. The midlayers need to be able 
> to block requests when the low-level devices are suspended,

Very true.  And I think the right order should be to make the midlayers do
this and then remove the freezer from the STR code path, not the other way
around. :-)

> so we can just re-use that code.

Yes, that should be possible.

Greetings,
Rafael

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Rafael J. Wysocki

On Wednesday, 30 May 2007 15:17, Nigel Cunningham wrote:
> On Wed, 2007-05-30 at 13:40 +0100, Matthew Garrett wrote:
> > On Wed, May 30, 2007 at 01:49:21PM +0200, Pavel Machek wrote:
> > 
> > (Trimmed the Cc:s quite heavily - I think this has gone somewhere beyond 
> > the original point)
> > 
> > > Notice that we want to be able to suspend while hibernating -- for
> > > suspend to both behaviour. So drivers may _not_ rely on system being
> > > runnable.
> > 
> > So keep the driver layers read-only and unfreeze the processes after 
> > doing the atomic copy.
> 
> I know you probably won't care, but that's not an option for Suspend2 -
> I get the possibility of a full image by overwriting LRU pages that were
> saved prior to the atomic copy.

This generally is a problem, not only for suspend2. :-)

Once you've unfrozen the user land, we can't rely on the hibernation image any
more, because some tasks may cause the on-disk filesystems' state to change.
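
A toy model of that hazard, purely for illustration: the image is only usable if the on-disk state it was taken against has not changed since. Every name here is hypothetical; none of this corresponds to real kernel structures.

```c
/* Toy model only: a filesystem reduced to a change counter. */
struct toy_fs {
	unsigned long generation; /* bumped on every on-disk change */
};

/* The hibernation image remembers which state it was taken against. */
struct toy_image {
	unsigned long fs_generation;
};

/* Take the "atomic copy": record the current filesystem generation. */
static struct toy_image toy_snapshot(const struct toy_fs *fs)
{
	struct toy_image img = { fs->generation };

	return img;
}

/* An unfrozen task modifying on-disk state after the snapshot. */
static void toy_userland_write(struct toy_fs *fs)
{
	fs->generation++;
}

/* The image can be trusted only if nothing changed underneath it. */
static int toy_image_usable(const struct toy_image *img,
			    const struct toy_fs *fs)
{
	return img->fs_generation == fs->generation;
}
```

In this model a single toy_userland_write() after toy_snapshot() is enough to make toy_image_usable() return 0, which is the situation described above.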

Greetings,
Rafael

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Matthew Garrett

On Wed, May 30, 2007 at 11:17:47PM +1000, Nigel Cunningham wrote:

> That aside, keeping the driver layers read-only sounds more complicated
> than just freezing processes.

It's a problem that effectively has to be solved for STR anyway if 
we're going to suspend without freezing. The midlayers need to be able 
to block requests when the low-level devices are suspended, so we can 
just re-use that code.
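
A minimal userspace sketch of the idea, under loud assumptions: the "midlayer" here is a made-up struct that holds requests back while the device is marked suspended and replays them on resume. None of these names exist in the kernel, and a real midlayer would more likely block the submitter than return an error when its holding area is full.

```c
#include <stddef.h>

#define TOY_MAX_PENDING 8

/* Hypothetical midlayer state, for illustration only. */
struct toy_midlayer {
	int suspended;                 /* nonzero while the device is asleep */
	int pending[TOY_MAX_PENDING];  /* requests held back during suspend */
	size_t npending;
	unsigned long ndispatched;     /* requests actually handed to the device */
	int last_dispatched;
};

/* Hand one request to the (imaginary) low-level device. */
static void toy_dispatch(struct toy_midlayer *m, int req)
{
	m->last_dispatched = req;
	m->ndispatched++;
}

/* Submit a request: dispatch it if the device is awake, otherwise
 * hold it. Returns 0 on success, -1 if the holding area is full. */
static int toy_submit(struct toy_midlayer *m, int req)
{
	if (!m->suspended) {
		toy_dispatch(m, req);
		return 0;
	}
	if (m->npending == TOY_MAX_PENDING)
		return -1;
	m->pending[m->npending++] = req;
	return 0;
}

static void toy_suspend(struct toy_midlayer *m)
{
	m->suspended = 1;
}

/* Wake the device and replay held requests in submission order. */
static void toy_resume(struct toy_midlayer *m)
{
	size_t i;

	m->suspended = 0;
	for (i = 0; i < m->npending; i++)
		toy_dispatch(m, m->pending[i]);
	m->npending = 0;
}
```

The point of the sketch is that submitters never touch the device while it is suspended, whether or not they have been frozen.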

-- 
Matthew Garrett | [EMAIL PROTECTED]

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Nigel Cunningham

On Wed, 2007-05-30 at 13:40 +0100, Matthew Garrett wrote:
> On Wed, May 30, 2007 at 01:49:21PM +0200, Pavel Machek wrote:
> 
> (Trimmed the Cc:s quite heavily - I think this has gone somewhere beyond 
> the original point)
> 
> > Notice that we want to be able to suspend while hibernating -- for
> > suspend to both behaviour. So drivers may _not_ rely on system being
> > runnable.
> 
> So keep the driver layers read-only and unfreeze the processes after 
> doing the atomic copy.

I know you probably won't care, but that's not an option for Suspend2 -
I get the possibility of a full image by overwriting LRU pages that were
saved prior to the atomic copy.

That aside, keeping the driver layers read-only sounds more complicated
than just freezing processes.

Regards,

Nigel

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Matthew Garrett

On Wed, May 30, 2007 at 01:49:21PM +0200, Pavel Machek wrote:

(Trimmed the Cc:s quite heavily - I think this has gone somewhere beyond 
the original point)

> Notice that we want to be able to suspend while hibernating -- for
> suspend to both behaviour. So drivers may _not_ rely on system being
> runnable.

So keep the driver layers read-only and unfreeze the processes after 
doing the atomic copy.

-- 
Matthew Garrett | [EMAIL PROTECTED]

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Pavel Machek

Hi!

> > > How about blocking brk() and mmap(MAP_ANONYMOUS) in addition to
> > > the filesystem VFS callers?   Or is that starting to get messy again?
> > 
> > Yeah. Getting messy again :)
> 
> Indeed. And also misses the point - the point being that we don't actually 
> need to freeze anything at all most of the time. There's nothing wrong 
> with making memory allocations etc.
> 
> And yes, suspend is different from hibernate. I can see how hibernate 
> people are worried about people writing to things after doing the 
> snapshot, but those concerns don't exist with suspend. With suspend, the 
> biggest concern is accessing a device after it has been suspended, but on 
> the other hand, also the fact that we end up having driver writers used 
> to the system being "runnable", so they do things that really do require a 
> full-fledged system (and sometimes that means just some delayed action 
> using a kernel thread, other times it seems to rely on more complex 
> behaviour like firmware loading :^p )

Notice that we want to be able to suspend while hibernating -- for
suspend to both behaviour. So drivers may _not_ rely on system being
runnable.

(Suspend to both is: write image to disk, then suspend to RAM. If you
do not run out of battery, resume is from RAM and fast; if you do, you
can still resume from disk, without losing your data.)
Pavel
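
The resume-side decision for suspend to both can be sketched roughly as follows. This is a hypothetical illustration, not kernel code; the enum, the function name, and the cold-boot fallback for the case where neither copy survives are assumptions added for completeness.

```c
/* Hypothetical names, for illustration only. */
enum resume_path {
	RESUME_FROM_RAM,   /* battery lasted: RAM contents still valid, fast */
	RESUME_FROM_DISK,  /* power was lost: fall back to the on-disk image */
	COLD_BOOT          /* assumed fallback when no valid image exists */
};

/* Choose how to come back up after a suspend-to-both cycle. */
static enum resume_path pick_resume_path(int ram_survived, int image_on_disk)
{
	if (ram_survived)
		return RESUME_FROM_RAM;
	if (image_on_disk)
		return RESUME_FROM_DISK;
	return COLD_BOOT;
}
```

Because the image is written before the machine enters suspend to RAM, drivers go through a suspend while the hibernation sequence is still in progress, which is why they cannot assume a runnable system.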
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Romano Giannetti


On Tue, 2007-05-29 at 07:55 -0700, Linus Torvalds wrote:
> 
> On Tue, 29 May 2007, Romano Giannetti wrote:
> >
> > - The good (?) news. I have made 7 suspend/resume cycles (to ram, I
> > haven't tested hibernation) with a 2.6.21.2 with that patch, applied
> > manually. The system did suspend and resume nicely even compiling a
> > kernel and opening openoffice. Normally (let me stress _normally_) no
> > delay was apparent on resume. I do not know how dangerous this is... :-)
> > 
> > - The bad (?) news. One time out of 7 I had the 60 seconds delay.
> 
> Interesting. If you can re-create it, please do the sysrq-T thing again, 
> to see what's up. (Also, you might do "sysrq-p", which gives the current 
> process data, which sysrq-T does not).


I've got it, but I had a problem: I filled the dmesg buffer. I will try
to find where to enlarge it. I have posted the partial result to: 

http://www.dea.icai.upcomillas.es/romano/linux/info/dmesg-resume-nofreeze.txt

in the hope that something can be used. I am running 2.6.21.2, with the
"no freeze kthreads at all" patch from Matthew Garrett, with this
add-on:

--- drivers/base/firmware_class.c.orig  2007-05-30 12:19:59.0 +0200
+++ drivers/base/firmware_class.c   2007-05-29 19:39:56.0 +0200
@@ -471,7 +471,11 @@
                  struct device *device)
 {
         int uevent = 1;
-        return _request_firmware(firmware_p, name, device, uevent);
+        int rval;
+        printk(KERN_ERR "FW: requesting firmware (sync) for %s\n", name);
+        rval = _request_firmware(firmware_p, name, device, uevent);
+        printk(KERN_ERR "FW: return %d\n", rval);
+        return rval;
 }
 
 /**
@@ -545,7 +549,9 @@
         struct task_struct *task;
         struct firmware_work *fw_work = kmalloc(sizeof (struct firmware_work),
                                                 GFP_ATOMIC);
-
+
+        printk(KERN_ERR "FW: requesting firmware (async) for %s\n", name);
+
         if (!fw_work)
                 return -ENOMEM;
         if (!try_module_get(module)) {
@@ -569,8 +575,12 @@
                 fw_work->cont(NULL, fw_work->context);
                 module_put(fw_work->module);
                 kfree(fw_work);
+                printk(KERN_ERR "FW: failing return %ld\n", PTR_ERR(task));
                 return PTR_ERR(task);
         }
+
+        printk(KERN_ERR "FW: normal return\n");
+
         return 0;
 }
 




-- 
Romano Giannetti --- [EMAIL PROTECTED]

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Nigel Cunningham

Hi.

On Wed, 2007-05-30 at 16:04 +0200, Rafael J. Wysocki wrote:
 On Wednesday, 30 May 2007 15:17, Nigel Cunningham wrote:
  On Wed, 2007-05-30 at 13:40 +0100, Matthew Garrett wrote:
   On Wed, May 30, 2007 at 01:49:21PM +0200, Pavel Machek wrote:
   
   (Trimmed the Cc:s quite heavily - I think this has gone somewhere beyond 
   the original point)
   
Notice that we want to be able to suspend while hibernating -- for
suspend to both behaviour. So drivers may _not_ rely on system being
runnable.
   
   So keep the driver layers read-only and unfreeze the processes after 
   doing the atomic copy.
  
  I know you probably won't care, but that's not an option for Suspend2 -
  I get the possibility of a full image by overwriting LRU pages that were
  saved prior to the atomic copy.
 
 This generally is a problem, not only for suspend2. :-)
 
 Once you've unfrozen the user land, we can't rely on the hibernation image any
 more, because some tasks may cause the on-disk filesystems' state to change.

True. I understood, perhaps wrongly, that when Matthew spoke of keeping
the drivers layers read-only, he was meaning stopping filesystem changes
by some other means.

Regards,

Nigel


signature.asc
Description: This is a digitally signed message part

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Romano Giannetti


On Tue, 2007-05-29 at 07:55 -0700, Linus Torvalds wrote:
 
 On Tue, 29 May 2007, Romano Giannetti wrote:
 
  - The good (?) news. I have made 7 suspend/resume cycle (to ram, I
  haven't tested hibernation) with a 2.6.21.2 with that patch, applied
  manually. The system did suspend and resume nicely even compiling a
  kernel and opening openoffice. Normally (le me stress _normally_) no
  delay was apparent on resume. I do not know how dangerous is this... :-)
  
  - The bad (?) news. One time out of 7 I had the 60 seconds delay.
 
 Interesting. If you can re-create it, please do the sysrq-T thing again, 
 to see what's up. (Also, you might do sysrq-p, which gives the current 
 process data, which sysrq-T does not).


I've got it, but I had a problem: I filled the dmesg buffer. I will try
to find where to enlarge it. I have posted the partial result to: 

http://www.dea.icai.upcomillas.es/romano/linux/info/dmesg-resume-nofreeze.txt

in the hope that something can be used. I am running 2.6.21.2, with the
no freeze kthreads at all patch from Matthew Garrett, with this
add-on:

--- drivers/base/firmware_class.c.orig  2007-05-30 12:19:59.0 +0200
+++ drivers/base/firmware_class.c   2007-05-29 19:39:56.0 +0200
@@ -471,7 +471,11 @@
  struct device *device)
 {
 int uevent = 1;
-return _request_firmware(firmware_p, name, device, uevent);
+int rval;
+printk(KERN_ERR FW: requesting firmware (sync) for %s\n, name);
+rval = _request_firmware(firmware_p, name, device, uevent);
+printk(KERN_ERR FW: return %d\n, rval);
+return rval;
 }
 
 /**
@@ -545,7 +549,9 @@
struct task_struct *task;
struct firmware_work *fw_work = kmalloc(sizeof (struct firmware_work),
GFP_ATOMIC);
-
+
+printk(KERN_ERR FW: requesting firmware (async) for %s\n, name);
+
if (!fw_work)
return -ENOMEM;
if (!try_module_get(module)) {
@@ -569,8 +575,12 @@
fw_work-cont(NULL, fw_work-context);
module_put(fw_work-module);
kfree(fw_work);
+printk(KERN_ERR FW: failing return %d\n, PTR_ERR(task));
return PTR_ERR(task);
}
+
+printk(KERN_ERR FW: normal return\n);
+
return 0;
 }
 




-- 
Romano Giannetti --- [EMAIL PROTECTED]
Sorry for the following disclaimer, it's attached by our otugoing server
and I cannot shut it up.
 


--
La presente comunicación tiene carácter confidencial y es para el exclusivo uso 
del destinatario indicado en la misma. Si Ud. no es el destinatario indicado, 
le informamos que cualquier forma de distribución, reproducción o uso de esta 
comunicación y/o de la información contenida en la misma están estrictamente 
prohibidos por la ley. Si Ud. ha recibido esta comunicación por error, por 
favor, notifíquelo inmediatamente al remitente contestando a este mensaje y 
proceda a continuación a destruirlo. Gracias por su colaboración.

This communication contains confidential information. It is for the exclusive 
use of the intended addressee. If you are not the intended addressee, please 
note that any form of distribution, copying or use of this communication or the 
information in it is strictly prohibited by law. If you have received this 
communication in error, please immediately notify the sender by reply e-mail 
and destroy this message. Thank you for your cooperation. 
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Pavel Machek

Hi!

   How about blocking brk() and mmap(MAP_ANONYMOUS) in addition to
   the filesystem VFS callers?   Or is that starting to get messy again?
  
  Yeah. Getting messy again :)
 
 Indeed. And also misses the point - the point being that we don't actually 
 need to freeze anything at all most of the time. There's nothing wrong 
 with making memory allocations etc.
 
 And yes, suspend is different from hibernate. I can see how hibernate 
 people are worried about people writing to things after doing the 
 snapshot, but those concerns don't exist with suspend. With suspend, the 
 biggest concern is accessing a device after it has been suspended, but on 
 the other hand, also the fact that we end up having driver writers used 
 to the system being runnable, so they do things that really do require a 
 full-fledged system (and sometimes that means just some delayed action 
 using a kernel thread, other times it seems to rely on more complex 
 behaviour like firmware loading :^p )

Notice that we want to be able to suspend while hibernating -- for
suspend to both behaviour. So drivers may _not_ rely on system being
runnable.

(Suspend to both is: write image to disk, then suspend to RAM. If you
do not run out of battery, resume is from RAM and fast, if you do, you
still can do resume from disk, not loosing your data).
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Matthew Garrett

On Wed, May 30, 2007 at 01:49:21PM +0200, Pavel Machek wrote:

(Trimmed the Cc:s quite heavily - I think this has gone somewhere beyond 
the original point)

 Notice that we want to be able to suspend while hibernating -- for
 suspend to both behaviour. So drivers may _not_ rely on system being
 runnable.

So keep the driver layers read-only and unfreeze the processes after 
doing the atomic copy.

-- 
Matthew Garrett | [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Nigel Cunningham

On Wed, 2007-05-30 at 13:40 +0100, Matthew Garrett wrote:
 On Wed, May 30, 2007 at 01:49:21PM +0200, Pavel Machek wrote:
 
 (Trimmed the Cc:s quite heavily - I think this has gone somewhere beyond 
 the original point)
 
  Notice that we want to be able to suspend while hibernating -- for
  suspend to both behaviour. So drivers may _not_ rely on system being
  runnable.
 
 So keep the driver layers read-only and unfreeze the processes after 
 doing the atomic copy.

I know you probably won't care, but that's not an option for Suspend2 -
I get the possibility of a full image by overwriting LRU pages that were
saved prior to the atomic copy.

That aside, keeping the driver layers read-only sounds more complicated
than just freezing processes.

Regards,

Nigel


signature.asc
Description: This is a digitally signed message part

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Matthew Garrett

On Wed, May 30, 2007 at 11:17:47PM +1000, Nigel Cunningham wrote:

 That aside, keeping the driver layers read-only sounds more complicated
 than just freezing processes.

It's a problem that effectively has to be solved for STR anyway if 
we're going to suspend without freezing. The midlayers need to be able 
to block requests when the low-level devices are suspended, so we can 
just re-use that code.

-- 
Matthew Garrett | [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Rafael J. Wysocki

On Wednesday, 30 May 2007 15:17, Nigel Cunningham wrote:
 On Wed, 2007-05-30 at 13:40 +0100, Matthew Garrett wrote:
  On Wed, May 30, 2007 at 01:49:21PM +0200, Pavel Machek wrote:
  
  (Trimmed the Cc:s quite heavily - I think this has gone somewhere beyond 
  the original point)
  
   Notice that we want to be able to suspend while hibernating -- for
   suspend to both behaviour. So drivers may _not_ rely on system being
   runnable.
  
  So keep the driver layers read-only and unfreeze the processes after 
  doing the atomic copy.
 
 I know you probably won't care, but that's not an option for Suspend2 -
 I get the possibility of a full image by overwriting LRU pages that were
 saved prior to the atomic copy.

This generally is a problem, not only for suspend2. :-)

Once you've unfrozen the user land, we can't rely on the hibernation image any
more, because some tasks may cause the on-disk filesystems' state to change.

Greetings,
Rafael
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Rafael J. Wysocki

On Wednesday, 30 May 2007 15:29, Matthew Garrett wrote:
 On Wed, May 30, 2007 at 11:17:47PM +1000, Nigel Cunningham wrote:
 
  That aside, keeping the driver layers read-only sounds more complicated
  than just freezing processes.
 
 It's a problem that effectively has to be solved for STR anyway if 
 we're going to suspend without freezing. The midlayers need to be able 
 to block requests when the low-level devices are suspended,

Very true.  And I think the right order should be to make the midlayers do
this and then remove the freezer from the STR code path, not the other way
around. :-)

 so we can just re-use that code.

Yes, that should be possible.

Greetings,
Rafael
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Matthew Garrett

On Wed, May 30, 2007 at 04:04:22PM +0200, Rafael J. Wysocki wrote:
 On Wednesday, 30 May 2007 15:17, Nigel Cunningham wrote:
  On Wed, 2007-05-30 at 13:40 +0100, Matthew Garrett wrote:
   So keep the driver layers read-only and unfreeze the processes after 
   doing the atomic copy.
  
  I know you probably won't care, but that's not an option for Suspend2 -
  I get the possibility of a full image by overwriting LRU pages that were
  saved prior to the atomic copy.
 
 This generally is a problem, not only for suspend2. :-)
 
 Once you've unfrozen the user land, we can't rely on the hibernation image any
 more, because some tasks may cause the on-disk filesystems' state to change.

Hence keep the driver layers read-only :) 

-- 
Matthew Garrett | [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Randy Dunlap

On Wed, 30 May 2007 12:26:57 +0200 Romano Giannetti wrote:

 
 On Tue, 2007-05-29 at 07:55 -0700, Linus Torvalds wrote:
  
  On Tue, 29 May 2007, Romano Giannetti wrote:
  
   - The good (?) news. I have made 7 suspend/resume cycle (to ram, I
   haven't tested hibernation) with a 2.6.21.2 with that patch, applied
   manually. The system did suspend and resume nicely even compiling a
   kernel and opening openoffice. Normally (le me stress _normally_) no
   delay was apparent on resume. I do not know how dangerous is this... :-)
   
   - The bad (?) news. One time out of 7 I had the 60 seconds delay.
  
  Interesting. If you can re-create it, please do the sysrq-T thing again, 
  to see what's up. (Also, you might do sysrq-p, which gives the current 
  process data, which sysrq-T does not).
 
 
 I've got it, but I had a problem: I filled the dmesg buffer. I will try
 to find where to enlarge it. I have posted the partial result to: 

use 'dmesg -s 10' if it's just dmesg(8) that needs help.
If it's the kernel buffer filling up, you can rebuild the kernel
after changing CONFIG_LOG_BUF_SHIFT, but it's easier just to boot
using this:
log_buf_len=n   Sets the size of the printk ring buffer, in bytes.
Format: { n | nk | nM }
n must be a power of two.  The default size
is set in the kernel config file.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Pavel Machek

Hi!

 (Trimmed the Cc:s quite heavily - I think this has gone somewhere beyond 
 the original point)
 
  Notice that we want to be able to suspend while hibernating -- for
  suspend to both behaviour. So drivers may _not_ rely on system being
  runnable.
 
 So keep the driver layers read-only and unfreeze the processes after 
 doing the atomic copy.

To read firmware you probably need to _write_ atimes.

Anyway, make-disks-read-only patch would be welcome. I just think it
is going to be more complex than freezer.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-30 Thread Linus Torvalds



On Wed, 30 May 2007, Rafael J. Wysocki wrote:
 
> Very true.  And I think the right order should be to make the midlayers do
> this and then remove the freezer from the STR code path, not the other way
> around. :-)

Yes. After all, STR simply shouldn't _care_.

The rule should be that in a well-written setup, STR "just works" whether 
user processes are suspended or not. In other words, the whole freezing 
part isn't about STR. It should be totally immaterial.

(Of course, that assumes that the freezing is _sane_, of course: ie the 
core kernel threads shouldn't all be frozen. I think Rafael's patch to 
turn the defaults around are a big step in the right direction).

Linus

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Nigel Cunningham

Hi.

On Tue, 2007-05-29 at 14:33 -0700, Linus Torvalds wrote:
> 
> On Wed, 30 May 2007, Nigel Cunningham wrote:
> > 
> > On Tue, 2007-05-29 at 10:19 -0400, Mark Lord wrote:
> > > 
> > > How about blocking brk() and mmap(MAP_ANONYMOUS) in addition to
> > > the filesystem VFS callers?   Or is that starting to get messy again?
> > 
> > Yeah. Getting messy again :)
> 
> Indeed. And also misses the point - the point being that we don't actually 
> need to freeze anything at all most of the time. There's nothing wrong 
> with making memory allocations etc.
> 
> And yes, suspend is different from hibernate. I can see how hibernate 
> people are worried about people writing to things after doing the 
> snapshot, but those concerns don't exist with suspend. With suspend, the 
> biggest concern is accessing a device after it has been suspended, but on 
> the other hand, also the fact that we end up having driver writers used 
> to the system being "runnable", so they do things that really do require a 
> full-fledged system (and sometimes that means just some delayed action 
> using a kernel thread, other times it seems to rely on more complex 
> behaviour like firmware loading :^p )

Yeah, but they can't. Even after the freezing of processes has been
removed from the normal suspend to ram path, we're still going to have
this issue with the suspend to ram after writing a hibernation image
path.

Regards,

Nigel



Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Linus Torvalds

On Wed, 30 May 2007, Nigel Cunningham wrote:
> 
> On Tue, 2007-05-29 at 10:19 -0400, Mark Lord wrote:
> > 
> > How about blocking brk() and mmap(MAP_ANONYMOUS) in addition to
> > the filesystem VFS callers?   Or is that starting to get messy again?
> 
> Yeah. Getting messy again :)

Indeed. And also misses the point - the point being that we don't actually 
need to freeze anything at all most of the time. There's nothing wrong 
with making memory allocations etc.

And yes, suspend is different from hibernate. I can see how hibernate 
people are worried about people writing to things after doing the 
snapshot, but those concerns don't exist with suspend. With suspend, the 
biggest concern is accessing a device after it has been suspended, but on 
the other hand, also the fact that we end up having driver writers used 
to the system being "runnable", so they do things that really do require a 
full-fledged system (and sometimes that means just some delayed action 
using a kernel thread, other times it seems to rely on more complex 
behaviour like firmware loading :^p )

Linus

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Nigel Cunningham

Hi.

On Tue, 2007-05-29 at 10:19 -0400, Mark Lord wrote:
> Nigel Cunningham wrote:
> >
> > I'm sorry to say it, but dropping process freezing still seems to me
> > like the better way though. I prefer it because of the reliability
> > aspect. With the current code, having frozen processes, I can look at
> > the state of memory, calculate how much I'll need for this or that and
> > know that I'll have sufficient memory for the atomic copy and for doing
> > the I/O  (making assumptions about how much memory drivers will
> > allocate) before I start to do either. If we stop freezing processes,
> > that predictability will go away. There'll always be a possibility that
> > some process will get memory hungry and stop me from being able to get
> > the image on disk, and I'll have to either abort or give up and try
> > again and again until I can complete writing the image, the battery runs
> > out or whatever... 
> 
> How about blocking brk() and mmap(MAP_ANONYMOUS) in addition to
> the filesystem VFS callers?   Or is that starting to get messy again?

Yeah. Getting messy again :)

Nigel



Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Linus Torvalds



On Tue, 29 May 2007, Romano Giannetti wrote:
>
> - The good (?) news. I have made 7 suspend/resume cycles (to ram, I
> haven't tested hibernation) with a 2.6.21.2 with that patch, applied
> manually. The system did suspend and resume nicely even compiling a
> kernel and opening openoffice. Normally (let me stress _normally_) no
> delay was apparent on resume. I do not know how dangerous this is... :-)
> 
> - The bad (?) news. One time out of 7 I had the 60 seconds delay.

Interesting. If you can re-create it, please do the sysrq-T thing again, 
to see what's up. (Also, you might do "sysrq-p", which gives the current 
process data, which sysrq-T does not).
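
When the keyboard combination is awkward to hit, the same dumps can usually
be requested by writing the key letter to /proc/sysrq-trigger. The sketch
below is only an illustration: the helper name is made up, and it assumes a
kernel built with magic SysRq support and a caller running as root. The
output lands in the kernel log, so it still needs a large enough dmesg
buffer.

```python
def trigger_sysrq(key, path="/proc/sysrq-trigger"):
    """Write a single SysRq key letter to the trigger file.

    Assumption: the kernel has magic SysRq enabled and the caller may
    write to the path. 't' asks for the task list and 'p' for the
    current process/register data, as described in the message above.
    """
    if len(key) != 1:
        raise ValueError("exactly one key letter is expected")
    with open(path, "w") as trigger:
        trigger.write(key)
```

Calling trigger_sysrq("t") and then trigger_sysrq("p") right after a slow
resume would correspond to the two dumps asked for here.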

Linus

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Mark Lord


Nigel Cunningham wrote:
>
> I'm sorry to say it, but dropping process freezing still seems to me
> like the better way though. I prefer it because of the reliability
> aspect. With the current code, having frozen processes, I can look at
> the state of memory, calculate how much I'll need for this or that and
> know that I'll have sufficient memory for the atomic copy and for doing
> the I/O  (making assumptions about how much memory drivers will
> allocate) before I start to do either. If we stop freezing processes,
> that predictability will go away. There'll always be a possibility that
> some process will get memory hungry and stop me from being able to get
> the image on disk, and I'll have to either abort or give up and try
> again and again until I can complete writing the image, the battery runs
> out or whatever... 

How about blocking brk() and mmap(MAP_ANONYMOUS) in addition to
the filesystem VFS callers?   Or is that starting to get messy again?

Cheers


Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Mark Lord


Linus Torvalds wrote:
> 
> On Fri, 25 May 2007, Nigel Cunningham wrote:
> > Does that mean you never ever power off your laptop (assuming you have
> > one), and the battery never runs out? Surely you must power it off
> > completely sometimes?
> 
> So? The bootup isn't that much worse than a disk suspend/resume, and it's 
> reliable.

I very much prefer suspend (to RAM) over hibernate (to DISK).
But once in a while, primarily when travelling, I'll use hibernate.

And the "swsusp" in the kernel is just plain crappy and slow,
which leads many people (including our beloved chief penguin, it seems)
into thinking that hibernate *has* to be too slow to be useful.

But with Suspend2, it is very quick and usable by comparison.
Try it, you'll like it (at least a little).

Cheers

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Kay Sievers

On Tue, 2007-05-29 at 13:00 +0100, Michael-Luke Jones wrote:
> Rafael J. Wysocki wrote:
> > On Tuesday, 29 May 2007 08:55, Kay Sievers wrote:
> >> The shiny userspace firmware loading causes problems since it exists,
> >> every second box has problems with it, in all sorts of situations. If
> >> people are still sold to the idea of userspace firmware loading, why
> >> don't we keep the data in the driver, instead of immediately
> >> discarding it after the first upload? Not to waste a few hundred
> >> kilobytes? That doesn't sound like a convincing deal, after all the
> >> years people try to work around the issues it causes.
> > 
> > Agreed.
> > 
> > Rafael
> 
> Rather than most drivers being told to make this step, can this be added 
> to the firmware_class such that firmware objects are cached in RAM and 
> subsequent calls to request_firmware() don't have to query userspace.
> 
> This seems the least intrusive solution to this problem.

Who is going to keep track of the data hiding in the firmware_class? On
driver unbind, module unload, you want to release the data.
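
One possible answer to the ownership question is reference counting: every
user takes a reference when it asks for the image and drops it on unbind or
unload, and the cache frees the data when the last reference goes away. The
Python model below only sketches that idea. The class and method names are
hypothetical and are not the kernel's firmware_class interface.

```python
class FirmwareCache:
    """Toy model of a firmware cache with per-image reference counts."""

    def __init__(self, load_from_userspace):
        # load_from_userspace(name) stands in for the slow userspace path.
        self._load = load_from_userspace
        self._entries = {}  # name -> [data, refcount]

    def get(self, name):
        """Return the image, asking userspace only on the first request."""
        entry = self._entries.get(name)
        if entry is None:
            entry = [self._load(name), 0]
            self._entries[name] = entry
        entry[1] += 1
        return entry[0]

    def put(self, name):
        """Drop one reference; free the image when nobody holds it."""
        entry = self._entries[name]
        entry[1] -= 1
        if entry[1] == 0:
            del self._entries[name]

    def cached(self, name):
        return name in self._entries
```

In this model a resume-time get() for an image that a bound driver still
holds never reaches userspace, and the memory is released once every holder
has called put().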

Kay


Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Pavel Machek

Hi!

> > I guess we should warn the driver authors, then; and decide what driver 
> > authors should do.
> 
> Drivers really shouldn't do anything at all.

*)

> > If I'm video4linux driver for grabbing screen, have been suspended, and 
> > someone asks me to read a frame, should I
> > 
> > a) return -ESORRYIMSUSPENDED
> > 
> > b) just block the caller
> 
> The "subsystem" thing should have stopped the queues, and the device 
> should never even _see_ this.

Okay, _if_ there's a subsystem, subsystem should have stopped the
queues. End result should be that userspace is blocked when trying to
access suspended device/suspended subsystem.
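
The "just block the caller" outcome can be modelled in user space with a
gate that the subsystem closes on suspend and opens on resume. This is a
sketch under stated assumptions, not kernel code: the class name, the
read_frame() method and the grab callback are invented for the example.

```python
import threading


class SuspendableQueue:
    """Toy subsystem gate: callers block while the device is suspended."""

    def __init__(self):
        self._running = threading.Event()
        self._running.set()  # start in the running state

    def suspend(self):
        self._running.clear()

    def resume(self):
        self._running.set()

    def read_frame(self, grab, timeout=None):
        # Block until the subsystem is running again, so the driver
        # callback 'grab' is not entered while the gate is closed.
        # Note: this sketch does not close the window between the
        # wait() returning and grab() being called.
        if not self._running.wait(timeout):
            raise TimeoutError("subsystem still suspended")
        return grab()
```

A caller that arrives during suspend simply sleeps in read_frame() and gets
its frame after resume(); the optional timeout is there only to show what
option (a), returning an error, would look like instead.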

I guess we are in violent agreement.
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Rafael J. Wysocki

On Tuesday, 29 May 2007 14:00, Michael-Luke Jones wrote:
> Rafael J. Wysocki wrote:
> > On Tuesday, 29 May 2007 08:55, Kay Sievers wrote:
> >> The shiny userspace firmware loading causes problems since it exists,
> >> every second box has problems with it, in all sorts of situations. If
> >> people are still sold to the idea of userspace firmware loading, why
> >> don't we keep the data in the driver, instead of immediately
> >> discarding it after the first upload? Not to waste a few hundred
> >> kilobytes? That doesn't sound like a convincing deal, after all the
> >> years people try to work around the issues it causes.
> > 
> > Agreed.
> > 
> > Rafael
> 
> Rather than most drivers being told to make this step, can this be added 
> to the firmware_class such that firmware objects are cached in RAM and 
> subsequent calls to request_firmware() don't have to query userspace.
> 
> This seems the least intrusive solution to this problem.

Agreed again. :-)

Greetings,
Rafael

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Michael-Luke Jones


Rafael J. Wysocki wrote:
> On Tuesday, 29 May 2007 08:55, Kay Sievers wrote:
> > The shiny userspace firmware loading causes problems since it exists,
> > every second box has problems with it, in all sorts of situations. If
> > people are still sold to the idea of userspace firmware loading, why
> > don't we keep the data in the driver, instead of immediately
> > discarding it after the first upload? Not to waste a few hundred
> > kilobytes? That doesn't sound like a convincing deal, after all the
> > years people try to work around the issues it causes.
> 
> Agreed.
> 
> Rafael

Rather than most drivers being told to make this step, can this be added 
to the firmware_class such that firmware objects are cached in RAM and 
subsequent calls to request_firmware() don't have to query userspace.

This seems the least intrusive solution to this problem.

Thanks,

Michael-Luke

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Rafael J. Wysocki

On Tuesday, 29 May 2007 08:55, Kay Sievers wrote:
> On 5/25/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> > On Fri, 25 May 2007, Pavel Machek wrote:
> > >
> > > 2) we need to preload firmware during _suspend_. I AM TELLING THAT TO
> > > PEOPLE FOR FIVE YEARS NOW.
> >
> > And people aren't listening. Have you thought about _why_?
> >
> > The thing is, it should just work. Even without pre-loading.
> >
> > > Imagine we killed freezer. Also imagine Romano has IDE card his
> > > PCMCIA slot. Kaboom, we solved nothing.
> >
> > Don't be silly. We solved it. The firmware has to be loadable from
> > somewhere else, since otherwise his IDE card wouldn't have been accessible
> > in the first place!
> 
> The shiny userspace firmware loading causes problems since it exists,
> every second box has problems with it, in all sorts of situations. If
> people are still sold to the idea of userspace firmware loading, why
> don't we keep the data in the driver, instead of immediately
> discarding it after the first upload? Not to waste a few hundred
> kilobytes? That doesn't sound like a convincing deal, after all the
> years people try to work around the issues it causes.

Agreed.

Rafael

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Romano Giannetti

On Sun, 2007-05-27 at 19:44 +0100, Matthew Garrett wrote:
> Anyway. I've tested the following patch on a dual-core x86. No obvious
> issues yet, but I'll try to put it through a few hundred cycles.
[patch to disable freezer deleted]

First of all, excuse me for being a quite lousy tester. Could not come
around to try bisecting, no time at all. Yesterday in the autobus I gave
a shot to this "wild test" and I report the results here.


- The good (?) news. I have made 7 suspend/resume cycles (to ram, I
haven't tested hibernation) with a 2.6.21.2 with that patch, applied
manually. The system did suspend and resume nicely even compiling a
kernel and opening openoffice. Normally (let me stress _normally_) no
delay was apparent on resume. I do not know how dangerous this is... :-)

- The bad (?) news. One time out of 7 I had the 60 seconds delay. I
attach here the dmesg(s) of the resumes,  a good one, a delayed one, and
another good one after a reboot (where you can, by the way, see the
dancing serial effect... the card is sometimes /dev/ttyS1,
sometimes /dev/ttyS2). 



[ 1112.984000] Back to C!
[ 1112.985000] Applying VIA southbridge workaround.
[ 1112.985000] PCI: Disabling Via external APIC routing
[ 1113.418000] PM: Writing back config space on device :00:00.0 at offset 1 
(was 216, writing 1216)
[ 1113.418000] PM: Writing back config space on device :00:01.0 at offset 9 
(was fff0, writing 38003800)
[ 1113.418000] PCI: Setting latency timer of device :00:01.0 to 64
[ 1114.408000] ACPI: PCI Interrupt :00:07.2[D] -> Link [LNKD] -> GSI 9 
(level, low) -> IRQ 9
[ 1114.408000] PCI: Setting latency timer of device :00:07.2 to 64
[ 1114.408000] usb usb1: root hub lost power or was reset
[ 1114.481000] ACPI: PCI Interrupt :00:07.3[D] -> Link [LNKD] -> GSI 9 
(level, low) -> IRQ 9
[ 1114.481000] PCI: Setting latency timer of device :00:07.3 to 64
[ 1114.481000] usb usb2: root hub lost power or was reset
[ 1114.657000] ACPI: PCI Interrupt :00:07.5[C] -> Link [LNKC] -> GSI 5 
(level, low) -> IRQ 5
[ 1114.657000] PCI: Setting latency timer of device :00:07.5 to 64
[ 1115.347000] pccard: PCMCIA card inserted into slot 1
[ 1115.347000] pcmcia: registering new device pcmcia1.0
[ 1115.459000] pcmcia: request for exclusive IRQ could not be fulfilled.
[ 1115.459000] pcmcia: the driver needs updating to supported shared IRQ lines.
[ 1115.504000] eth0: 3Com 3c589, io 0x300, irq 3, hw_addr 00:00:86:1A:4E:A8
[ 1115.504000]   8K FIFO split 5:3 Rx:Tx, auto xcvr
[ 1115.504000] pcmcia: registering new device pcmcia1.1
[ 1115.504000] pcmcia: request for exclusive IRQ could not be fulfilled.
[ 1115.504000] pcmcia: the driver needs updating to supported shared IRQ lines.
[ 1115.545000] 1.1: ttyS2 at I/O 0x3e8 (irq = 3) is a 16550A
[ 1115.558000] PM: Writing back config space on device :00:0e.0 at offset 3 
(was 8, writing 4008)
[ 1115.558000] PM: Writing back config space on device :00:0e.0 at offset 1 
(was 2100012, writing 2100016)
[ 1115.609000] ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[9]  
MMIO=[e8004000-e80047ff]  Max Packet=[2048]  IR/IT contexts=[4/8]
[ 1115.615000] PM: Writing back config space on device :00:10.0 at offset 5 
(was 0, writing e8004800)
[ 1115.615000] PM: Writing back config space on device :00:10.0 at offset 4 
(was 1, writing 1801)
[ 1115.615000] PM: Writing back config space on device :00:10.0 at offset 1 
(was 290, writing 293)
[ 1115.67] pnp: Device 00:08 activated.
[ 1115.67] pnp: Failed to activate device 00:0a.
[ 1115.67] pnp: Failed to activate device 00:0b.
[ 1117.648000] 8139too Fast Ethernet driver 0.9.28
[ 1117.65] ACPI: PCI Interrupt :00:10.0[A] -> Link [LNKB] -> GSI 10 
(level, low) -> IRQ 10
[ 1117.653000] eth0: RealTek RTL8139 at 0xe0a72800, 08:00:46:6e:93:a8, IRQ 10
[ 1117.653000] eth0:  Identified 8139 chip type 'RTL-8139C'
[ 1118.403000] input: Power Button (FF) as /class/input/input13
[ 1118.404000] ACPI: Power Button (FF) [PWRF]
[ 1118.404000] input: Sleep Button (CM) as /class/input/input14
[ 1118.405000] ACPI: Sleep Button (CM) [SBTN]
[ 1118.406000] input: Lid Switch as /class/input/input15
[ 1118.407000] ACPI: Lid Switch [LID]
[ 1118.83] ACPI: Thermal Zone [THRM] (36 C)
[ 1119.431000] ACPI: AC Adapter [ACAD] (off-line)
[ 1119.899000] ACPI: Battery Slot [BAT1] (battery present)
[ 1119.904000] ACPI: Battery Slot [BAT2] (battery absent)
[ 1126.226000] eth1: no IPv6 routers present

suspend...

[ 2019.31] pccard: card ejected from slot 0
[ 2019.345000] PCMCIA: socket dc99bc28: *** DANGER *** unable to remove socket 
power
[ 2019.346000] pccard: card ejected from slot 1
[ 2020.041000] ACPI: PCI interrupt for device :00:10.0 disabled
[ 2024.641000] Suspending console(s)
[ 2024.656000]  usbdev2.1: PM: suspend 0->2, parent usb2 already 2
[ 2024.656000]  usbdev2.1_ep81: PM: suspend 0->2, parent 2-0:1.0 already 2
[ 2024.656000] hub 2-0:1.0: PM: suspend 2->2, parent usb2 already 2
[
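
To find where a delay like the 60-second one sits in logs of this shape,
one option is to scan the printk timestamps and report every jump above
some threshold. The following Python sketch does that. The function name
and the 5-second default are arbitrary choices for the example, and lines
without a leading timestamp (wrapped continuations, the "suspend..." marker)
are skipped. A jump across a suspend is of course expected and has to be
told apart from a real stall by hand.

```python
import re

# Matches the leading "[ 1113.418000]" style printk timestamp.
TIMESTAMP = re.compile(r"^\[\s*(\d+(?:\.\d+)?)\]")


def find_gaps(lines, threshold=5.0):
    """Return (gap_seconds, line_before, line_after) for each large jump."""
    gaps = []
    prev_time = None
    prev_line = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if match is None:
            continue  # continuation line or marker without a timestamp
        now = float(match.group(1))
        if prev_time is not None and now - prev_time >= threshold:
            gaps.append((round(now - prev_time, 6), prev_line, line))
        prev_time, prev_line = now, line
    return gaps
```

Fed the lines of a delayed resume, the entry with the largest gap points at
the last message printed before the stall and the first one after it.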

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Kay Sievers

On 5/25/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> On Fri, 25 May 2007, Pavel Machek wrote:
> >
> > 2) we need to preload firmware during _suspend_. I AM TELLING THAT TO
> > PEOPLE FOR FIVE YEARS NOW.
>
> And people aren't listening. Have you thought about _why_?
>
> The thing is, it should just work. Even without pre-loading.
>
> > Imagine we killed freezer. Also imagine Romano has IDE card his
> > PCMCIA slot. Kaboom, we solved nothing.
>
> Don't be silly. We solved it. The firmware has to be loadable from
> somewhere else, since otherwise his IDE card wouldn't have been accessible
> in the first place!

The shiny userspace firmware loading causes problems since it exists,
every second box has problems with it, in all sorts of situations. If
people are still sold to the idea of userspace firmware loading, why
don't we keep the data in the driver, instead of immediately
discarding it after the first upload? Not to waste a few hundred
kilobytes? That doesn't sound like a convincing deal, after all the
years people try to work around the issues it causes.

Kay

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Kay Sievers


On 5/25/07, Linus Torvalds [EMAIL PROTECTED] wrote:

On Fri, 25 May 2007, Pavel Machek wrote:

 2) we need to preload firmware during _suspend_. I AM TELLING THAT TO
 PEOPLE FOR FIVE YEARS NOW.

And people aren't listening. Have you thought about _why_?

The thing is, it should just work. Even without pre-loading.

 Imageine we killed freezer. Also imagine Romano has IDE card his
 PCMCIA slot. Kaboom, we solved nothing.

Don't be silly. We solved it. The firmware has to be loadable from
somewhere else, since otherwise his IDE card wouldn't have been accessible
in the first place!


The shiny userspace firmware loading causes problems since it exists,
every second box has problems with it, in all sorts of situations. If
people are still sold to the idea of userspace firmware loading, why
don't we keep the data in the driver, instead of immediately
discarding it after the first upload? Not to waste a few hundred
kilobytes? That doesn't sound like a convincing deal, after all the
years people try to work around the issues it causes.

Kay
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Romano Giannetti

On Sun, 2007-05-27 at 19:44 +0100, Matthew Garrett wrote:


 Anyway. I've tested the following patch on a dual-core x86. No obvious

 issues yet, but I'll try to put it through a few hundred cycles.
[patch to disable freezer deleted]

First of all, excuse me for being a quite lousy tester. Could not come
around to try bisecting, no time at all. Yesterday in the autobus I gave
a shot to this wild test and I report the results here.


- The good (?) news. I have made 7 suspend/resume cycle (to ram, I
haven't tested hibernation) with a 2.6.21.2 with that patch, applied
manually. The system did suspend and resume nicely even compiling a
kernel and opening openoffice. Normally (le me stress _normally_) no
delay was apparent on resume. I do not know how dangerous is this... :-)

- The bad (?) news. One time out of 7 I had the 60 seconds delay. I
attach here the dmesg(s) of the resumes,  a good one, a delayed one, and
another good one after a reboot (where you can, by the way, see the
dancing serial effect... the card is sometime /dev/ttyS1,
sometime /dev/ttyS2). 



[ 1112.984000] Back to C!
[ 1112.985000] Applying VIA southbridge workaround.
[ 1112.985000] PCI: Disabling Via external APIC routing
[ 1113.418000] PM: Writing back config space on device :00:00.0 at offset 1 
(was 216, writing 1216)
[ 1113.418000] PM: Writing back config space on device :00:01.0 at offset 9 
(was fff0, writing 38003800)
[ 1113.418000] PCI: Setting latency timer of device :00:01.0 to 64
[ 1114.408000] ACPI: PCI Interrupt :00:07.2[D] - Link [LNKD] - GSI 9 
(level, low) - IRQ 9
[ 1114.408000] PCI: Setting latency timer of device :00:07.2 to 64
[ 1114.408000] usb usb1: root hub lost power or was reset
[ 1114.481000] ACPI: PCI Interrupt :00:07.3[D] - Link [LNKD] - GSI 9 
(level, low) - IRQ 9
[ 1114.481000] PCI: Setting latency timer of device :00:07.3 to 64
[ 1114.481000] usb usb2: root hub lost power or was reset
[ 1114.657000] ACPI: PCI Interrupt :00:07.5[C] - Link [LNKC] - GSI 5 
(level, low) - IRQ 5
[ 1114.657000] PCI: Setting latency timer of device :00:07.5 to 64
[ 1115.347000] pccard: PCMCIA card inserted into slot 1
[ 1115.347000] pcmcia: registering new device pcmcia1.0
[ 1115.459000] pcmcia: request for exclusive IRQ could not be fulfilled.
[ 1115.459000] pcmcia: the driver needs updating to supported shared IRQ lines.
[ 1115.504000] eth0: 3Com 3c589, io 0x300, irq 3, hw_addr 00:00:86:1A:4E:A8
[ 1115.504000]   8K FIFO split 5:3 Rx:Tx, auto xcvr
[ 1115.504000] pcmcia: registering new device pcmcia1.1
[ 1115.504000] pcmcia: request for exclusive IRQ could not be fulfilled.
[ 1115.504000] pcmcia: the driver needs updating to supported shared IRQ lines.
[ 1115.545000] 1.1: ttyS2 at I/O 0x3e8 (irq = 3) is a 16550A
[ 1115.558000] PM: Writing back config space on device :00:0e.0 at offset 3 
(was 8, writing 4008)
[ 1115.558000] PM: Writing back config space on device :00:0e.0 at offset 1 
(was 2100012, writing 2100016)
[ 1115.609000] ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[9]  
MMIO=[e8004000-e80047ff]  Max Packet=[2048]  IR/IT contexts=[4/8]
[ 1115.615000] PM: Writing back config space on device :00:10.0 at offset 5 
(was 0, writing e8004800)
[ 1115.615000] PM: Writing back config space on device :00:10.0 at offset 4 
(was 1, writing 1801)
[ 1115.615000] PM: Writing back config space on device :00:10.0 at offset 1 
(was 290, writing 293)
[ 1115.67] pnp: Device 00:08 activated.
[ 1115.67] pnp: Failed to activate device 00:0a.
[ 1115.67] pnp: Failed to activate device 00:0b.
[ 1117.648000] 8139too Fast Ethernet driver 0.9.28
[ 1117.65] ACPI: PCI Interrupt :00:10.0[A] - Link [LNKB] - GSI 10 
(level, low) - IRQ 10
[ 1117.653000] eth0: RealTek RTL8139 at 0xe0a72800, 08:00:46:6e:93:a8, IRQ 10
[ 1117.653000] eth0:  Identified 8139 chip type 'RTL-8139C'
[ 1118.403000] input: Power Button (FF) as /class/input/input13
[ 1118.404000] ACPI: Power Button (FF) [PWRF]
[ 1118.404000] input: Sleep Button (CM) as /class/input/input14
[ 1118.405000] ACPI: Sleep Button (CM) [SBTN]
[ 1118.406000] input: Lid Switch as /class/input/input15
[ 1118.407000] ACPI: Lid Switch [LID]
[ 1118.83] ACPI: Thermal Zone [THRM] (36 C)
[ 1119.431000] ACPI: AC Adapter [ACAD] (off-line)
[ 1119.899000] ACPI: Battery Slot [BAT1] (battery present)
[ 1119.904000] ACPI: Battery Slot [BAT2] (battery absent)
[ 1126.226000] eth1: no IPv6 routers present

suspend...

[ 2019.31] pccard: card ejected from slot 0
[ 2019.345000] PCMCIA: socket dc99bc28: *** DANGER *** unable to remove socket 
power
[ 2019.346000] pccard: card ejected from slot 1
[ 2020.041000] ACPI: PCI interrupt for device :00:10.0 disabled
[ 2024.641000] Suspending console(s)
[ 2024.656000]  usbdev2.1: PM: suspend 0-2, parent usb2 already 2
[ 2024.656000]  usbdev2.1_ep81: PM: suspend 0-2, parent 2-0:1.0 already 2
[ 2024.656000] hub 2-0:1.0: PM: suspend 2-2, parent usb2 already 2
[ 2024.656000]

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Rafael J. Wysocki

On Tuesday, 29 May 2007 08:55, Kay Sievers wrote:
 On 5/25/07, Linus Torvalds [EMAIL PROTECTED] wrote:
  On Fri, 25 May 2007, Pavel Machek wrote:
  
   2) we need to preload firmware during _suspend_. I AM TELLING THAT TO
   PEOPLE FOR FIVE YEARS NOW.
 
  And people aren't listening. Have you thought about _why_?
 
  The thing is, it should just work. Even without pre-loading.
 
   Imageine we killed freezer. Also imagine Romano has IDE card his
   PCMCIA slot. Kaboom, we solved nothing.
 
  Don't be silly. We solved it. The firmware has to be loadable from
  somewhere else, since otherwise his IDE card wouldn't have been accessible
  in the first place!
 
 The shiny userspace firmware loading causes problems since it exists,
 every second box has problems with it, in all sorts of situations. If
 people are still sold to the idea of userspace firmware loading, why
 don't we keep the data in the driver, instead of immediately
 discarding it after the first upload? Not to waste a few hundred
 kilobytes? That doesn't sound like a convincing deal, after all the
 years people try to work around the issues it causes.

Agreed.

Rafael
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Michael-Luke Jones


Rafael J. Wysocki wrote:

On Tuesday, 29 May 2007 08:55, Kay Sievers wrote:

The shiny userspace firmware loading causes problems since it exists,
every second box has problems with it, in all sorts of situations. If
people are still sold to the idea of userspace firmware loading, why
don't we keep the data in the driver, instead of immediately
discarding it after the first upload? Not to waste a few hundred
kilobytes? That doesn't sound like a convincing deal, after all the
years people try to work around the issues it causes.


Agreed.

Rafael


Rather than most drivers being told to make this step, can this be added 
to the firmware_class such that firmware objects are cached in RAM and 
subsequent calls to request_firmware() don't have to query userspace.


This seems the least intrusive solution to this problem.

Thanks,

Michael-Luke
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Rafael J. Wysocki

On Tuesday, 29 May 2007 14:00, Michael-Luke Jones wrote:
 Rafael J. Wysocki wrote:
  On Tuesday, 29 May 2007 08:55, Kay Sievers wrote:
  The shiny userspace firmware loading causes problems since it exists,
  every second box has problems with it, in all sorts of situations. If
  people are still sold to the idea of userspace firmware loading, why
  don't we keep the data in the driver, instead of immediately
  discarding it after the first upload? Not to waste a few hundred
  kilobytes? That doesn't sound like a convincing deal, after all the
  years people try to work around the issues it causes.
  
  Agreed.
  
  Rafael
 
 Rather than most drivers being told to make this step, can this be added 
 to the firmware_class such that firmware objects are cached in RAM and 
 subsequent calls to request_firmware() don't have to query userspace.
 
 This seems the least intrusive solution to this problem.

Agreed again. :-)

Greetings,
Rafael

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Pavel Machek

Hi!

> > I guess we should warn the driver authors, then; and decide what driver
> > authors should do.
>
> Drivers really shouldn't do anything at all.

*)

> > If I'm video4linux driver for grabbing screen, have been suspended, and
> > someone asks me to read a frame, should I
> >
> > a) return -ESORRYIMSUSPENDED
> >
> > b) just block the caller
>
> The "subsystem" thing should have stopped the queues, and the device
> should never even _see_ this.

Okay, _if_ there's a subsystem, subsystem should have stopped the
queues. End result should be that userspace is blocked when trying to
access suspended device/suspended subsystem.

I guess we are in violent agreement.
Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Kay Sievers

On Tue, 2007-05-29 at 13:00 +0100, Michael-Luke Jones wrote:
> Rafael J. Wysocki wrote:
> > On Tuesday, 29 May 2007 08:55, Kay Sievers wrote:
> > > The shiny userspace firmware loading causes problems since it exists,
> > > every second box has problems with it, in all sorts of situations. If
> > > people are still sold to the idea of userspace firmware loading, why
> > > don't we keep the data in the driver, instead of immediately
> > > discarding it after the first upload? Not to waste a few hundred
> > > kilobytes? That doesn't sound like a convincing deal, after all the
> > > years people try to work around the issues it causes.
> >
> > Agreed.
> >
> > Rafael
>
> Rather than most drivers being told to make this step, can this be added
> to the firmware_class such that firmware objects are cached in RAM and
> subsequent calls to request_firmware() don't have to query userspace.
>
> This seems the least intrusive solution to this problem.

Who is going to keep track of the data hiding in the firmware_class? On
driver unbind, module unload, you want to release the data.

Kay
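
A sketch for illustration, not part of the original mail: one way to picture the caching proposal together with the ownership question raised above is a reference-counted cache. Each user takes a reference when it asks for the blob and drops it on unbind or unload, and the data is freed when the last reference goes away. This is a userspace model only; `fw_cache_get`, `fw_cache_put`, `fw_loader_t` and `fw_demo_loader` are hypothetical names and are not the firmware_class API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FW_NAME_MAX 64

/* Hypothetical cache entry: one firmware blob shared by all its users. */
struct fw_entry {
    char name[FW_NAME_MAX];
    unsigned char *data;
    size_t size;
    int refcount;
    struct fw_entry *next;
};

/* Stand-in for the slow userspace request; returns malloc'd data or NULL. */
typedef unsigned char *(*fw_loader_t)(const char *name, size_t *size);

struct fw_entry *fw_cache;   /* singly linked list of cached blobs */
int fw_demo_loads;           /* how often the demo loader actually ran */

/* Demo loader (assumption for this sketch): hands back four fixed bytes. */
unsigned char *fw_demo_loader(const char *name, size_t *size)
{
    unsigned char *buf = malloc(4);

    (void)name;
    if (!buf)
        return NULL;
    memcpy(buf, "\x01\x02\x03\x04", 4);
    *size = 4;
    fw_demo_loads++;
    return buf;
}

/* Return the cached blob, loading it only on the first request. */
struct fw_entry *fw_cache_get(const char *name, fw_loader_t load)
{
    struct fw_entry *e;

    for (e = fw_cache; e; e = e->next) {
        if (strcmp(e->name, name) == 0) {
            e->refcount++;
            return e;
        }
    }
    if (strlen(name) >= FW_NAME_MAX)
        return NULL;
    e = calloc(1, sizeof(*e));
    if (!e)
        return NULL;
    e->data = load(name, &e->size);
    if (!e->data) {
        free(e);
        return NULL;
    }
    strcpy(e->name, name);
    e->refcount = 1;
    e->next = fw_cache;
    fw_cache = e;
    return e;
}

/* Drop one reference; the blob is freed when the last user lets go. */
void fw_cache_put(struct fw_entry *e)
{
    struct fw_entry **pp;

    if (!e)
        return;
    assert(e->refcount > 0);
    if (--e->refcount > 0)
        return;
    for (pp = &fw_cache; *pp; pp = &(*pp)->next) {
        if (*pp == e) {
            *pp = e->next;
            break;
        }
    }
    free(e->data);
    free(e);
}
```

In this model the answer to "who keeps track" is the reference count: a driver would call the put function from its unbind or module-exit path, which is still per-driver bookkeeping, only smaller.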


Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Mark Lord


Linus Torvalds wrote:
> On Fri, 25 May 2007, Nigel Cunningham wrote:
> > Does that mean you never ever power off your laptop (assuming you have
> > one), and the battery never runs out? Surely you must power it off
> > completely sometimes?
>
> So? The bootup isn't that much worse than a disk suspend/resume, and it's
> reliable.

I very much prefer suspend (to RAM) over hibernate (to DISK).
But once in a while, primarily when travelling, I'll use hibernate.

And the swsusp in the kernel is just plain crappy and slow,
which leads many people (including our beloved chief penguin, it seems)
into thinking that hibernate *has* to be too slow to be useful.

But with Suspend2, it is very quick and usable by comparison.
Try it, you'll like it (at least a little).

Cheers

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Mark Lord


Nigel Cunningham wrote:
> I'm sorry to say it, but dropping process freezing still seems to me
> like the better way though. I prefer it because of the reliability
> aspect. With the current code, having frozen processes, I can look at
> the state of memory, calculate how much I'll need for this or that and
> know that I'll have sufficient memory for the atomic copy and for doing
> the I/O  (making assumptions about how much memory drivers will
> allocate) before I start to do either. If we stop freezing processes,
> that predictability will go away. There'll always be a possibility that
> some process will get memory hungry and stop me from being able to get
> the image on disk, and I'll have to either abort or give up and try
> again and again until I can complete writing the image, the battery runs
> out or whatever...


How about blocking brk() and mmap(MAP_ANONYMOUS) in addition to
the filesystem VFS callers?   Or is that starting to get messy again?

Cheers


Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Linus Torvalds



On Tue, 29 May 2007, Romano Giannetti wrote:

> - The good (?) news. I have made 7 suspend/resume cycles (to ram, I
> haven't tested hibernation) with a 2.6.21.2 with that patch, applied
> manually. The system did suspend and resume nicely even compiling a
> kernel and opening openoffice. Normally (let me stress _normally_) no
> delay was apparent on resume. I do not know how dangerous is this... :-)
>
> - The bad (?) news. One time out of 7 I had the 60 seconds delay.

Interesting. If you can re-create it, please do the sysrq-T thing again, 
to see what's up. (Also, you might do sysrq-p, which gives the current 
process data, which sysrq-T does not).
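
A sketch for illustration, not part of the original mail: when the keyboard combination is awkward to hit during a hang, the same dumps can be requested by writing the command letter to /proc/sysrq-trigger, which needs root and a kernel built with magic SysRq support. The helper name `sysrq_write` is hypothetical, and the path is a parameter only so that the helper can be tried against an ordinary file first.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write one SysRq command letter (for example 't' or 'p') to the given file.
 * On a real system the path would be "/proc/sysrq-trigger".
 * Returns 0 on success, -1 on failure.
 */
int sysrq_write(const char *path, char key)
{
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "w");
    if (!f)
        return -1;
    if (fputc(key, f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Calling it with 't' and then 'p' on the real path asks for the two dumps mentioned above; the output lands in the kernel log.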

Linus

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Nigel Cunningham

Hi.

On Tue, 2007-05-29 at 10:19 -0400, Mark Lord wrote:
> Nigel Cunningham wrote:
>
> > I'm sorry to say it, but dropping process freezing still seems to me
> > like the better way though. I prefer it because of the reliability
> > aspect. With the current code, having frozen processes, I can look at
> > the state of memory, calculate how much I'll need for this or that and
> > know that I'll have sufficient memory for the atomic copy and for doing
> > the I/O  (making assumptions about how much memory drivers will
> > allocate) before I start to do either. If we stop freezing processes,
> > that predictability will go away. There'll always be a possibility that
> > some process will get memory hungry and stop me from being able to get
> > the image on disk, and I'll have to either abort or give up and try
> > again and again until I can complete writing the image, the battery runs
> > out or whatever...
>
> How about blocking brk() and mmap(MAP_ANONYMOUS) in addition to
> the filesystem VFS callers?   Or is that starting to get messy again?

Yeah. Getting messy again :)

Nigel



Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Linus Torvalds



On Wed, 30 May 2007, Nigel Cunningham wrote:
 
> On Tue, 2007-05-29 at 10:19 -0400, Mark Lord wrote:
> >
> > How about blocking brk() and mmap(MAP_ANONYMOUS) in addition to
> > the filesystem VFS callers?   Or is that starting to get messy again?
>
> Yeah. Getting messy again :)

Indeed. And also misses the point - the point being that we don't actually 
need to freeze anything at all most of the time. There's nothing wrong 
with making memory allocations etc.

And yes, suspend is different from hibernate. I can see how hibernate 
people are worried about people writing to things after doing the 
snapshot, but those concerns don't exist with suspend. With suspend, the 
biggest concern is accessing a device after it has been suspended, but on 
the other hand, also the fact that we end up having driver writers used 
to the system being runnable, so they do things that really do require a 
full-fledged system (and sometimes that means just some delayed action 
using a kernel thread, other times it seems to rely on more complex 
behaviour like firmware loading :^p )

Linus

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-29 Thread Nigel Cunningham

Hi.

On Tue, 2007-05-29 at 14:33 -0700, Linus Torvalds wrote:
 
> On Wed, 30 May 2007, Nigel Cunningham wrote:
> >
> > On Tue, 2007-05-29 at 10:19 -0400, Mark Lord wrote:
> > >
> > > How about blocking brk() and mmap(MAP_ANONYMOUS) in addition to
> > > the filesystem VFS callers?   Or is that starting to get messy again?
> >
> > Yeah. Getting messy again :)
>
> Indeed. And also misses the point - the point being that we don't actually
> need to freeze anything at all most of the time. There's nothing wrong
> with making memory allocations etc.
>
> And yes, suspend is different from hibernate. I can see how hibernate
> people are worried about people writing to things after doing the
> snapshot, but those concerns don't exist with suspend. With suspend, the
> biggest concern is accessing a device after it has been suspended, but on
> the other hand, also the fact that we end up having driver writers used
> to the system being runnable, so they do things that really do require a
> full-fledged system (and sometimes that means just some delayed action
> using a kernel thread, other times it seems to rely on more complex
> behaviour like firmware loading :^p )

Yeah, but they can't. Even after the freezing of processes has been
removed from the normal suspend to ram path, we're still going to have
this issue with the suspend to ram after writing a hibernation image
path.

Regards,

Nigel



Re: [stable] pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-28 Thread Greg KH

On Mon, May 28, 2007 at 09:53:50AM -0700, Linus Torvalds wrote:
> 
> Before we suspend a device, we call the subsystem that that device has 
> been registered with. Ie, we have code like this:
> 
>     if (dev->class && dev->class->suspend)
>         error = dev->class->suspend(dev, state);
> 
> which was very much designed so that individual devices wouldn't have to 
> always know - if the upper layer devices for that class can handle these 
> things, they should.
> 
> Do people actually _do_ this, right now? No. But we do actually have the 
> infrastructure, and I think we have one or two classes that actually do 
> use it (at least the "rfkill_class" has a suspend member, dunno how well 
> this model actually works).
> 
> I think Greg had some patches to make network drivers use this, for 
> example. Network drivers right now all end up doing stuff that really 
> doesn't belong in the driver at all when they suspend, and the 
> infrastructure _should_ just do it for them (ie do all the _network_ 
> related stuff, as opposed to the actual hardware-related stuff).

Yes, I started to work on it, as it is the correct thing to do, but got
sidetracked, sorry :(

> (Examples of things that we probably _should_ do for network devices on a 
> class level:
> 
>     suspend:
>         netif_poll_disable()
>         if (netif_running(dev))
>             dev->stop(dev);
>
>     resume:
>         if (netif_running(dev))
>             dev->start(dev);
>         netif_poll_enable(dev);
> 
> or something similar).

I'll try to hack something together later this week along this line and
see how it works...

thanks,

greg k-h

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-28 Thread Nigel Cunningham

Hi.

On Mon, 2007-05-28 at 14:03 +0100, Matthew Garrett wrote:
> On Mon, May 28, 2007 at 02:55:07PM +0200, Pavel Machek wrote:
> 
> > Well, PPC people are aware of this, and they think they can fix the
> > drivers. We probably want to drop the freezer for suspend long-term,
> > so. PPC machines use small subset of all the drivers, so it apparently
> > is not big problem for them.
> 
> I'm fairly certain that PPC uses USB. In any case, it's not limited to 
> PPC - APM has the same issue. Any driver that assumes processes will be 
> frozen during suspend to RAM is broken now, not the future.

The converse is also true, though. Any driver that assumes processes
aren't frozen during suspend to RAM is also broken now, and will be
while we allow the possibility of suspending to RAM after writing a
hibernation image.

In short, drivers should be designed to work whether processes are
frozen or not.

Regards,

Nigel


Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-28 Thread Linus Torvalds

On Mon, 28 May 2007, Pavel Machek wrote:
> 
> I guess we should warn the driver authors, then; and decide what driver 
> authors should do.

Drivers really shouldn't do anything at all.

> If I'm video4linux driver for grabbing screen, have been suspended, and 
> someone asks me to read a frame, should I
> 
> a) return -ESORRYIMSUSPENDED
> 
> b) just block the caller

The "subsystem" thing should have stopped the queues, and the device 
should never even _see_ this.
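
A sketch for illustration, not part of the original mail: the idea that the layer above the driver stops its queue, so the device never sees a request while suspended, can be modelled in a few lines. All names here (`subsys_queue`, `subsys_submit`, `device_handle`) are hypothetical, and the model parks requests until resume, where a real subsystem might instead put the caller to sleep.

```c
#include <assert.h>
#include <stddef.h>

#define QMAX 16

/* Hypothetical subsystem-level queue sitting in front of one device. */
struct subsys_queue {
    int pending[QMAX];   /* requests held back while stopped */
    size_t npending;
    int stopped;         /* set by the subsystem's suspend hook */
    int delivered;       /* how many requests the device has seen */
};

/* Stand-in for the driver's request handler. */
void device_handle(struct subsys_queue *q, int req)
{
    (void)req;
    q->delivered++;
}

/* Submit a request; while stopped it is parked and never reaches the device. */
int subsys_submit(struct subsys_queue *q, int req)
{
    assert(q != NULL);
    if (q->stopped) {
        if (q->npending == QMAX)
            return -1;
        q->pending[q->npending++] = req;
        return 0;
    }
    device_handle(q, req);
    return 0;
}

void subsys_suspend(struct subsys_queue *q)
{
    q->stopped = 1;
}

/* Restart the queue and hand the parked requests to the device in order. */
void subsys_resume(struct subsys_queue *q)
{
    size_t i;

    q->stopped = 0;
    for (i = 0; i < q->npending; i++)
        device_handle(q, q->pending[i]);
    q->npending = 0;
}
```

The driver's handler carries no suspend check at all in this model, which is the property argued for above.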

Before we suspend a device, we call the subsystem that that device has 
been registered with. Ie, we have code like this:

    if (dev->class && dev->class->suspend)
        error = dev->class->suspend(dev, state);

which was very much designed so that individual devices wouldn't have to 
always know - if the upper layer devices for that class can handle these 
things, they should.
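
A sketch for illustration, not part of the original mail: the ordering described here, class hook first and then the device's own suspend, can be modelled in userspace. The `model_*` and `demo_*` names are hypothetical and only mirror the shape of the two-line snippet above; they are not the driver core's real structures.

```c
#include <assert.h>
#include <stddef.h>

struct model_device;

/* Hypothetical class with an optional suspend hook. */
struct model_class {
    int (*suspend)(struct model_device *dev, int state);
};

struct model_device {
    struct model_class *class_;
    int (*drv_suspend)(struct model_device *dev, int state);
    int class_quiesced;   /* set once the class has stopped upper-layer work */
    int hw_off;           /* set once the driver has powered the hardware down */
};

/* Class hook: quiesce everything that is not hardware specific. */
int demo_class_suspend(struct model_device *dev, int state)
{
    (void)state;
    dev->class_quiesced = 1;
    return 0;
}

/* Driver hook: only safe after the class has quiesced the device. */
int demo_drv_suspend(struct model_device *dev, int state)
{
    (void)state;
    if (!dev->class_quiesced)
        return -1;
    dev->hw_off = 1;
    return 0;
}

/* Call the class hook first, then the driver, stopping at the first error. */
int model_suspend_device(struct model_device *dev, int state)
{
    int error = 0;

    assert(dev != NULL);
    if (dev->class_ && dev->class_->suspend)
        error = dev->class_->suspend(dev, state);
    if (!error && dev->drv_suspend)
        error = dev->drv_suspend(dev, state);
    return error;
}
```

A device registered without a class shows the failure mode in this model: its driver hook runs with nothing quiesced above it and has to refuse or cope on its own.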

Do people actually _do_ this, right now? No. But we do actually have the 
infrastructure, and I think we have one or two classes that actually do 
use it (at least the "rfkill_class" has a suspend member, dunno how well 
this model actually works).

I think Greg had some patches to make network drivers use this, for 
example. Network drivers right now all end up doing stuff that really 
doesn't belong in the driver at all when they suspend, and the 
infrastructure _should_ just do it for them (ie do all the _network_ 
related stuff, as opposed to the actual hardware-related stuff).

(Examples of things that we probably _should_ do for network devices on a 
class level:

    suspend:
        netif_poll_disable()
        if (netif_running(dev))
            dev->stop(dev);

    resume:
        if (netif_running(dev))
            dev->start(dev);
        netif_poll_enable(dev);

or something similar).
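
A sketch for illustration, not part of the original mail: the suspend and resume outline above can be written out as a small userspace model. The `model_netdev` structure and the `demo_*` callbacks are hypothetical; the one assumption worth stating is that the "running" flag records whether the interface is administratively up and is deliberately left alone across suspend, so that resume knows whether to restart the hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a network device as seen by its class. */
struct model_netdev {
    int running;        /* interface is up; plays the role of netif_running() */
    int poll_enabled;   /* polling allowed */
    int hw_active;      /* hardware currently started */
    void (*stop)(struct model_netdev *dev);
    void (*start)(struct model_netdev *dev);
};

void demo_stop(struct model_netdev *dev)
{
    dev->hw_active = 0;
}

void demo_start(struct model_netdev *dev)
{
    dev->hw_active = 1;
}

/* Class-level suspend: stop polling, then stop the hardware if it was up. */
void model_net_class_suspend(struct model_netdev *dev)
{
    assert(dev != NULL);
    dev->poll_enabled = 0;
    if (dev->running && dev->stop)
        dev->stop(dev);
}

/* Class-level resume: restart the hardware if it was up, then allow polling. */
void model_net_class_resume(struct model_netdev *dev)
{
    assert(dev != NULL);
    if (dev->running && dev->start)
        dev->start(dev);
    dev->poll_enabled = 1;
}
```

An interface that was down before suspend stays down after resume in this model, because neither hook touches the running flag.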

Linus

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-28 Thread Pavel Machek

Hi!

> > Well, PPC people are aware of this, and they think they can fix the
> > drivers. We probably want to drop the freezer for suspend long-term,
> > so. PPC machines use small subset of all the drivers, so it apparently
> > is not big problem for them.
> 
> I'm fairly certain that PPC uses USB. In any case, it's not limited to 
> PPC - APM has the same issue. Any driver that assumes processes will be 
> frozen during suspend to RAM is broken now, not the future.

Yup, that's a possible view. Fixes welcome.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-28 Thread Matthew Garrett

On Mon, May 28, 2007 at 02:55:07PM +0200, Pavel Machek wrote:

> Well, PPC people are aware of this, and they think they can fix the
> drivers. We probably want to drop the freezer for suspend long-term,
> so. PPC machines use small subset of all the drivers, so it apparently
> is not big problem for them.

I'm fairly certain that PPC uses USB. In any case, it's not limited to 
PPC - APM has the same issue. Any driver that assumes processes will be 
frozen during suspend to RAM is broken now, not the future.

-- 
Matthew Garrett | [EMAIL PROTECTED]

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-28 Thread Pavel Machek

Hi!

> > > This /mostly/ works - I've had my test machine cycling through a suspend 
> > > cycle every 10 seconds for the past hour without any difficulties 
> > > providing I unload USB first. If USB is loaded, the suspend occasionally 
> > > fails with one of the devices returning -EBUSY and causing it to be 
> > > aborted. I haven't looked into this in any detail yet, but it's 
> > > presumably sufficiently generic code that it's potentially biting people 
> > > on PPC anyway.
> > 
> > Most probably.
> > 
> > Still, please take what I said in the other thread into consideration: We've
> > > been using the freezer for so long that at least some drivers started to rely
> > > on it being used.
> > 
> > Even if there are no such drivers on your system, they can be used by other
> > systems.
> 
> Sure, but if any of these drivers run on PPC then they're broken anyway. 
> The assumption that processes will be frozen during suspend is true in 
> the specific case of ACPI and some of the ARM platforms, but not true on 
> PPC or APM systems. We either need to fix the drivers to stop assuming 
> this or add the process freezer to the other PM systems. Right now, 
> they're buggy.

Well, PPC people are aware of this, and they think they can fix the
drivers. We probably want to drop the freezer for suspend long-term,
so. PPC machines use small subset of all the drivers, so it apparently
is not big problem for them.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-28 Thread Matthew Garrett

On Mon, May 28, 2007 at 10:11:15AM +0200, Rafael J. Wysocki wrote:
> On Monday, 28 May 2007 03:05, Matthew Garrett wrote:
> > This /mostly/ works - I've had my test machine cycling through a suspend 
> > cycle every 10 seconds for the past hour without any difficulties 
> > providing I unload USB first. If USB is loaded, the suspend occasionally 
> > fails with one of the devices returning -EBUSY and causing it to be 
> > aborted. I haven't looked into this in any detail yet, but it's 
> > presumably sufficiently generic code that it's potentially biting people 
> > on PPC anyway.
> 
> Most probably.
> 
> Still, please take what I said in the other thread into consideration: We've
> been using the freezer for so long that at least some drivers started to rely
> on it being used.
> 
> Even if there are no such drivers on your system, they can be used by other
> systems.

Sure, but if any of these drivers run on PPC then they're broken anyway. 
The assumption that processes will be frozen during suspend is true in 
the specific case of ACPI and some of the ARM platforms, but not true on 
PPC or APM systems. We either need to fix the drivers to stop assuming 
this or add the process freezer to the other PM systems. Right now, 
they're buggy.

-- 
Matthew Garrett | [EMAIL PROTECTED]

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-28 Thread Pavel Machek

Hi!

> > As far as I can tell the PPC code simply shuts down the devices without 
> > worrying about userspace at all. If this was reliable, what prevents us 
> > from simply disabling the freezer for STR?
> 
> Personally, I think that's the right thing to do. 
> 
> And by "disabling the freezer", I think we should just not call it at all. 
> However, sadly, right now it's called from common code. I'll happily take 
> a tested patch to split that code sequence up, and try to do it in 2.6.23, 
> if somebody has the energy (I'm getting to the point where I may just do 
> it myself, but my lazy nature still hopes for a STR person to step 
> forward).

I guess we should warn the driver authors, then; and decide what
driver authors should do.

If I'm video4linux driver for grabbing screen, have been suspended,
and someone asks me to read a frame, should I

a) return -ESORRYIMSUSPENDED

b) just block the caller

?

a) seems ugly to my eyes (userspace should not know about suspend).
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-28 Thread Rafael J. Wysocki

On Monday, 28 May 2007 03:05, Matthew Garrett wrote:
> On Sun, May 27, 2007 at 07:44:02PM +0100, Matthew Garrett wrote:
> 
> > Anyway. I've tested the following patch on a dual-core x86. No obvious 
> > issues yet, but I'll try to put it through a few hundred cycles.
> 
> This /mostly/ works - I've had my test machine cycling through a suspend 
> cycle every 10 seconds for the past hour without any difficulties 
> providing I unload USB first. If USB is loaded, the suspend occasionally 
> fails with one of the devices returning -EBUSY and causing it to be 
> aborted. I haven't looked into this in any detail yet, but it's 
> presumably sufficiently generic code that it's potentially biting people 
> on PPC anyway.

Most probably.

Still, please take what I said in the other thread into consideration: We've
been using the freezer for so long that at least some drivers started to rely
on it being used.

Even if there are no such drivers on your system, they can be used by other
systems.

Greetings,
Rafael

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-28 Thread Linus Torvalds



On Mon, 28 May 2007, Pavel Machek wrote:
 
 I guess we should warn the driver authors, then; and decide what driver 
 authors should do.

Drivers really shouldn't do anythign at all.

 If I'm video4linux driver for grabbing screen, have been suspended, and 
 someone asks me to read a frame, should I
 
 a) return -ESORRYIMSUSPENDED
 
 b) just block the caller

The subsystem thing should have stopped the queues, and the device 
should never even _see_ this.

Before we suspend a device, we call the subsystem that that device has 
been registered with. Ie, we have code like this:

	if (dev->class && dev->class->suspend)
		error = dev->class->suspend(dev, state);

which was very much designed so that individual devices wouldn't have to 
always know - if the upper layer devices for that class can handle these 
things, they should.

Do people actually _do_ this, right now? No. But we do actually have the 
infrastructure, and I think we have one or two classes that actually do 
use it (at least the rfkill_class has a suspend member, dunno how well 
this model actually works).

I think Greg had some patches to make network drivers use this, for 
example. Network drivers right now all end up doing stuff that really 
doesn't belong in the driver at all when they suspend, and the 
infrastructure _should_ just do it for them (ie do all the _network_ 
related stuff, as opposed to the actual hardware-related stuff).

(Examples of things that we probably _should_ do for network devices on a 
class level:

	suspend:
		netif_poll_disable()
		if (netif_running(dev))
			dev->stop(dev);

	resume:
		if (netif_running(dev))
			dev->start(dev);
		netif_poll_enable(dev);

or something similar).

Linus
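
The ordering described above (class-level hook first, then the device's own hook) can be sketched in plain userspace C. This is only a model under stated assumptions: the struct layouts, netclass_suspend() and nic_suspend() are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stddef.h>

/* Userspace model only: these types mimic, but are not, the kernel's. */
struct device;

struct device_class {
	const char *name;
	int (*suspend)(struct device *dev, int state);	/* may be NULL */
};

struct device {
	const struct device_class *class;		/* may be NULL */
	int (*suspend)(struct device *dev, int state);	/* hardware part */
	int queue_stopped;	/* set by the class layer */
	int powered_down;	/* set by the driver */
};

/* Hypothetical class hook: quiesce subsystem state (queues, polling). */
static int netclass_suspend(struct device *dev, int state)
{
	(void)state;
	dev->queue_stopped = 1;
	return 0;
}

/* Hypothetical driver hook: touches hardware only, after the class ran. */
static int nic_suspend(struct device *dev, int state)
{
	(void)state;
	if (!dev->queue_stopped)
		return -1;	/* requests could still arrive: refuse */
	dev->powered_down = 1;
	return 0;
}

/* Core: call the class hook first, then the device's own hook. */
static int suspend_device(struct device *dev, int state)
{
	int error = 0;

	if (dev->class && dev->class->suspend)
		error = dev->class->suspend(dev, state);
	if (!error && dev->suspend)
		error = dev->suspend(dev, state);
	return error;
}
```

With this split the driver never sees a request while suspended, because the class layer has already stopped the queue by the time the driver hook runs.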

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-28 Thread Nigel Cunningham

Hi.

On Mon, 2007-05-28 at 14:03 +0100, Matthew Garrett wrote:
> On Mon, May 28, 2007 at 02:55:07PM +0200, Pavel Machek wrote:
> 
> > Well, PPC people are aware of this, and they think they can fix the
> > drivers. We probably want to drop the freezer for suspend long-term,
> > so. PPC machines use small subset of all the drivers, so it apparently
> > is not big problem for them.
> 
> I'm fairly certain that PPC uses USB. In any case, it's not limited to
> PPC - APM has the same issue. Any driver that assumes processes will be
> frozen during suspend to RAM is broken now, not the future.

The converse is also true, though. Any process that assumes processes
aren't frozen during suspend to RAM is also broken now, and will be
while we allow the possibility of suspending to ram after writing a
hibernation image.

In short, drivers should be designed to work whether processes are
frozen or not.

Regards,

Nigel



Re: [stable] pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-28 Thread Greg KH

On Mon, May 28, 2007 at 09:53:50AM -0700, Linus Torvalds wrote:
 
 Before we suspend a device, we call the subsystem that that device has 
 been registered with. Ie, we have code like this:
 
   if (dev-class  dev-class-suspend)
   error = dev-class-suspend(dev, state);
 
 which was very much designed so that individual devices wouldn't have to 
 always know - if the upper layer devices for that class can handle these 
 things, they should.
 
 Do people actually _do_ this, right now? No. But we do actually have the 
 infrastructure, and I think we have one or two classes that actually do 
 use it (at least the rfkill_class has a suspend member, dunno how well 
 this model actually works).
 
 I think Greg had some patches to make network drivers use this, for 
 example. Network drivers right now all end up doing stuff that really 
 doesn't belong in the driver at all when they suspend, and the 
 infrastructure _should_ just do it for them (ie do all the _network_ 
 related stuff, as opposed to the actual hardware-related stuff).

Yes, I started to work on it, as it is the correct thing to do, but got
sidetracked, sorry :(

> (Examples of things that we probably _should_ do for network devices on a
> class level:
> 
> 	suspend:
> 		netif_poll_disable()
> 		if (netif_running(dev))
> 			dev->stop(dev);
> 
> 	resume:
> 		if (netif_running(dev))
> 			dev->start(dev);
> 		netif_poll_enable(dev);
> 
> or something similar).

I'll try to hack something together later this week along this line and
see how it works...

thanks,

greg k-h

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-27 Thread Matthew Garrett

On Sun, May 27, 2007 at 07:44:02PM +0100, Matthew Garrett wrote:

> Anyway. I've tested the following patch on a dual-core x86. No obvious 
> issues yet, but I'll try to put it through a few hundred cycles.

This /mostly/ works - I've had my test machine cycling through a suspend 
cycle every 10 seconds for the past hour without any difficulties 
providing I unload USB first. If USB is loaded, the suspend occasionally 
fails with one of the devices returning -EBUSY and causing it to be 
aborted. I haven't looked into this in any detail yet, but it's 
presumably sufficiently generic code that it's potentially biting people 
on PPC anyway.

-- 
Matthew Garrett | [EMAIL PROTECTED]
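
The failure mode described above (one device returning -EBUSY and the whole suspend being aborted) can be sketched as follows. This is a userspace model, not the kernel's device_suspend(): struct dev, its busy flag, and suspend_all() are hypothetical names used for illustration.

```c
#include <errno.h>
#include <stddef.h>

struct dev {
	const char *name;
	int busy;	/* hypothetical flag: device has I/O in flight */
	int suspended;
};

static int dev_suspend(struct dev *d)
{
	if (d->busy)
		return -EBUSY;
	d->suspended = 1;
	return 0;
}

static void dev_resume(struct dev *d)
{
	d->suspended = 0;
}

/*
 * Suspend devices in list order. On the first failure, resume the
 * devices already suspended, in reverse order, and return the error
 * so the caller can abort the whole suspend.
 */
static int suspend_all(struct dev *devs, size_t n)
{
	size_t i;
	int error;

	for (i = 0; i < n; i++) {
		error = dev_suspend(&devs[i]);
		if (error) {
			while (i-- > 0)
				dev_resume(&devs[i]);
			return error;
		}
	}
	return 0;
}
```

In this model a task that keeps issuing I/O to one device while the others are being suspended is enough to make the suspend fail intermittently, which would be consistent with the symptom reported here once the freezer no longer stops userspace.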

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-27 Thread Rafael J. Wysocki

On Sunday, 27 May 2007 20:44, Matthew Garrett wrote:
> On Sun, May 27, 2007 at 08:32:14PM +0200, Rafael J. Wysocki wrote:
> 
> > In particular, please see this message:
> > 
> > https://lists.linux-foundation.org/pipermail/linux-pm/2007-May/012301.html
> 
> Yes, there's also the notifier chain for the hardware. However, very few 
> drivers seem to use that - adb seems to be the only one still in the 
> tree. For everything else, the device tree is used in exactly the same 
> way as on x86. If it's safe on Macs but not on x86, then (as far as I 
> can tell) it looks like it's only by luck.
> 
> Anyway. I've tested the following patch on a dual-core x86. No obvious 
> issues yet, but I'll try to put it through a few hundred cycles.

OK

I'm working on a patch that introduces hibernation/suspend notifiers.  It will
conflict with this one a bit, but OTOH it might be useful here too.

I'll post it in a while in a separate thread.

Greetings,
Rafael

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-27 Thread Matthew Garrett

On Sun, May 27, 2007 at 08:32:14PM +0200, Rafael J. Wysocki wrote:

> In particular, please see this message:
> 
> https://lists.linux-foundation.org/pipermail/linux-pm/2007-May/012301.html

Yes, there's also the notifier chain for the hardware. However, very few 
drivers seem to use that - adb seems to be the only one still in the 
tree. For everything else, the device tree is used in exactly the same 
way as on x86. If it's safe on Macs but not on x86, then (as far as I 
can tell) it looks like it's only by luck.

Anyway. I've tested the following patch on a dual-core x86. No obvious 
issues yet, but I'll try to put it through a few hundred cycles.

diff --git a/include/linux/pm.h b/include/linux/pm.h
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 8812985..1db3012 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -19,7 +19,6 @@
 #include <linux/console.h>
 #include <linux/cpu.h>
 #include <linux/resume-trace.h>
-#include <linux/freezer.h>
 #include <linux/vmstat.h>
 
 #include "power.h"
@@ -66,9 +65,10 @@ static inline void pm_finish(suspend_state_t state)
  * suspend_prepare - Do prep work before entering low-power state.
  * @state: State we're entering.
  *
- * This is common code that is called for each state that we're 
- * entering. Allocate a console, stop all processes, then make sure
- * the platform can enter the requested state.
+ * This is common code that is called for each state that we're
+ * entering. Allocate a console, then make sure the platform can
+ * enter the requested state. This is not called for
+ * suspend-to-disk.
  */
 
 static int suspend_prepare(suspend_state_t state)
@@ -81,11 +81,6 @@ static int suspend_prepare(suspend_state_t state)
 
 	pm_prepare_console();
 
-	if (freeze_processes()) {
-		error = -EAGAIN;
-		goto Thaw;
-	}
-
 	if ((free_pages = global_page_state(NR_FREE_PAGES))
 			< FREE_PAGE_NUMBER) {
 		pr_debug("PM: free some memory\n");
@@ -93,7 +88,7 @@ static int suspend_prepare(suspend_state_t state)
 		if (nr_free_pages() < FREE_PAGE_NUMBER) {
 			error = -ENOMEM;
 			printk(KERN_ERR "PM: No enough memory\n");
-			goto Thaw;
+			goto Exit;
 		}
 	}
 
@@ -118,8 +113,7 @@ static int suspend_prepare(suspend_state_t state)
 	device_resume();
  Resume_console:
 	resume_console();
- Thaw:
-	thaw_processes();
+ Exit:
 	pm_restore_console();
 	return error;
 }
@@ -160,8 +154,8 @@ int suspend_enter(suspend_state_t state)
  * suspend_finish - Do final work before exiting suspend sequence.
  * @state: State we're coming out of.
  *
- * Call platform code to clean up, restart processes, and free the 
- * console that we've allocated. This is not called for suspend-to-disk.
+ * Call platform code to clean up and free the console that we've
+ * allocated. This is not called for suspend-to-disk.
  */
 
 static void suspend_finish(suspend_state_t state)
@@ -170,7 +164,6 @@ static void suspend_finish(suspend_state_t state)
 	pm_finish(state);
 	device_resume();
 	resume_console();
-	thaw_processes();
 	pm_restore_console();
 }
 
-- 
Matthew Garrett | [EMAIL PROTECTED]

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-27 Thread Rafael J. Wysocki

On Sunday, 27 May 2007 18:43, Matthew Garrett wrote:
> On Sun, May 27, 2007 at 09:26:00AM -0700, Linus Torvalds wrote:
> 
> > And by "disabling the freezer", I think we should just not call it at all. 
> > However, sadly, right now it's called from common code. I'll happily take 
> > a tested patch to split that code sequence up, and try to do it in 2.6.23, 
> > if somebody has the energy (I'm getting to the point where I may just do 
> > it myself, but my lazy nature still hopes for a STR person to step 
> > forward).
> 
> I'll take a look at this. It probably makes sense to build on Rafael's 
> work on splitting the codepaths up.

Actually, removing the freezer from the suspend code path is simple.  You only
need to remove calls to freeze_processes() and thaw_processes() from
kernel/power/main.c.

That said, I don't think that is all PPC does.  We've discussed this
a bit on linux-pm, in this thread:

https://lists.linux-foundation.org/pipermail/linux-pm/2007-May/012242.html

In particular, please see this message:

https://lists.linux-foundation.org/pipermail/linux-pm/2007-May/012301.html

Greetings,
Rafael

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-27 Thread Matthew Garrett

On Sun, May 27, 2007 at 09:26:00AM -0700, Linus Torvalds wrote:

> And by "disabling the freezer", I think we should just not call it at all. 
> However, sadly, right now it's called from common code. I'll happily take 
> a tested patch to split that code sequence up, and try to do it in 2.6.23, 
> if somebody has the energy (I'm getting to the point where I may just do 
> it myself, but my lazy nature still hopes for a STR person to step 
> forward).

I'll take a look at this. It probably makes sense to build on Rafael's 
work on splitting the codepaths up.

-- 
Matthew Garrett | [EMAIL PROTECTED]

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-27 Thread Linus Torvalds

On Sun, 27 May 2007, Matthew Garrett wrote:
> 
> As far as I can tell the PPC code simply shuts down the devices without 
> worrying about userspace at all. If this was reliable, what prevents us 
> from simply disabling the freezer for STR?

Personally, I think that's the right thing to do. 

And by "disabling the freezer", I think we should just not call it at all. 
However, sadly, right now it's called from common code. I'll happily take 
a tested patch to split that code sequence up, and try to do it in 2.6.23, 
if somebody has the energy (I'm getting to the point where I may just do 
it myself, but my lazy nature still hopes for a STR person to step 
forward).

Linus

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-27 Thread Matthew Garrett

On Thu, May 24, 2007 at 03:53:28PM -0700, Linus Torvalds wrote:

> And I repeat: PowerPC had working and stable suspend five _years_ ago, 
> without any of that freezing crud. We should rip it out.

As far as I can tell the PPC code simply shuts down the devices without 
worrying about userspace at all. If this was reliable, what prevents us 
from simply disabling the freezer for STR?

-- 
Matthew Garrett | [EMAIL PROTECTED]

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-25 Thread Rafael J. Wysocki

On Friday, 25 May 2007 01:19, Pavel Machek wrote:
> On Thu 2007-05-24 20:16:38, Henrique de Moraes Holschuh wrote:
> > On Fri, 25 May 2007, Pavel Machek wrote:
> > > My proposed solution is "fix pcmcia to load firmware before suspend
> > > even starts"
> > 
> > s/pcmcia/all drivers that load firmware/ if you are going to go that way.
> 
> I'm not "going that way". It always was that way. If driver tries to
> load firmware during suspend, it will deadlock.

Exactly.

And the freezing of user land has _nothing_ to do with that.  The fact is
the user land is not reliable while device drivers are being suspended,
regardless of whether it's frozen at that point or not.

BTW, we are going (or at least I'm going) to untangle the hibernation and
suspend code paths, but I have limited time for that and I just _can't_ do this
any faster.  In the meantime, we have bugs like this one that need to be fixed
_within_ the current limitations, because we just _can't_ remove these
limitations overnight.

Greetings,
Rafael
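
The "load firmware before suspend even starts" approach discussed in this thread can be sketched in userspace C. Everything here is an assumption made for illustration: struct fw_cache, fetch_from_userspace() and the other helpers are hypothetical and are not the kernel's firmware API.

```c
#include <stdlib.h>
#include <string.h>

struct fw_cache {
	unsigned char *data;
	size_t size;
};

/*
 * Hypothetical stand-in for a firmware request that needs a working
 * userspace. It fails when userspace cannot be relied upon.
 */
static int fetch_from_userspace(int userspace_ok,
				const unsigned char **blob, size_t *size)
{
	static const unsigned char image[] = { 0xde, 0xad, 0xbe, 0xef };

	if (!userspace_ok)
		return -1;
	*blob = image;
	*size = sizeof(image);
	return 0;
}

/* Call before suspend starts: keep a private in-memory copy. */
static int fw_preload(struct fw_cache *c, int userspace_ok)
{
	const unsigned char *blob;
	size_t size;

	if (c->data)
		return 0;	/* already cached */
	if (fetch_from_userspace(userspace_ok, &blob, &size))
		return -1;
	c->data = malloc(size);
	if (!c->data)
		return -1;
	memcpy(c->data, blob, size);
	c->size = size;
	return 0;
}

/* Resume path: copies from the cache and never calls out to userspace. */
static int fw_upload_on_resume(const struct fw_cache *c,
			       unsigned char *device_mem, size_t device_len)
{
	if (!c->data || c->size > device_len)
		return -1;
	memcpy(device_mem, c->data, c->size);
	return 0;
}

static void fw_release(struct fw_cache *c)
{
	free(c->data);
	c->data = NULL;
	c->size = 0;
}
```

The cost of this design is memory held for the lifetime of the device; the benefit is that the resume path has no dependency on userspace, frozen or not.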

Need suspend-to-ram maintainer Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-25 Thread Pavel Machek

Hi!

> > To answer the question, I guess the answer is that although they're
> > different creatures, they have similarities. This is one of them, which
> > is why I could make the mistake I did. Nothing in the issue being
> > discussed was unique to suspend-to-ram. Perhaps we (or at least I) focus
> > too much on the similarities, but that doesn't mean they're not there.
> 
> I agree that the current bug is not unique to STR. In fact, I think Romano 
> tested both STD and STR, and both had the same bug with the 60s timeout.
> 
> But what irritates me is that STR really shouldn't have _had_ that bug at 
> all. The only reason STR had the same bug as STD was exactly the fact that 
> the two features are too closely inter-twined in the kernel. 

And what do you expect? We have three people working on
hibernation, and suspend-to-ram was created as "oh, if we do this,
this, and this, we get suspend-to-ram with existing code".

> I agree that disk snapshotting is much harder. If we had a bug just in 
> that part, I wouldn't mind it so much. Getting hard problems wrong isn't 
> something you should be ashamed of. What I mind is that the _easier_ 
> problem got infected by all the bugs from the _harder_ issue. That just 
> makes me really really angry and frustrated.
> 
> Look at it this way: if you designed a CPU, and you made the integer 
> code-path share everything with the floating point side, because "addition 
> is addition", and as a result the latency for the simple arithmetic and 
> logical ops in integer ALU was four cycles, what would you be?

You'd be seriously overstaffed in FPU side, and seriously understaffed
on ALU side.

This is basically what happened here. I tell people to get hibernation
to work _first_ because it is usually easier.

And what does that mean? We need three people to work on
suspend-to-RAM. Heck, we need at least _one_ person to work on
suspend-to-RAM, but he needs to be listed in MAINTAINERS.

With hibernation people trying to maintain suspend in their spare
cycles, how do you expect suspend to work? Similar to hibernation,
that's how it looks today.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-25 Thread Pavel Machek

Hi!

> > 2) we need to preload firmware during _suspend_. I AM TELLING THAT TO
> > PEOPLE FOR FIVE YEARS NOW.
> 
> And people aren't listening. Have you thought about _why_?
> 
> The thing is, it should just work. Even without pre-loading.

But it does not work, and as you demonstrated, getting it to work w/o
preloading is an awful lot of work. Feel free to send a patch. Unless you
are ready to do that, stop confusing driver authors.

> > Imagine we killed freezer. Also imagine Romano has IDE card in his
> > PCMCIA slot. Kaboom, we solved nothing.
> 
> Don't be silly. We solved it. The firmware has to be loadable from 
> somewhere else, since otherwise his IDE card wouldn't have been accessible 
> in the first place! 

The firmware loader is a complex userspace process. That's not silly. It is
userland, and I'd hate to explain detailed rules to its authors.

It could do 'find / -name "pcmcia-card-firmware"' for example. It
could send a dbus message to tell gnome-graphical-crap to display a window
saying that it is loading firmware. Maybe it also writes to syslog
when syslogd is available.

It is a userland process, so it is allowed to do stupid stuff.

[If you do not agree, please try to write up
"Doc*/what-firmware-loader-must-do.txt" -- at that point you should
realize how ugly the solution you are suggesting is.]
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Need suspend-to-ram maintainer Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-25 Thread Pavel Machek

Hi!

  To answer the question, I guess the answer is that although they're
  different creatures, they have similarities. This is one of them, which
  is why I could make the mistake I did. Nothing in the issue being
  discussed was unique to suspend-to-ram. Perhaps we (or at least I) focus
  too much on the similarities, but that doesn't mean they're not there.
 
 I agree that the current bug is not unique to STR. In fact, I think Romano 
 tested both STD and STR, and both had the same bug with the 60s timeout.
 
 But what irritates me is that STR really shouldn't have _had_ that bug at 
 all. The only reason STR had the same bug as STD was exactly the fact that 
 the two features are too closely inter-twined in the kernel. 

And what do you expect? We have three people working on
hibernation, and suspend-to-ram was created as oh, if we do this,
this, and this, we get get suspend-to-ram with existing code.

 I agree that disk snapshotting is much harder. If we had a bug just in 
 that part, I wouldn't mind it so much. Getting hard problems wrong isn't 
 something you should be ashamed of. What I mind is that the _easier_ 
 problem got infected by all the bugs from the _harder_ issue. That just 
 makes me really really angry and frustrated.
 
 Look at it this way: if you designed a CPU, and you made the integer 
 code-path share everything with the floating point side, because addition 
 is addition, and as a result the latency for the simple arithmetic and 
 logical ops in integer ALU was four cycles, what would you be?

You'd be seriously overstaffed in FPU side, and seriously understaffed
on ALU side.

This is basically what happened here. I tell people to get hibernation
to work _first_ because it is usually easier.

And what does that mean? We need three people to work on
suspend-to-RAM. Heck, we need at least _one_ person to work on
suspend-to-RAM, but he needs to be listed in MAINTAINERS.

With hibernation people trying to maintain suspend in their spare
cycles, how do you expect suspend to work? Similar to hibernation,
that's how it looks today.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-25 Thread Pavel Machek

Hi!

> > 2) we need to preload firmware during _suspend_. I AM TELLING THAT TO
> > PEOPLE FOR FIVE YEARS NOW.
> 
> And people aren't listening. Have you thought about _why_?
> 
> The thing is, it should just work. Even without pre-loading.

But it does not work, and as you demonstrated, getting it to work w/o
preloading is awful lot of work. Feel free to send a patch. Unless you
are ready to do that, stop confusing driver authors.

> > Imagine we killed freezer. Also imagine Romano has IDE card in his
> > PCMCIA slot. Kaboom, we solved nothing.
> 
> Don't be silly. We solved it. The firmware has to be loadable from 
> somewhere else, since otherwise his IDE card wouldn't have been accessible 
> in the first place! 

Firmware loader is complex userspace process. That's not silly. It is
userland, and I'd hate to explain to its authors detailed rules. 

It could do 'find / -name pcmcia-card-firmware' for example. It
could do dbus message to tell gnome-graphical-crap to display window
to say that it is loading firmware. Maybe it also writes to syslog
when syslogd is available.

It is userland process, so it is allowed to do stupid stuff.

[If you do not agree, please try to write up
Doc*/what-firmware-loader-must-do.txt -- at that point you should
realize how ugly the solution you are suggesting is.]
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-25 Thread Rafael J. Wysocki

On Friday, 25 May 2007 01:19, Pavel Machek wrote:
> On Thu 2007-05-24 20:16:38, Henrique de Moraes Holschuh wrote:
> > On Fri, 25 May 2007, Pavel Machek wrote:
> > > My proposed solution is "fix pcmcia to load firmware before suspend
> > > even starts"
> > 
> > s/pcmcia/all drivers that load firmware/ if you are going to go that way.
> 
> I'm not "going that way". It always was that way. If driver tries to
> load firmware during suspend, it will deadlock.

Exactly.

And the freezing of user land has _nothing_ to do with that.  The fact is
the user land is not reliable while device drivers are being suspended,
regardless of whether it's frozen at that point or not.

BTW, we are going (or at least I'm going) to untangle the hibernation and
suspend code paths, but I have limited time for that and I just _can't_ do this
any faster.  In the meantime, we have bugs like this one that need to be fixed
_within_ the current limitations, because we just _can't_ remove these
limitations overnight.

Greetings,
Rafael

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-24 Thread Nigel Cunningham

Hi.

On Thu, 2007-05-24 at 21:49 -0700, Linus Torvalds wrote:
> 
> On Fri, 25 May 2007, Nigel Cunningham wrote:
> > 
> > Does that mean you never ever power off your laptop (assuming you have
> > one), and the battery never runs out? Surely you must power it off
> > completely sometimes?
> 
> So? The bootup isn't that much worse than a disk suspend/resume, and it's 
> reliable.
> 
> And actually, I don't use laptops much. I use mostly desktops, and STR 
> works fine on at least some of them. In contrast, doing some 
> suspend-to-disk thing would just be insane and idiotic. If I have to wait 
> for half a minute and have a slow system even after that because my git 
> trees aren't in the cache, I really might as well just shut them off.
> 
> In contrast, STR means they are quiet and don't waste energy when I don't 
> use them, but they're instantly available when I care. HUGE difference.
> 
> I really think suspend-to-disk is just a total waste of my time.

Ah. That's because you're using [u]swsusp. If you used Suspend2, your
git trees would be in the cache, your system wouldn't be slow and you'd
still be back up in that half a minute or so - probably less time. Give
it a try for a week, and then go back to rebooting. After that, tell me
rebooting is better and I've wasted the last 5 or 6 years improving the
code.

Regards,

Nigel



Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-24 Thread Linus Torvalds

On Fri, 25 May 2007, Nigel Cunningham wrote:
> 
> Does that mean you never ever power off your laptop (assuming you have
> one), and the battery never runs out? Surely you must power it off
> completely sometimes?

So? The bootup isn't that much worse than a disk suspend/resume, and it's 
reliable.

And actually, I don't use laptops much. I use mostly desktops, and STR 
works fine on at least some of them. In contrast, doing some 
suspend-to-disk thing would just be insane and idiotic. If I have to wait 
for half a minute and have a slow system even after that because my git 
trees aren't in the cache, I really might as well just shut them off.

In contrast, STR means they are quiet and don't waste energy when I don't 
use them, but they're instantly available when I care. HUGE difference.

I really think suspend-to-disk is just a total waste of my time.

Linus

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-24 Thread Nigel Cunningham

Howdy.

On Thu, 2007-05-24 at 20:31 -0700, Linus Torvalds wrote:
> 
> On Fri, 25 May 2007, Nigel Cunningham wrote:
> > > 
> > > That said, I think freezing is crap even for 
> > > snapshotting/suspend-to-disk, 
> > > but the point of the above rant is to show how insane it is to think that 
> > > problems and complexity in one area should translate into problems and 
> > > complexity in another area.
> > 
> > Does that imply that you'd prefer to see filesystem checkpointing code,
> > that you think freezing can be done better, or do you have some other
> > solution that hasn't occurred to me?
> 
> I actually don't think that processes should be frozen really at all.
> 
> I agree that filesystems have to be frozen (and I think that checkpointing 
> of the filesystem or block device is "too clever"), but I just don't think 
> that has anything to do with freezing processes.
> 
> So I'd actually much prefer to freeze at the VFS (and socket layers, etc), 
> and make sure that anybody who tries to write or do something else that we 
> cannot do until resuming, will just be blocked (or perhaps just buffered)!
> 
> See? I actually think that this process-based thing is barking up the 
> wrong tree. After all, it's really not the case that we need to stop 
> processes, and stopping processes really does have some problems. It's 
> simpler in some ways, but I think a more directed solution would actually 
> be better.

That does sound doable.

I'm sorry to say it, but keeping process freezing still seems to me
like the better way though. I prefer it because of the reliability
aspect. With the current code, having frozen processes, I can look at
the state of memory, calculate how much I'll need for this or that and
know that I'll have sufficient memory for the atomic copy and for doing
the I/O (making assumptions about how much memory drivers will
allocate) before I start to do either. If we stop freezing processes,
that predictability will go away. There'll always be a possibility that
some process will get memory hungry and stop me from being able to get
the image on disk, and I'll have to either abort or give up and try
again and again until I can complete writing the image, the battery runs
out or whatever... 

> >bviously we _do_ want to actually try to quiesce normal user processes. 
> >But by "normal user", I'd be willing to limit it to non-uid-zero things, 
> >for example. Exactly because it does turn out that the kernel kind of 
> >depends on user-land things for stuff like network filesystem connection 
> >setup etc (ie we tend to do things like the mount encryption stuff in 
> >userland!).

Not sure who you're quoting here, but it's not me. Pavel maybe? I was
unsub'd for a couple of weeks, so guess it's from during that period.

> But I really don't care that deeply per se, exactly because I don't use it 
> myself. I think people are going down the wrong rabbit-hole, but it 
> wouldn't _irritate_ me that much except for the fact that it now also 
> impacts suspend-to-RAM.

Does that mean you never ever power off your laptop (assuming you have
one), and the battery never runs out? Surely you must power it off
completely sometimes? If you do, doesn't that ever happen at a time when
you're part way through something and you'd like to be able to pick up
your merge or whatever later without having to say "Now, where was I up
to?" *shrug* Maybe you're just exceptional :) (Yeah, we know you are in
other ways, but this way?...)

Regards,

Nigel


Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-24 Thread Linus Torvalds

On Fri, 25 May 2007, Nigel Cunningham wrote:
> > 
> > That said, I think freezing is crap even for snapshotting/suspend-to-disk, 
> > but the point of the above rant is to show how insane it is to think that 
> > problems and complexity in one area should translate into problems and 
> > complexity in another area.
> 
> Does that imply that you'd prefer to see filesystem checkpointing code,
> that you think freezing can be done better, or do you have some other
> solution that hasn't occurred to me?

I actually don't think that processes should be frozen really at all.

I agree that filesystems have to be frozen (and I think that checkpointing 
of the filesystem or block device is "too clever"), but I just don't think 
that has anything to do with freezing processes.

So I'd actually much prefer to freeze at the VFS (and socket layers, etc), 
and make sure that anybody who tries to write or do something else that we 
cannot do until resuming, will just be blocked (or perhaps just buffered)!
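
The idea of blocking at the write path rather than stopping whole processes can be sketched as a toy model. This is only an illustration in Python, not kernel code: `WriteGate`, `freeze`, `thaw` and `write` are invented names, and a real VFS-level freeze would also have to close the window between the check and the write itself, which this sketch ignores.

```python
import threading


class WriteGate:
    """Toy stand-in for a filesystem layer that can be frozen.

    Callers are never stopped as processes; only their write()
    call blocks while the gate is frozen.
    """

    def __init__(self):
        self._thawed = threading.Event()
        self._thawed.set()            # start out thawed
        self._lock = threading.Lock()
        self.log = []                 # writes that actually reached "disk"

    def freeze(self):
        # From now on, writers block inside write().
        self._thawed.clear()

    def thaw(self):
        # Release every writer that was blocked while frozen.
        self._thawed.set()

    def write(self, data):
        self._thawed.wait()           # blocks only while frozen
        with self._lock:
            self.log.append(data)
```

A thread that calls `write()` while the gate is frozen simply sleeps in that call and completes it after `thaw()`; threads that never touch the gate keep running, which is the distinction being drawn from process freezing.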

See? I actually think that this process-based thing is barking up the 
wrong tree. After all, it's really not the case that we need to stop 
processes, and stopping processes really does have some problems. It's 
simpler in some ways, but I think a more directed solution would actually 
be better.

>bviously we _do_ want to actually try to quiesce normal user processes. 
>But by "normal user", I'd be willing to limit it to non-uid-zero things, 
>for example. Exactly because it does turn out that the kernel kind of 
>depends on user-land things for stuff like network filesystem connection 
>setup etc (ie we tend to do things like the mount encryption stuff in 
>userland!).

But I really don't care that deeply per se, exactly because I don't use it 
myself. I think people are going down the wrong rabbit-hole, but it 
wouldn't _irritate_ me that much except for the fact that it now also 
impacts suspend-to-RAM.

Linus

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-24 Thread Nigel Cunningham

Hi.

On Thu, 2007-05-24 at 19:41 -0700, Linus Torvalds wrote:
> 
> On Fri, 25 May 2007, Nigel Cunningham wrote:
> > 
> > To answer the question, I guess the answer is that although they're
> > different creatures, they have similarities. This is one of them, which
> > is why I could make the mistake I did. Nothing in the issue being
> > discussed was unique to suspend-to-ram. Perhaps we (or at least I) focus
> > too much on the similarities, but that doesn't mean they're not there.
> 
> I agree that the current bug is not unique to STR. In fact, I think Romano 
> tested both STD and STR, and both had the same bug with the 60s timeout.
> 
> But what irritates me is that STR really shouldn't have _had_ that bug at 
> all. The only reason STR had the same bug as STD was exactly the fact that 
> the two features are too closely inter-twined in the kernel. 
> 
> That irritates me hugely. We had a bug we should never have had! We had a 
> bug because people are sharing code that shouldn't be shared! We had a bug 
> because of code that makes no sense in the first place!
> 
> I agree that disk snapshotting is much harder. If we had a bug just in 
> that part, I wouldn't mind it so much. Getting hard problems wrong isn't 
> something you should be ashamed of. What I mind is that the _easier_ 
> problem got infected by all the bugs from the _harder_ issue. That just 
> makes me really really angry and frustrated.
> 
> Look at it this way: if you designed a CPU, and you made the integer 
> code-path share everything with the floating point side, because "addition 
> is addition", and as a result the latency for the simple arithmetic and 
> logical ops in integer ALU was four cycles, what would you be?
> 
> You'd be a moron, that's what. 
> 
> And that is _exactly_ what the current STD/STR code does. It says "suspend 
> is suspend" and tries to share the same pipeline, even though the two 
> operations are totally different, and share nothing but the name people 
> use for it (and even the name is really pretty weak, and I've tried to 
> get people to use some other name for STD).

I think I get what you're trying to say, but I also think you're either
overstating your case ("...totally different and share nothing but the
name...") or underestimating the similarity - they both need (albeit
for different reasons) to do the cpu hotplugging, driver suspending
(yeah, there are similarities and differences there) and irq disabling.
That's _some_ similarity. Apart from that, yeah - they are totally
different.

Re the name, we discussed changing the name of Suspend2 on IRC a night
or two ago. We came to the conclusion that, for better or for worse,
it's too well known now. I can see your logic in wanting to
differentiate them, but I seem to be a bit stuck :\. Push some more.
Maybe we'll get there anyway :) Maybe you can get rid of that horrible,
unpronounceable 'swsusp' name while you're at it? :)

> So yes, the two things _do_ share the problem, but they really really 
> shouldn't. There's no reason to think that they should. And it drives me 
> absolutely bonkers that people seem to have such a hard time seeing that.
> 
> That said, I think freezing is crap even for snapshotting/suspend-to-disk, 
> but the point of the above rant is to show how insane it is to think that 
> problems and complexity in one area should translate into problems and 
> complexity in another area.

Does that imply that you'd prefer to see filesystem checkpointing code,
that you think freezing can be done better, or do you have some other
solution that hasn't occurred to me?

> And if the snapshot people want to screw up their snapshots with freezing, 
> I don't actually care that much. I'd much rather have working STR. As it 
> is now, they're now _both_ broken.

Fair enough :).

Nigel



Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-24 Thread Linus Torvalds

On Fri, 25 May 2007, Nigel Cunningham wrote:
> 
> To answer the question, I guess the answer is that although they're
> different creatures, they have similarities. This is one of them, which
> is why I could make the mistake I did. Nothing in the issue being
> discussed was unique to suspend-to-ram. Perhaps we (or at least I) focus
> too much on the similarities, but that doesn't mean they're not there.

I agree that the current bug is not unique to STR. In fact, I think Romano 
tested both STD and STR, and both had the same bug with the 60s timeout.

But what irritates me is that STR really shouldn't have _had_ that bug at 
all. The only reason STR had the same bug as STD was exactly the fact that 
the two features are too closely inter-twined in the kernel. 

That irritates me hugely. We had a bug we should never have had! We had a 
bug because people are sharing code that shouldn't be shared! We had a bug 
because of code that makes no sense in the first place!

I agree that disk snapshotting is much harder. If we had a bug just in 
that part, I wouldn't mind it so much. Getting hard problems wrong isn't 
something you should be ashamed of. What I mind is that the _easier_ 
problem got infected by all the bugs from the _harder_ issue. That just 
makes me really really angry and frustrated.

Look at it this way: if you designed a CPU, and you made the integer 
code-path share everything with the floating point side, because "addition 
is addition", and as a result the latency for the simple arithmetic and 
logical ops in integer ALU was four cycles, what would you be?

You'd be a moron, that's what. 

And that is _exactly_ what the current STD/STR code does. It says "suspend 
is suspend" and tries to share the same pipeline, even though the two 
operations are totally different, and share nothing but the name people 
use for it (and even the name is really pretty weak, and I've tried to 
get people to use some other name for STD).

So yes, the two things _do_ share the problem, but they really really 
shouldn't. There's no reason to think that they should. And it drives me 
absolutely bonkers that people seem to have such a hard time seeing that.

That said, I think freezing is crap even for snapshotting/suspend-to-disk, 
but the point of the above rant is to show how insane it is to think that 
problems and complexity in one area should translate into problems and 
complexity in another area.

And if the snapshot people want to screw up their snapshots with freezing, 
I don't actually care that much. I'd much rather have working STR. As it 
is now, they're now _both_ broken.

Linus

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-24 Thread Nigel Cunningham

Hi Linus.

On Thu, 2007-05-24 at 19:10 -0700, Linus Torvalds wrote:
> 
> On Fri, 25 May 2007, Nigel Cunningham wrote:
> > 
> > First, let me agree with you that for the atomic copy itself, the
> > freezer is unnecessary. Disabling irqs and so on is enough to ensure the
> > atomic copy is atomic. I don't think any of us are arguing with you
> > there.
> 
> First off, realize that the problem actually happens during 
> suspend-to-ram.
> 
> Think about that for a second.
> 
> In fact, think about it for a _loong_ time. Because dammit, people seem to 
> have a really hard time even realizing this.
> 
>   There is no "atomic copy".
> 
>   There is no "checkpointing".
> 
>   There is no "spoon".
> 
> > Hope this helps.
> 
> Hope _the_above_ helps. Why is it so hard for people to accept that 
> suspend-to-ram shouldn't break because of some IDIOTIC issues with disk 
> snapshots?
> 
> And why do you people _always_ keep mixing the two up?

It does. Sorry. I didn't read enough of the context.

To answer the question, I guess the answer is that although they're
different creatures, they have similarities. This is one of them, which
is why I could make the mistake I did. Nothing in the issue being
discussed was unique to suspend-to-ram. Perhaps we (or at least I) focus
too much on the similarities, but that doesn't mean they're not there.

Regards,

Nigel



Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-24 Thread Linus Torvalds

On Fri, 25 May 2007, Nigel Cunningham wrote:
> 
> First, let me agree with you that for the atomic copy itself, the
> freezer is unnecessary. Disabling irqs and so on is enough to ensure the
> atomic copy is atomic. I don't think any of us are arguing with you
> there.

First off, realize that the problem actually happens during 
suspend-to-ram.

Think about that for a second.

In fact, think about it for a _loong_ time. Because dammit, people seem to 
have a really hard time even realizing this.

There is no "atomic copy".

There is no "checkpointing".

There is no "spoon".

> Hope this helps.

Hope _the_above_ helps. Why is it so hard for people to accept that 
suspend-to-ram shouldn't break because of some IDIOTIC issues with disk 
snapshots?

And why do you people _always_ keep mixing the two up?

Linus

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-24 Thread Nigel Cunningham

Hi Linus.

On Thu, 2007-05-24 at 17:37 -0700, Linus Torvalds wrote:
> 
> On Fri, 25 May 2007, Pavel Machek wrote:
> > 
> > 2) we need to preload firmware during _suspend_. I AM TELLING THAT TO
> > PEOPLE FOR FIVE YEARS NOW.
> 
> And people aren't listening. Have you thought about _why_?
> 
> The thing is, it should just work. Even without pre-loading.
> 
> > Imagine we killed freezer. Also imagine Romano has IDE card in his
> > PCMCIA slot. Kaboom, we solved nothing.
> 
> Don't be silly. We solved it. The firmware has to be loadable from 
> somewhere else, since otherwise his IDE card wouldn't have been accessible 
> in the first place! 
> 
> So all your arguments are just bogus crap.

Let me see if I can help. I'll probably fail miserably, but I can only
try :)

First, let me agree with you that for the atomic copy itself, the
freezer is unnecessary. Disabling irqs and so on is enough to ensure the
atomic copy is atomic. I don't think any of us are arguing with you
there.

Where we see the problem is with what happens after the atomic copy is
made. The problem is that the atomic copy includes struct inodes, dnodes
and such like - an in memory representation of the state of mounted
filesystems. Imagine that, post atomic copy, we don't have the freezer.
Processes can then make on-disk changes to these mounted filesystems in
the time before we complete saving the image and powering down. If, at
resume time, we then restore the atomic copy, we have a mismatch between
what the in-memory data structures say and what the on-disk data says.
This leads to corruption.
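
To make that failure mode concrete, here is a toy Python model. It is an illustration only: `Disk`, `MountedFs` and `hibernate_and_resume` are invented names, the "filesystem" is just a dictionary, and nothing here reflects how the real snapshot code is structured.

```python
import copy


class Disk:
    """Toy block device: maps file name -> contents."""

    def __init__(self):
        self.files = {}


class MountedFs:
    """Toy in-memory view of a mounted filesystem (stands in for inodes etc.)."""

    def __init__(self, disk):
        self.disk = disk
        self.cached_sizes = {}        # in-memory metadata: name -> size

    def write(self, name, data):
        self.disk.files[name] = data
        self.cached_sizes[name] = len(data)

    def consistent(self):
        # Does the in-memory metadata still describe what is on disk?
        on_disk = {n: len(d) for n, d in self.disk.files.items()}
        return on_disk == self.cached_sizes


def hibernate_and_resume(processes_frozen):
    disk = Disk()
    fs = MountedFs(disk)
    fs.write("log", "abc")

    # Atomic copy: a snapshot of the in-memory state only.
    image = copy.deepcopy(fs.cached_sizes)

    # Between the atomic copy and power-off, an unfrozen process can
    # still change the on-disk state.
    if not processes_frozen:
        fs.write("log", "abcdef")

    # Resume: memory goes back to the snapshot, the disk keeps its
    # latest contents.
    resumed = MountedFs(disk)
    resumed.cached_sizes = image
    return resumed.consistent()
```

With processes frozen the restored metadata still matches the disk; without freezing, the restored image believes "log" is 3 bytes long while the disk holds 6, which is the mismatch described above.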

How to avoid?

Well, there are only two options as far as I can see. We either stop
those changes occurring in the first place, or we make them undoable.

Freezing processes, and/or filesystems would be the first path,
checkpointing the second.

So, as far as I can see, we're stuck with freezing processes at least
until checkpointing is implemented.

I have to admit though, that even if checkpointing was implemented, I'd
like to see freezing processes remain. The image gets written faster if
we don't have to compete for cpu and i/o. It also allows us to do a
fuller image of memory than is otherwise possible (Yes, I know some
people don't care for full images, but others of us have usage patterns
that make the system far more useable if a full image is kept, or simply
prefer to have our machines as if the power had never gone away).
Without processes freezing, I'd have to work a lot harder to find a way
to do that full image. The simplest way would probably be to carry the
freezer code myself. (Yeah, I'm reconciled to the idea of never getting
Suspend2 merged. I'd like it to happen, but won't hold my breath.
Someone needs to break your suspend-to-ram or battery so you see the use
for hibernation :>).

Hope this helps.

Nigel


Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-24 Thread Linus Torvalds

On Fri, 25 May 2007, Pavel Machek wrote:
> 
> 2) we need to preload firmware during _suspend_. I AM TELLING THAT TO
> PEOPLE FOR FIVE YEARS NOW.

And people aren't listening. Have you thought about _why_?

The thing is, it should just work. Even without pre-loading.

> Imagine we killed freezer. Also imagine Romano has IDE card in his
> PCMCIA slot. Kaboom, we solved nothing.

Don't be silly. We solved it. The firmware has to be loadable from 
somewhere else, since otherwise his IDE card wouldn't have been accessible 
in the first place! 

So all your arguments are just bogus crap.

Linus

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-24 Thread Pavel Machek

On Thu 2007-05-24 20:16:38, Henrique de Moraes Holschuh wrote:
> On Fri, 25 May 2007, Pavel Machek wrote:
> > My proposed solution is "fix pcmcia to load firmware before suspend
> > even starts"
> 
> s/pcmcia/all drivers that load firmware/ if you are going to go that way.

I'm not "going that way". It always was that way. If driver tries to
load firmware during suspend, it will deadlock.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: pcmcia resume 60 second hang. Re: [patch 00/69] -stable review

2007-05-24 Thread Pavel Machek

Hi!

> > > Why the HELL cannot you realize that kernel threads are different?
> > 
> > Ugh? We are talking about request_firmware() here, right? That's
> > calling userland helper to load the firmware...? Looks like USER
> > thread to me.
> 
> Right. And if we had had the nice old /sbin/hotplug thing, it would all 
> have worked fine - because it would just have done an execve(), and things 
> would be happy.
> 
> But people screwed that up too, and now udevd is an undebuggable user 
> thread. Shit happens. See my other email about why even user threads can 
> probably not be frozen, and the whole freezer thing is misdesigned.

I'm not ready to redesign udevd :-(.

Your other mail proves that either

1) we can stop freezing udevd, and pray udevd does not become confused
by "half hardware not available" while system is being suspended

_or_

2) we need to preload firmware during _suspend_. I AM TELLING THAT TO
PEOPLE FOR FIVE YEARS NOW.

> And I repeat: PowerPC had working and stable suspend five _years_ ago, 
> without any of that freezing crud. We should rip it out.

Imagine we killed freezer. Also imagine Romano has IDE card in his
PCMCIA slot. Kaboom, we solved nothing. We'll either deadlock or do
something very nasty to the filesystem on the IDE card ... because
we'll have udevd running, but fs on IDE card not available.

That does not work. "Not freezing udevd" only makes problems hard to
trigger, see?

Now... "should we rip freezer out of suspend" is a different story. It
does not help _here_. We still need to load the firmware during
_suspend_.
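
The preloading argument can be sketched with a small Python model. All names here (`FirmwareLoader`, `CardDriver`, `suspend_resume_cycle`, the `card.fw` blob) are hypothetical and chosen for illustration; the real kernel path goes through request_firmware() and a userland helper, and where this toy raises an exception the real system would instead sit in a timeout.

```python
class UserlandUnavailable(Exception):
    """Raised when the firmware helper is asked to work while userland is down."""


class FirmwareLoader:
    """Stand-in for the userspace helper that fetches firmware blobs."""

    def __init__(self, blobs):
        self.blobs = blobs
        self.available = True

    def load(self, name):
        if not self.available:
            raise UserlandUnavailable(name)
        return self.blobs[name]


class CardDriver:
    def __init__(self, loader, fw_name, preload):
        self.loader = loader
        self.fw_name = fw_name
        self.preload = preload
        self.cached = None

    def suspend(self):
        # Preloading: fetch the blob while userland can still be relied on.
        if self.preload:
            self.cached = self.loader.load(self.fw_name)

    def resume(self):
        if self.cached is not None:
            return self.cached        # no userland needed
        return self.loader.load(self.fw_name)


def suspend_resume_cycle(preload):
    loader = FirmwareLoader({"card.fw": b"\x01\x02"})
    driver = CardDriver(loader, "card.fw", preload)
    driver.suspend()
    loader.available = False          # userland not reliable from here on
    try:
        return driver.resume()
    except UserlandUnavailable:
        return None
```

The driver that cached its firmware at suspend time resumes without touching the helper; the one that did not has nothing to fall back on, whether the helper is frozen or merely unable to reach its files.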

[Can you ack this point and we can have nice flamewar about ripping
out freezer?]

But I'd actually like to get rid of freezer for suspend (I believe
it is needed for hibernation) -- we'll need to do something similar for
runtime suspending of devices, anyway. But "just rip it out" will
cause some hard to debug breakage, we need to somehow audit the
drivers, or ask driver writers to audit them or something. ... and
yes, ripping freezer out _will_ make drivers more complex. Your video
capture card will now have to deal with

"ouch, I was already told to suspend, and now someone is calling my
ioctls() ?!".

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
