Re: [ACPI] S3 and sigwait (was Re: 2.6.13-rc3: swsusp works (TP 600X))

2005-08-01 Thread Shaohua Li
On Mon, 2005-08-01 at 09:09 +0200, Pavel Machek wrote:
> Hi!
> 
> > > If you think it is a linux bug, can you produce small test case doing 
> > > just the sigwait, and post it on l-k with big title "sigwait() breaks 
> > > when straced, and on suspend"?
> > > 
> > > That way it is going to get some attention, and you'll get either 
> > > documentation or kernel fixed. 
> > Looks like a linux bug to me. The refrigerator's fake signal woke the
> > task up, and the sigwait call was not restarted. How about the patch
> > below:
> 
> Is there a chance to fix the strace case, too? sigwait() is broken in
> more than one way, it seems...
This patch should fix both cases. Can anybody familiar with signal
handling look at it?
The POSIX standard says of sigtimedwait:
"If no signal in set is pending at the time of the call, the calling
thread shall be suspended until one or more signals in set become
pending or until it is interrupted by an unblocked, caught signal."
So the system call may be restarted if it was not interrupted by a
caught signal.
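
To make the quoted sigtimedwait semantics concrete, here is a minimal userspace sketch in C. The helper name wait_signal_ms and its millisecond interface are assumptions made for this illustration; only sigtimedwait() itself is the POSIX call. It shows the two failure modes a caller can observe: EAGAIN on timeout, and EINTR when an unblocked, caught signal interrupts the wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <time.h>

/*
 * wait_signal_ms() is a hypothetical helper, not part of any library.
 * It waits up to timeout_ms for one of the signals in *set, which the
 * caller must already have blocked.  It returns the signal number, or
 * -1 with errno set: EAGAIN if the timeout expired, EINTR if an
 * unblocked, caught signal interrupted the wait.
 */
int wait_signal_ms(const sigset_t *set, long timeout_ms)
{
	struct timespec ts;
	siginfo_t info;

	assert(set != NULL && timeout_ms >= 0);
	ts.tv_sec = timeout_ms / 1000;
	ts.tv_nsec = (timeout_ms % 1000) * 1000000L;
	return sigtimedwait(set, &info, &ts);
}
```

A caller would block the signals of interest with sigprocmask() first, then treat a -1 return with errno == EINTR as "try again" rather than as a fatal error.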

Thanks,
Shaohua



---

 linux-2.6.13-rc4-root/kernel/signal.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff -puN kernel/signal.c~sigwait-suspend-resume kernel/signal.c
--- linux-2.6.13-rc4/kernel/signal.c~sigwait-suspend-resume	2005-08-02 10:33:16.798179984 +0800
+++ linux-2.6.13-rc4-root/kernel/signal.c	2005-08-02 12:49:06.688208376 +0800
@@ -2231,7 +2231,8 @@ sys_rt_sigtimedwait(const sigset_t __use
 			current->state = TASK_INTERRUPTIBLE;
 			timeout = schedule_timeout(timeout);
 
-			try_to_freeze();
+			if (freezing(current))
+				return -ERESTARTNOINTR;
 			spin_lock_irq(&current->sighand->siglock);
 			sig = dequeue_signal(current, &these, &info);
 			current->blocked = current->real_blocked;
@@ -2250,7 +2251,7 @@ sys_rt_sigtimedwait(const sigset_t __use
 	} else {
 		ret = -EAGAIN;
 		if (timeout)
-			ret = -EINTR;
+			ret = -ERESTARTNOHAND;
 	}
 
 	return ret;
_
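
The two return codes used in the patch above are kernel-internal and never reach user space as such. The following is only a toy userspace model in C of how the signal-delivery path is generally understood to resolve them; the names resolve_restart, K_ERESTARTNOINTR, K_ERESTARTNOHAND and the outcome enum are invented for this sketch and are not kernel API. The numeric values are the ones defined in the kernel's include/linux/errno.h.

```c
#include <assert.h>

/* Kernel-internal restart codes, modelled here as positive values. */
enum {
	K_ERESTARTNOINTR = 513,	/* always restart the system call */
	K_ERESTARTNOHAND = 514,	/* restart only if no handler is run */
};

/* What user space ends up observing (names are hypothetical). */
enum outcome {
	SYSCALL_RESTARTED,	/* the call is transparently re-issued */
	FAILS_WITH_EINTR,	/* the call returns -1 with errno == EINTR */
	RETURNED_AS_IS,		/* any other code is passed through */
};

/*
 * handler_runs is non-zero when the wakeup leads to a user-space
 * signal handler being invoked.  The freezer's fake signal and a
 * ptrace stop caused by strace do not invoke one.
 */
enum outcome resolve_restart(int code, int handler_runs)
{
	assert(code > 0);
	switch (code) {
	case K_ERESTARTNOINTR:
		return SYSCALL_RESTARTED;
	case K_ERESTARTNOHAND:
		return handler_runs ? FAILS_WITH_EINTR : SYSCALL_RESTARTED;
	default:
		return RETURNED_AS_IS;
	}
}
```

Under this model, both the suspend case and the strace case end in SYSCALL_RESTARTED, while a genuinely caught signal still yields EINTR, which matches the POSIX wording quoted in the message.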



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [ACPI] S3 and sigwait (was Re: 2.6.13-rc3: swsusp works (TP 600X))

2005-08-01 Thread Li, Shaohua
Hi,
>> > If you think it is a linux bug, can you produce small test case doing
>> > just the sigwait, and post it on l-k with big title "sigwait() breaks
>> > when straced, and on suspend"?
>> >
>> > That way it is going to get some attention, and you'll get either
>> > documentation or kernel fixed.
>> Looks like a linux bug to me. The refrigerator's fake signal woke the
>> task up, and the sigwait call was not restarted. How about the patch
>> below:
>
>Is there a chance to fix the strace case, too? sigwait() is broken in
>more than one way, it seems...
I'm not sure about that. strace shows that sigwait is implemented with
sigtimedwait, which is not documented as never returning an error.

>>  linux-2.6.13-rc4-root/kernel/signal.c |   11 ++++++++++-
>>  1 files changed, 10 insertions(+), 1 deletion(-)
>>
>> diff -puN kernel/signal.c~sigwait-suspend-resume kernel/signal.c
>> --- linux-2.6.13-rc4/kernel/signal.c~sigwait-suspend-resume	2005-08-01 14:00:39.089460688 +0800
>> +++ linux-2.6.13-rc4-root/kernel/signal.c	2005-08-01 14:30:13.821660384 +0800
>> @@ -2188,6 +2188,7 @@ sys_rt_sigtimedwait(const sigset_t __use
>>  	struct timespec ts;
>>  	siginfo_t info;
>>  	long timeout = 0;
>> +	int recover = 0;
>>
>>  	/* XXX: Don't preclude handling different sized sigset_t's.  */
>>  	if (sigsetsize != sizeof(sigset_t))
>> @@ -2225,15 +2226,23 @@ sys_rt_sigtimedwait(const sigset_t __use
>>  			 * be awakened when they arrive.  */
>>  			current->real_blocked = current->blocked;
>>  			sigandsets(&current->blocked, &current->blocked, &these);
>> +do_recover:
>>  			recalc_sigpending();
>>  			spin_unlock_irq(&current->sighand->siglock);
>>
>>  			current->state = TASK_INTERRUPTIBLE;
>>  			timeout = schedule_timeout(timeout);
>>
>> -			try_to_freeze();
>> +			if (try_to_freeze())
>> +				recover = 1;
>
>Can't you just goto do_recover here?
Not sure again.

Thanks,
Shaohua


Re: [ACPI] S3 and sigwait (was Re: 2.6.13-rc3: swsusp works (TP 600X))

2005-08-01 Thread Pavel Machek
Hi!

> > If you think it is a linux bug, can you produce small test case doing 
> > just the sigwait, and post it on l-k with big title "sigwait() breaks 
> > when straced, and on suspend"?
> > 
> > That way it is going to get some attention, and you'll get either 
> > documentation or kernel fixed. 
> Looks like a linux bug to me. The refrigerator's fake signal woke the
> task up, and the sigwait call was not restarted. How about the patch
> below:

Is there a chance to fix the strace case, too? sigwait() is broken in
more than one way, it seems...
Pavel


>  linux-2.6.13-rc4-root/kernel/signal.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff -puN kernel/signal.c~sigwait-suspend-resume kernel/signal.c
> --- linux-2.6.13-rc4/kernel/signal.c~sigwait-suspend-resume	2005-08-01 14:00:39.089460688 +0800
> +++ linux-2.6.13-rc4-root/kernel/signal.c	2005-08-01 14:30:13.821660384 +0800
> @@ -2188,6 +2188,7 @@ sys_rt_sigtimedwait(const sigset_t __use
>  	struct timespec ts;
>  	siginfo_t info;
>  	long timeout = 0;
> +	int recover = 0;
>  
>  	/* XXX: Don't preclude handling different sized sigset_t's.  */
>  	if (sigsetsize != sizeof(sigset_t))
> @@ -2225,15 +2226,23 @@ sys_rt_sigtimedwait(const sigset_t __use
>  			 * be awakened when they arrive.  */
>  			current->real_blocked = current->blocked;
>  			sigandsets(&current->blocked, &current->blocked, &these);
> +do_recover:
>  			recalc_sigpending();
>  			spin_unlock_irq(&current->sighand->siglock);
>  
>  			current->state = TASK_INTERRUPTIBLE;
>  			timeout = schedule_timeout(timeout);
>  
> -			try_to_freeze();
> +			if (try_to_freeze())
> +				recover = 1;

Can't you just goto do_recover here?

>  			spin_lock_irq(&current->sighand->siglock);
>  			sig = dequeue_signal(current, &these, &info);
> +			if (!sig && recover) {
> +				if (timeout == 0)
> +					timeout = MAX_SCHEDULE_TIMEOUT;
> +				recover = 0;
> +				goto do_recover;
> +			}
>  			current->blocked = current->real_blocked;
>  			siginitset(&current->real_blocked, 0);
>  			recalc_sigpending();
> _
> 
> 

-- 
if you have sharp zaurus hardware you don't need... you know my address


Re: [ACPI] S3 and sigwait (was Re: 2.6.13-rc3: swsusp works (TP 600X))

2005-08-01 Thread Shaohua Li
On Sat, 2005-07-30 at 18:30 +0800, Pavel Machek wrote:
> Hi!
> 
> > >> One other glitch is that pdnsd (a nameserver caching daemon) has crashed
> > >> when the system wakes up from swsusp.  It also happens when waking up
> > >> from S3, which was working with 2.6.11.4 although not with 2.6.13-rc3.
> > >> Many people have said mysql also does not suspend well.  Is their use of
> > >> a named pipe or socket causing the problem?
> > 
> > > No idea, strace?
> > 
> > The upshot of stracing is in the Debian BTS <bugs.debian.org>
> > #319572.  Paul Rombouts, an author of pdnsd, reproduced the strace
> > crash and found the problem:
> > 
> > > Apparently strace causes sigwait to return EINTR, which is
> > > inconsistent with the documentation I could find on sigwait.
> > 
> > Which is true.  The sigwait man entry (Debian 'etch') says:
> >    The sigwait function never returns an error.
> > 
> > His patch (available in the BTS and included below) fixed the problem
> > of strace or S3 sleep crashing pdnsd.
> 
> If you think it is a linux bug, can you produce small test case doing 
> just the sigwait, and post it on l-k with big title "sigwait() breaks 
> when straced, and on suspend"?
> 
> That way it is going to get some attention, and you'll get either 
> documentation or kernel fixed. 
Looks like a linux bug to me. The refrigerator's fake signal woke the
task up, and the sigwait call was not restarted. How about the patch below:


Thanks,
Shaohua
---

 linux-2.6.13-rc4-root/kernel/signal.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletion(-)

diff -puN kernel/signal.c~sigwait-suspend-resume kernel/signal.c
--- linux-2.6.13-rc4/kernel/signal.c~sigwait-suspend-resume	2005-08-01 14:00:39.089460688 +0800
+++ linux-2.6.13-rc4-root/kernel/signal.c	2005-08-01 14:30:13.821660384 +0800
@@ -2188,6 +2188,7 @@ sys_rt_sigtimedwait(const sigset_t __use
 	struct timespec ts;
 	siginfo_t info;
 	long timeout = 0;
+	int recover = 0;
 
 	/* XXX: Don't preclude handling different sized sigset_t's.  */
 	if (sigsetsize != sizeof(sigset_t))
@@ -2225,15 +2226,23 @@ sys_rt_sigtimedwait(const sigset_t __use
 			 * be awakened when they arrive.  */
 			current->real_blocked = current->blocked;
 			sigandsets(&current->blocked, &current->blocked, &these);
+do_recover:
 			recalc_sigpending();
 			spin_unlock_irq(&current->sighand->siglock);
 
 			current->state = TASK_INTERRUPTIBLE;
 			timeout = schedule_timeout(timeout);
 
-			try_to_freeze();
+			if (try_to_freeze())
+				recover = 1;
 			spin_lock_irq(&current->sighand->siglock);
 			sig = dequeue_signal(current, &these, &info);
+			if (!sig && recover) {
+				if (timeout == 0)
+					timeout = MAX_SCHEDULE_TIMEOUT;
+				recover = 0;
+				goto do_recover;
+			}
 			current->blocked = current->real_blocked;
 			siginitset(&current->real_blocked, 0);
 			recalc_sigpending();
_




Re: [ACPI] S3 and sigwait (was Re: 2.6.13-rc3: swsusp works (TP 600X))

2005-07-30 Thread Pavel Machek
Hi!

> >> One other glitch is that pdnsd (a nameserver caching daemon) has crashed
> >> when the system wakes up from swsusp.  It also happens when waking up
> >> from S3, which was working with 2.6.11.4 although not with 2.6.13-rc3.
> >> Many people have said mysql also does not suspend well.  Is their use of
> >> a named pipe or socket causing the problem?
> 
> > No idea, strace?
> 
> The upshot of stracing is in the Debian BTS <bugs.debian.org>
> #319572.  Paul Rombouts, an author of pdnsd, reproduced the strace
> crash and found the problem:
> 
> > Apparently strace causes sigwait to return EINTR, which is
> > inconsistent with the documentation I could find on sigwait.
> 
> Which is true.  The sigwait man entry (Debian 'etch') says:
>    The sigwait function never returns an error.
> 
> His patch (available in the BTS and included below) fixed the problem
> of strace or S3 sleep crashing pdnsd.

If you think it is a linux bug, can you produce small test case doing
just the sigwait, and post it on l-k with big title "sigwait() breaks
when straced, and on suspend"?

That way it is going to get some attention, and you'll get either
documentation or kernel fixed.
Pavel
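
A small test case of the kind requested above might look like the following C sketch. It is not the test case anyone in this thread actually posted; the function names block_test_signals and sigwait_once, the choice of SIGUSR1 and SIGTERM, and the SIGWAIT_TEST_MAIN build switch are all assumptions made for illustration. The program blocks two signals, calls sigwait() exactly once, and reports whether it returned a signal or an error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Block SIGUSR1 and SIGTERM and return the set in *set (0 on success). */
int block_test_signals(sigset_t *set)
{
	assert(set != NULL);
	sigemptyset(set);
	sigaddset(set, SIGUSR1);
	sigaddset(set, SIGTERM);
	return sigprocmask(SIG_BLOCK, set, NULL);
}

/* Call sigwait() exactly once and report what it returned. */
int sigwait_once(const sigset_t *set, int *sig)
{
	int err;

	assert(set != NULL && sig != NULL);
	err = sigwait(set, sig);
	if (err)
		fprintf(stderr, "sigwait returned error %d (%s)\n",
			err, strerror(err));
	else
		printf("sigwait returned signal %d\n", *sig);
	return err;
}

#ifdef SIGWAIT_TEST_MAIN
int main(void)
{
	sigset_t set;
	int sig = 0;

	if (block_test_signals(&set) != 0)
		return 2;
	printf("pid %ld waiting in sigwait\n", (long)getpid());
	fflush(stdout);
	/* Exit status 1 means sigwait() itself reported an error. */
	return sigwait_once(&set, &sig) ? 1 : 0;
}
#endif
```

Built with -DSIGWAIT_TEST_MAIN, the program can be left waiting while strace is attached to the printed pid or while the machine is suspended and resumed; on an affected kernel one would expect the error branch to be taken, though that has not been verified here.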


-- 
teflon -- maybe it is a trademark, but it should not be.


S3 and sigwait (was Re: 2.6.13-rc3: swsusp works (TP 600X))

2005-07-29 Thread Sanjoy Mahajan
>> One other glitch is that pdnsd (a nameserver caching daemon) has crashed
>> when the system wakes up from swsusp.  It also happens when waking up
>> from S3, which was working with 2.6.11.4 although not with 2.6.13-rc3.
>> Many people have said mysql also does not suspend well.  Is their use of
>> a named pipe or socket causing the problem?

> No idea, strace?

The upshot of stracing is in the Debian BTS <bugs.debian.org>
#319572.  Paul Rombouts, an author of pdnsd, reproduced the strace
crash and found the problem:

> Apparently strace causes sigwait to return EINTR, which is
> inconsistent with the documentation I could find on sigwait.

Which is true.  The sigwait man entry (Debian 'etch') says:
   The sigwait function never returns an error.

His patch (available in the BTS and included below) fixed the problem
of strace or S3 sleep crashing pdnsd.

Shouldn't sleeping and suspension be invisible to user-space processes
such as pdnsd?  Drivers and other kernel code need rewriting so that
devices and buses are not abandoned in a weird state, but going to
sleep should just pull the rug out from under the entire user space.
Then no user space process would need rewriting to survive a
sleep/wake, as long as the deep-freeze were cold enough.  Or is there
a subtlety with threads that I'm missing?

With APM, maybe such transparency was more possible since going to bed
was arranged by the firmware rather than by the OS, and the firmware
would pull out the rug from under the entire user and kernel space
(after maybe a bit of kernel prep).

-Sanjoy

--- src/main.c~ 2005-07-08 20:13:14.0 +0200
+++ src/main.c  2005-07-29 16:16:12.0 +0200
@@ -659,11 +659,20 @@
 	pthread_sigmask(SIG_BLOCK,&sigs_msk,NULL);
 	waiting=1;
 #endif
-	sigwait(&sigs_msk,&sig);
-	DEBUG_MSG("Signal %i caught.\n",sig);
+	{
+		int err;
+		while ((err=sigwait(&sigs_msk,&sig))) {
+			if(err!=EINTR) {
+				log_warn("sigwait failed: %s",strerror(err));
+				sig=0;
+				break;
+			}
+		}
+	}
+	if(sig) DEBUG_MSG("Signal %i caught.\n",sig);
 	write_disk_cache();
 	destroy_cache();
-	log_warn("Caught signal %i. Exiting.",sig);
+	if(sig) log_warn("Caught signal %i. Exiting.",sig);
 	if (sig==SIGSEGV || sig==SIGILL || sig==SIGBUS)
 		crash_msg("This is a fatal signal probably triggered by a bug.");
 	if (ping_isocket!=-1)



Re: 2.6.13-rc3: swsusp works (TP 600X)

2005-07-28 Thread Sanjoy Mahajan
> Perhaps the patch from Daniel Ritz to free the yenta IRQ on suspend
> (attached) will help?

Alas, when I went to apply it, patch said it was already there, and
sure enough 2.6.13-rc3-mm2 does have it.

One approach is to find out why PCMCIA cannot remove the socket power
when using cardctl eject (assuming that the error is related to the
swsusp failing).  The error is puzzling because physically ejecting
the card doesn't produce the message.  I'll try to chase that one
down, and welcome any hints on where to look or what debugging to turn
on.  I've looked in drivers/pcmcia/cs.c, which is where the error is
printed, but no enlightenment dawned, and will try setting pcmcia
debugging.

-Sanjoy


Re: 2.6.13-rc3: swsusp works (TP 600X)

2005-07-28 Thread Sanjoy Mahajan
> So, in short, problem is that if you leave prism54 card in, even
> with module removed, swsusp hangs, right?

Right, in some circumstances.  To narrow them down I spent many hours
rebooting into combinations of runlevels and loaded modules.  It is
reproducible even in single-user mode.  The various permutations, all
from booting single user with almost no modules loaded (cmdline:
idebus=66 apm=off acpi=force pci=noacpi single)

  card: prism54 card/xircom Ethernet+modem card/no card
  cardctl eject: done before the hibernate/ not done before the hibernate

The one combination that always breaks is to have the prism54 card in,
then do 'cardctl eject', which always produces:

[4295438.17] PCMCIA: socket e233c02c: *** DANGER *** unable to
   remove socket power

If I then use a simple hibernate script (basically just unload
prism54, then echo disk > /sys/power/state), swsusp doesn't write the
pages.  These are the only modules loaded before the swsusp begins:

pcmcia                 34276  0 
crc32                   3808  1 pcmcia
intel_agp              20188  1 
firmware_class          7936  1 pcmcia
yenta_socket           23244  3 
rsrc_nonstatic         11776  1 yenta_socket
pcmcia_core            39508  3 pcmcia,yenta_socket,rsrc_nonstatic
agpgart                29800  1 intel_agp

If I don't do 'cardctl eject', or do 'cardctl eject' and 'cardctl
insert', then run the hibernate script (which unloads prism54), it
hibernates fine.

With no card in the slot, all is well.

With the xircom card in the slot, hibernation works fine if I don't do
'cardctl eject' first.  If I do 'cardctl eject' that produces the same
DANGER message as with the prism54 card.  But hibernation still works,
although it seems a bit suspect: As it is hibernating, messages appear
about it enabling eth0.

Here's the lspci for the xircom card:

:06:00.0 Ethernet controller: Xircom Cardbus Ethernet 10/100 (rev 03)
:06:00.1 Serial controller: Xircom Cardbus Ethernet + 56k Modem
   (rev 03) (prog-if 02 [16550])

And lspci -vv for the prism54:

:06:00.0 Network controller: Intersil Corporation Intersil ISL3890
[Prism GT/Prism Duette] (rev 01)
   Subsystem: Intersil Corporation: Unknown device 
   Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV+ VGASnoop-
   ParErr- Stepping- SERR- FastB2B-
   Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
   >TAbort- SERR- [nosave pfn
0x3b2]<3>[4296242.039000] irq 9: nobody cared (try booting with the
"irqpoll" option)
[4296242.039000]  [] __report_bad_irq+0x2a/0xa0
[4296242.039000]  [] handle_IRQ_event+0x30/0x70
[4296242.039000]  [] note_interrupt+0x80/0xf0
[4296242.039000]  [] __do_IRQ+0x134/0x140
[4296242.039000]  [] do_IRQ+0x23/0x40
[4296242.039000]  [] common_interrupt+0x1a/0x20
[4296242.039000]  [] __do_softirq+0x43/0xb0
[4296242.039000]  [] do_softirq+0x2d/0x30
[4296242.039000]  [] irq_exit+0x37/0x40
[4296242.039000]  [] do_IRQ+0x28/0x40
[4296242.039000]  [] common_interrupt+0x1a/0x20
[4296242.039000]  [] swsusp_suspend+0x50/0xc0
[4296242.039000]  [] pm_suspend_disk+0x61/0xd0
[4296242.039000]  [] enter_state+0xa6/0xb0
[4296242.039000]  [] state_store+0x92/0x9e
[4296242.039000]  [] subsys_attr_store+0x3d/0x50
[4296242.039000]  [] flush_write_buffer+0x3e/0x50
[4296242.039000]  [] sysfs_write_file+0x54/0x80
[4296242.039000]  [] vfs_write+0xb6/0x180
[4296242.039000]  [] sys_write+0x51/0x80
[4296242.039000]  [] syscall_call+0x7/0xb
[4296242.039000] handlers:
[4296242.039000] [] (acpi_irq+0x0/0x16)
[4296242.039000] Disabling IRQ #9

The lspci -vv has this, so somebody should care about irq 9!

:00:07.3 Bridge: Intel Corp. 82371AB/EB/MB PIIX4 ACPI (rev 03)
 Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV-
 VGASnoop- ParErr- Stepping- SERR- FastB2B-
 Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
 >TAbort- SERR-

> Perhaps the patch from Daniel Ritz to free the yenta IRQ on suspend
> (attached) will help?

I will try that next.

-Sanjoy


Re: 2.6.13-rc3: swsusp works (TP 600X)

2005-07-28 Thread Rafael J. Wysocki
On Thursday, 28 of July 2005 23:36, Pavel Machek wrote:
> Hi!
> 
> > >>If I don't eject the pcmcia card (usually a prism54 wireless card),
> > >>swsusp begins the process of hibernation, but never gets to the
> > >>writing pages part.
> > 
> > > Well, it really may be the firmware loading. Add some printks to
> > > confirm it, then fix it.
> > 
> > I did more tests, this time with 2.6.13-rc3-mm2 (machine is a TP 600X),
> > and I don't think the problem is related to firmware loading.  If I
> > first physically eject the card (an Intersil wireless card), swsusp
> > prints
> > 
> ..
> > 
> > then it writes pages to swap and all is well.  Well, almost 100%; the
> > one glitch is that sometimes X comes back blank and I have to
> > ctrl-alt-F7 to bring back the display; or X comes back with the keyboard
> > > acting strange (<ENTER> shifts the display left by a few hundred
> > pixels), and again ctrl-alt-F7 fixes it.  This is with XFree86
> > 4.3.0.dfsg.1-14, and maybe after I upgrade (?) to the xorg server, that
> > glitch will go away.  Anyway, it's easy to work around.
> 
> So, in short, problem is that if you leave prism54 card in, even with
> module removed, swsusp hangs, right?
> 
> Okay then, start looking into pcmcia layer ;-).

Perhaps the patch from Daniel Ritz to free the yenta IRQ on suspend (attached)
will help?

Rafael


-- 
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
-- Lewis Carroll "Alice's Adventures in Wonderland"
--- linux-2.6.13-rc3-git5/drivers/pcmcia/yenta_socket.c	2005-07-23 19:26:30.0 +0200
+++ patched/drivers/pcmcia/yenta_socket.c	2005-07-24 11:44:04.0 +0200
@@ -1107,6 +1107,8 @@ static int yenta_dev_suspend (struct pci
 		pci_read_config_dword(dev, 17*4, &socket->saved_state[1]);
 		pci_disable_device(dev);
 
+		free_irq(dev->irq, socket);
+
 		/*
 		 * Some laptops (IBM T22) do not like us putting the Cardbus
 		 * bridge into D3.  At a guess, some other laptop will
@@ -1132,6 +1134,13 @@ static int yenta_dev_resume (struct pci_
 		pci_enable_device(dev);
 		pci_set_master(dev);
 
+		if (socket->cb_irq)
+			if (request_irq(socket->cb_irq, yenta_interrupt,
+					SA_SHIRQ, "yenta", socket)) {
+				printk(KERN_WARNING "Yenta: request_irq() failed on resume!\n");
+				socket->cb_irq = 0;
+			}
+
 		if (socket->type && socket->type->restore_state)
 			socket->type->restore_state(socket);
 	}


Re: 2.6.13-rc3: swsusp works (TP 600X)

2005-07-28 Thread Pavel Machek
Hi!

> >>If I don't eject the pcmcia card (usually a prism54 wireless card),
> >>swsusp begins the process of hibernation, but never gets to the
> >>writing pages part.
> 
> > Well, it really may be the firmware loading. Add some printks to
> > confirm it, then fix it.
> 
> I did more tests, this time with 2.6.13-rc3-mm2 (machine is a TP 600X),
> and I don't think the problem is related to firmware loading.  If I
> first physically eject the card (an Intersil wireless card), swsusp
> prints
> 
...
> 
> then it writes pages to swap and all is well.  Well, almost 100%; the
> one glitch is that sometimes X comes back blank and I have to
> ctrl-alt-F7 to bring back the display; or X comes back with the keyboard
> acting strange (<ENTER> shifts the display left by a few hundred
> pixels), and again ctrl-alt-F7 fixes it.  This is with XFree86
> 4.3.0.dfsg.1-14, and maybe after I upgrade (?) to the xorg server, that
> glitch will go away.  Anyway, it's easy to work around.

So, in short, problem is that if you leave prism54 card in, even with
module removed, swsusp hangs, right?

Okay then, start looking into pcmcia layer ;-).
Pavel

-- 
teflon -- maybe it is a trademark, but it should not be.


Re: 2.6.13-rc3: swsusp works (TP 600X)

2005-07-28 Thread Sanjoy Mahajan
>>If I don't eject the pcmcia card (usually a prism54 wireless card),
>>swsusp begins the process of hibernation, but never gets to the
>>writing pages part.

> Well, it really may be the firmware loading. Add some printks to
> confirm it, then fix it.

I did more tests, this time with 2.6.13-rc3-mm2 (machine is a TP 600X),
and I don't think the problem is related to firmware loading.  If I
first physically eject the card (an Intersil wireless card), swsusp
prints

PM: Writing image
PCI: Found IRQ 11 for device .
PCI: Sharing IRQ 11 with ...
PCI: Sharing IRQ 11 with ...
PCI: Found IRQ 11 for device .

then it writes pages to swap and all is well.  Well, almost 100%; the
one glitch is that sometimes X comes back blank and I have to
ctrl-alt-F7 to bring back the display; or X comes back with the keyboard
acting strange (<ENTER> shifts the display left by a few hundred
pixels), and again ctrl-alt-F7 fixes it.  This is with XFree86
4.3.0.dfsg.1-14, and maybe after I upgrade (?) to the xorg server, that
glitch will go away.  Anyway, it's easy to work around.

But, if I leave the card in and prepare the hibernation with

ifdown eth0
cardctl eject
modprobe -r prism54

(so eject the module for and stop all uses of the card), then swsusp
prints the PCI messages above, but hangs before writing the pages to
swap.  I'm using a hibernate.sh script (included below) for those steps.
It does a few others like stopping the hotplug system.
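
The preparation steps above can be sketched as a small script. This is only an illustration and not the hibernate.sh from this message: the DRY_RUN switch and the function names are assumptions added here, and with DRY_RUN left at its default of 1 the functions only print what they would do.

```shell
#!/bin/bash
# Illustrative sketch of the pre-hibernation steps described above.
# DRY_RUN, run(), prepare_card() and enter_hibernation() are hypothetical
# names; they are not taken from the real hibernate.sh.

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

prepare_card() {
    run ifdown eth0          # take the wireless interface down
    run cardctl eject        # software eject; the card stays in the slot
    run modprobe -r prism54  # unload the driver module
}

enter_hibernation() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: echo disk > /sys/power/state"
    else
        echo disk > /sys/power/state   # ask swsusp to suspend to disk
    fi
}
```

Calling prepare_card followed by enter_hibernation with DRY_RUN=0 (as root) would reproduce the failing sequence; leaving out the cardctl line gives the variant without the software eject.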

After 'cardctl eject' and removing the module, there's no evidence of
the hardware available to the kernel, as far as I can tell.  lspci
doesn't show it, for example.  So the system is not loading firmware
during the hibernate attempt, and I'm not sure what step is hanging.

[Should this report go to acpi-devel and/or the ACPI or kernel bugzilla,
or is that more for S1/S3 rather than for hibernation?]

Here is lspci with the card inserted:

  :00:00.0 Host bridge: Intel Corp. 440BX/ZX/DX - 82443BX/ZX/DX Host bridge 
(rev 03)
  :00:01.0 PCI bridge: Intel Corp. 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge 
(rev 03)
  :00:02.0 CardBus bridge: Texas Instruments PCI1450 (rev 03)
  :00:02.1 CardBus bridge: Texas Instruments PCI1450 (rev 03)
  :00:03.0 Communication controller: Agere Systems (former Lucent 
Microelectronics) WinModem 56k (rev 01)
  :00:06.0 Multimedia audio controller: Cirrus Logic CS 4614/22/24 
[CrystalClear SoundFusion Audio Accelerator] (rev 01)
  :00:07.0 Bridge: Intel Corp. 82371AB/EB/MB PIIX4 ISA (rev 02)
  :00:07.1 IDE interface: Intel Corp. 82371AB/EB/MB PIIX4 IDE (rev 01)
  :00:07.2 USB Controller: Intel Corp. 82371AB/EB/MB PIIX4 USB (rev 01)
  :00:07.3 Bridge: Intel Corp. 82371AB/EB/MB PIIX4 ACPI (rev 03)
  :01:00.0 VGA compatible controller: Neomagic Corporation NM2360 
[MagicMedia 256ZX]
  :06:00.0 Network controller: Intersil Corporation Intersil ISL3890 [Prism 
GT/Prism Duette] (rev 01)

Here's the lspci -v for just the card:

  :06:00.0 Network controller: Intersil Corporation Intersil ISL3890 [Prism 
GT/Prism Duette] (rev 01)
  Subsystem: Intersil Corporation: Unknown device 
  Flags: bus master, medium devsel, latency 80, IRQ 11
  Memory at 2480 (32-bit, non-prefetchable) [size=8K]
  Capabilities: [dc] Power Management version 1

And lspci -vv for just the card:

  :06:00.0 Network controller: Intersil Corporation Intersil ISL3890 [Prism 
GT/Prism Duette] (rev 01)
  Subsystem: Intersil Corporation: Unknown device 
  Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- 
Stepping- SERR- FastB2B-
  Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
  Latency: 80 (2500ns min, 7000ns max), Cache Line Size: 0x08 (32 bytes)
  Interrupt: pin A routed to IRQ 11
  Region 0: Memory at 2480 (32-bit, non-prefetchable) [size=8K]
  Capabilities: [dc] Power Management version 1
  Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
  Status: D0 PME-Enable- DSel=0 DScale=0 PME-


Here is the list of modules present just before the (working and
non-working) hibernation (for debugging, the hibernate.sh script does
'lsmod |logger' before hibernating):

  ip_conntrack_ftp
  snd_mixer_oss
  ipv6
  pcmcia
  crc32
  parport_pc
  lp
  parport
  thermal
  fan
  button
  processor
  ac
  battery
  ipt_state
  ipt_LOG
  iptable_filter
  iptable_nat
  ip_conntrack
  ip_tables
  8250
  serial_core
  intel_agp
  firmware_class
  snd_cs46xx
  snd_rawmidi
  snd_seq_device
  snd_ac97_codec
  snd_pcm
  snd_timer
  snd
  soundcore
  snd_page_alloc
  yenta_socket
  rsrc_nonstatic
  pcmcia_core
  agpgart
  speedstep_lib

I don't think any of those modules intrinsically are a problem, since
swsusp works with all of them (as long as the Intersil card is not
inserted).  And here is the hibernate.sh script:


#!/bin/bash

# suspend to disk 
[...]
[...] > /sys/power/state
hwclock --hctosys
logger -t hibernate.sh Returning from hibernation
for s in $to_start ; do
  /etc/init.d/$s start
done



-Sanjoy

`A society of sheep must in time beget a government of wolves.'
   - Bertrand de Jouvenal


Re: 2.6.13-rc3: swsusp works (TP 600X)

2005-07-28 Thread Sanjoy Mahajan
> So, in short, problem is that if you leave prism54 card in, even
> with module removed, swsusp hangs, right?

Right, in some circumstances.  To narrow them down I spent many hours
rebooting into combinations of runlevels and loaded modules.  It is
reproducible even in single-user mode.  The various permutations, all
from booting single user with almost no modules loaded (cmdline:
idebus=66 apm=off acpi=force pci=noacpi single)

  card: prism54 card/xircom Ethernet+modem card/no card
  cardctl eject: done before the hibernate/ not done before the hibernate

The one combination that always breaks is to have the prism54 card in,
then do 'cardctl eject', which always produces:

[4295438.17] PCMCIA: socket e233c02c: *** DANGER *** unable to
   remove socket power

If I then use a simple hibernate script (basically just unload
prism54, then echo disk > /sys/power/state), swsusp doesn't write the
pages.  These are the only modules loaded before the swsusp begins:

pcmcia                 34276  0 
crc32                   3808  1 pcmcia
intel_agp              20188  1 
firmware_class          7936  1 pcmcia
yenta_socket           23244  3 
rsrc_nonstatic         11776  1 yenta_socket
pcmcia_core            39508  3 pcmcia,yenta_socket,rsrc_nonstatic
agpgart                29800  1 intel_agp

If I don't do 'cardctl eject', or do 'cardctl eject' and 'cardctl
insert', then run the hibernate script (which unloads prism54), it
hibernates fine.

With no card in the slot, all is well.

With the xircom card in the slot, hibernation works fine if I don't do
'cardctl eject' first.  If I do 'cardctl eject' that produces the same
DANGER message as with the prism54 card.  But hibernation still works,
although it seems a bit suspect: As it is hibernating, messages appear
about it enabling eth0.

Here's the lspci for the xircom card:

:06:00.0 Ethernet controller: Xircom Cardbus Ethernet 10/100 (rev 03)
:06:00.1 Serial controller: Xircom Cardbus Ethernet + 56k Modem
   (rev 03) (prog-if 02 [16550])

And lspci -vv for the prism54:

:06:00.0 Network controller: Intersil Corporation Intersil ISL3890
[Prism GT/Prism Duette] (rev 01)
   Subsystem: Intersil Corporation: Unknown device 
   Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV+ VGASnoop-
   ParErr- Stepping- SERR- FastB2B-
   Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
   >TAbort- <TAbort- <MAbort- >SERR- <PERR-
   Interrupt: pin A routed to IRQ 11
   Region 0: Memory at 2480 (32-bit, non-prefetchable)
   [size=8K]
   Capabilities: [dc] Power Management version 1
 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
 PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

I also noticed the familiar 'irq 9: nobody cared' messages in the
dmesg log, which may be related to the problems above:

[4296190.977000] suspend: (pages needed: 5792 + 512 free: 141637)
[4296190.977000] alloc_pagedir(): nr_pages = 5792
[4296190.977000] create_pbe_list(): initialized 5792 PBEs
[4296190.977000] copy_data_pages(): pages to copy: 5792
[4296190.977000] [nosave pfn 0x3b1]7[nosave pfn 0x3b2]3[4296242.039000] irq 9: nobody cared (try booting with the "irqpoll" option)
[4296242.039000]  [<c013e22a>] __report_bad_irq+0x2a/0xa0
[4296242.039000]  [<c013da10>] handle_IRQ_event+0x30/0x70
[4296242.039000]  [<c013e340>] note_interrupt+0x80/0xf0
[4296242.039000]  [<c013db84>] __do_IRQ+0x134/0x140
[4296242.039000]  [<c0104c83>] do_IRQ+0x23/0x40
[4296242.039000]  [<c01033e2>] common_interrupt+0x1a/0x20
[4296242.039000]  [<c01208e3>] __do_softirq+0x43/0xb0
[4296242.039000]  [<c012097d>] do_softirq+0x2d/0x30
[4296242.039000]  [<c0120a57>] irq_exit+0x37/0x40
[4296242.039000]  [<c0104c88>] do_IRQ+0x28/0x40
[4296242.039000]  [<c01033e2>] common_interrupt+0x1a/0x20
[4296242.039000]  [<c013ba00>] swsusp_suspend+0x50/0xc0
[4296242.039000]  [<c013c7a1>] pm_suspend_disk+0x61/0xd0
[4296242.039000]  [<c013a226>] enter_state+0xa6/0xb0
[4296242.039000]  [<c013a362>] state_store+0x92/0x9e
[4296242.039000]  [<c0199fdd>] subsys_attr_store+0x3d/0x50
[4296242.039000]  [<c019a28e>] flush_write_buffer+0x3e/0x50
[4296242.039000]  [<c019a2f4>] sysfs_write_file+0x54/0x80
[4296242.039000]  [<c0160026>] vfs_write+0xb6/0x180
[4296242.039000]  [<c01601c1>] sys_write+0x51/0x80
[4296242.039000]  [<c0103225>] syscall_call+0x7/0xb
[4296242.039000] handlers:
[4296242.039000] [<c01f0789>] (acpi_irq+0x0/0x16)
[4296242.039000] Disabling IRQ #9

The lspci -vv has this, so somebody should care about irq 9!

:00:07.3 Bridge: Intel Corp. 82371AB/EB/MB PIIX4 ACPI (rev 03)
 Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV-
 VGASnoop- ParErr- Stepping- SERR- FastB2B-
 Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
 >TAbort- <TAbort- <MAbort- >SERR- <PERR-
 Interrupt: pin ? routed to IRQ 9

> Perhaps the patch from Daniel Ritz to free the yenta IRQ on suspend
> (attached) will help?

I will try that next.

-Sanjoy


Re: 2.6.13-rc3: swsusp works (TP 600X)

2005-07-28 Thread Sanjoy Mahajan
> Perhaps the patch from Daniel Ritz to free the yenta IRQ on suspend
> (attached) will help?

Alas, when I went to apply it, patch said it was already there, and
sure enough 2.6.13-rc3-mm2 does have it.

One approach is to find out why PCMCIA cannot remove the socket power
when using cardctl eject (assuming that the error is related to the
swsusp failing).  The error is puzzling because physically ejecting
the card doesn't produce the message.  I'll try to chase that one
down, and welcome any hints on where to look or what debugging to turn
on.  I've looked in drivers/pcmcia/cs.c, which is where the error is
printed, but no enlightenment dawned, and will try setting pcmcia
debugging.
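
One way to locate every place that prints the message is to search the PCMCIA sources for its text. The following is a minimal sketch; the function name and the source-tree argument are assumptions, and the search string is the tail of the DANGER message quoted earlier in the thread.

```shell
#!/bin/bash
# Sketch: list the source lines that print the "unable to remove socket
# power" message. find_danger_message is a hypothetical helper name.

find_danger_message() {
    local tree=${1:-.}   # top of a kernel source tree (default: current dir)
    # -r recurses into drivers/pcmcia, -n prints the matching line numbers
    grep -rn "unable to remove socket power" "$tree/drivers/pcmcia"
}
```

Run from the top of the kernel tree, find_danger_message prints file:line pairs that can serve as starting points for adding debugging output.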

-Sanjoy


Re: 2.6.13-rc3: swsusp works (TP 600X)

2005-07-22 Thread Pavel Machek
Hi!

> swsusp now mostly works on my TP 600X.  If I don't eject the pcmcia card
> (usually a prism54 wireless card), swsusp begins the process of
> hibernation, but never gets to the writing pages part.  The eth0 somehow
> tries to reload the firmware (as if it's been woken up), and then
> everything hangs.  If I eject the card and (for safety) stop
> /etc/init.d/pcmcia, then swsusp writes out the memory to swap, and
> waking up works fine.  Thanks for all the improvements!
> 
> Is there debugging I can do in order to help get the pcmcia system
> hibernating automagically?

Well, it really may be the firmware loading. Add some printks to
confirm it, then fix it.

> One other glitch is that pdnsd (a nameserver caching daemon) has crashed
> when the system wakes up from swsusp.  It also happens when waking up
> from S3, which was working with 2.6.11.4 although not with 2.6.13-rc3.
> Many people have said mysql also does not suspend well.  Is their use of
> a named pipe or socket causing the problem?

No idea, strace?
Pavel
-- 
teflon -- maybe it is a trademark, but it should not be.
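
A sketch of how the strace suggestion could be followed for pdnsd: attach to the running daemon before suspending, resume, then read the end of the trace file to see which system call returned last. The function name and the output path are assumptions; -f, -tt, -p and -o are standard strace options.

```shell
#!/bin/bash
# Sketch: attach strace to an already running daemon so that its last
# system calls around suspend/resume end up in a file.
# trace_daemon is a hypothetical helper name.

trace_daemon() {
    local name=$1 out=$2 pid
    pid=$(pidof -s "$name") || { echo "$name is not running" >&2; return 1; }
    # -f follows all threads and children, -tt adds timestamps,
    # -o writes the trace to a file instead of stderr.
    strace -f -tt -p "$pid" -o "$out" &
    echo "strace attached to $name (pid $pid), writing $out"
}
```

For example, trace_daemon pdnsd /tmp/pdnsd.strace (as root), then suspend and resume, then look at the last lines of /tmp/pdnsd.strace.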


2.6.13-rc3: swsusp works (TP 600X)

2005-07-22 Thread Sanjoy Mahajan
swsusp now mostly works on my TP 600X.  If I don't eject the pcmcia card
(usually a prism54 wireless card), swsusp begins the process of
hibernation, but never gets to the writing pages part.  The eth0 somehow
tries to reload the firmware (as if it's been woken up), and then
everything hangs.  If I eject the card and (for safety) stop
/etc/init.d/pcmcia, then swsusp writes out the memory to swap, and
waking up works fine.  Thanks for all the improvements!

Is there debugging I can do in order to help get the pcmcia system
hibernating automagically?

One other glitch is that pdnsd (a nameserver caching daemon) has crashed
when the system wakes up from swsusp.  It also happens when waking up
from S3, which was working with 2.6.11.4 although not with 2.6.13-rc3.
Many people have said mysql also does not suspend well.  Is their use of
a named pipe or socket causing the problem?

System: TP 600X, 2.6.13-rc3 vanilla kernel, fixed DSDT that I used to
get S3 working with 2.6.11.4 (see
<http://bugme.osdl.org/show_bug.cgi?id=4926> for the DSDT),
booted with 
  idebus=66 apm=off acpi=force pci=noacpi acpi_sleep=s3_bios

-Sanjoy

`A society of sheep must in time beget a government of wolves.'
   - Bertrand de Jouvenal

