Re: [PATCH] clockevents: Fix suspend/resume to disk hangs

2007-03-23 Thread Marcus Better
> Do you use the microcode driver?

No.

I'm in the middle of bisecting. Strangely enough, the symptoms change as
I go. Some commits manage to suspend to RAM, but hang at resume.

Marcus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] clockevents: Fix suspend/resume to disk hangs

2007-03-23 Thread Rafael J. Wysocki
On Friday, 23 March 2007 10:14, Marcus Better wrote:
> Marcus Better wrote:
> > The XFS workqueue patch [1] fixes my problem [2].
> 
> > [1] http://permalink.gmane.org/gmane.linux.kernel/507616
> > [2] http://permalink.gmane.org/gmane.linux.kernel/505570
> 
> Unfortunately it only fixed suspend to RAM. Suspend to disk still hangs
> at "snapshotting system". Will try to bisect it...

Do you use the microcode driver?

Rafael


Re: [PATCH] clockevents: Fix suspend/resume to disk hangs

2007-03-23 Thread Tino Keitel
On Fri, Mar 23, 2007 at 10:14:11 +0100, Marcus Better wrote:
> Marcus Better wrote:
> > The XFS workqueue patch [1] fixes my problem [2].
> 
> > [1] http://permalink.gmane.org/gmane.linux.kernel/507616
> > [2] http://permalink.gmane.org/gmane.linux.kernel/505570
> 
> Unfortunately it only fixed suspend to RAM. Suspend to disk still hangs
> at "snapshotting system". Will try to bisect it...

I don't know if this is related, but using 2.6.21-rc4 and suspend2
2.2.9.7, I also get a hang at suspend. I could check whether this also
happens with 2.6.20 and the same suspend2 version.

Regards,
Tino



Re: [PATCH] clockevents: Fix suspend/resume to disk hangs

2007-03-23 Thread Marcus Better
Marcus Better wrote:
> The XFS workqueue patch [1] fixes my problem [2].

> [1] http://permalink.gmane.org/gmane.linux.kernel/507616
> [2] http://permalink.gmane.org/gmane.linux.kernel/505570

Unfortunately it only fixed suspend to RAM. Suspend to disk still hangs
at "snapshotting system". Will try to bisect it...

Marcus




Re: [PATCH] clockevents: Fix suspend/resume to disk hangs

2007-03-22 Thread Marcus Better
Thomas Gleixner wrote:
> You said that the breakage appeared between 2.6.20 and rc2. Can you
> bisect it?

The XFS workqueue patch [1] fixes my problem [2].

Marcus

[1] http://permalink.gmane.org/gmane.linux.kernel/507616
[2] http://permalink.gmane.org/gmane.linux.kernel/505570




Re: [PATCH] clockevents: Fix suspend/resume to disk hangs

2007-03-21 Thread Thomas Gleixner
On Tue, 2007-03-20 at 10:35 +0100, Marcus Better wrote:
> Thomas Gleixner wrote:
> 
> > I finally found a dual core box, which survives suspend/resume without
> > crashing in the middle of nowhere. Sigh, I never figured out from the
> > code and the bug reports what's going on.
> > 
> > The observed hangs are caused by a stale state transition of the clock
> > event devices, which prevents RCU synchronization from completing
> > when the non-boot CPU is brought back up.
> 
> This didn't fix the suspend problems on my Thinkpad R60. (Sorry for
> nagging - please let me know if I can assist in debugging this...)

I did not expect it to fix your problem. Clockevents are only used
in arch/i386 right now. You are running a 64-bit kernel, so any change in
your problem would have been very surprising.

You said that the breakage appeared between 2.6.20 and rc2. Can you
bisect it?

tglx




Re: [PATCH] clockevents: Fix suspend/resume to disk hangs

2007-03-20 Thread Marcus Better
Thomas Gleixner wrote:

> I finally found a dual core box, which survives suspend/resume without
> crashing in the middle of nowhere. Sigh, I never figured out from the
> code and the bug reports what's going on.
> 
> The observed hangs are caused by a stale state transition of the clock
> event devices, which prevents RCU synchronization from completing
> when the non-boot CPU is brought back up.

This didn't fix the suspend problems on my Thinkpad R60. (Sorry for
nagging - please let me know if I can assist in debugging this...)

Marcus




Re: [PATCH] clockevents: Fix suspend/resume to disk hangs

2007-03-18 Thread Adrian Bunk
On Sat, Mar 17, 2007 at 10:47:01PM +0100, Rafael J. Wysocki wrote:
> On Saturday, 17 March 2007 11:07, Thomas Meyer wrote:
> >...
> > 2.) The first suspend to disk works with no problems, but the second
> > suspend to disk in a row results in an oops:
> >  ->resume_device ->
> > pci_device_resume->ata_host_resume->ahci_pci_device_resume->ata_pci_device_do_resume->pci_restore_state
> 
> Can you please see if this problem is already in Adrian's list of known
> regressions?

AFAIK not, I've given it its own entry.

> Thanks,
> Rafael

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Re: [PATCH] clockevents: Fix suspend/resume to disk hangs

2007-03-17 Thread Rafael J. Wysocki
On Saturday, 17 March 2007 11:07, Thomas Meyer wrote:
> Thomas Gleixner wrote:
> > I finally found a dual core box, which survives suspend/resume without
> > crashing in the middle of nowhere. Sigh, I never figured out from the
> > code and the bug reports what's going on.
> >
> > The observed hangs are caused by a stale state transition of the clock
> > event devices, which prevents RCU synchronization from completing
> > when the non-boot CPU is brought back up.
> >
> > Suspend/resume in oneshot mode needs the same care as periodic mode
> > during suspend to RAM. My assumption that the state transitions
> > during the different shutdowns/bringups of s2disk would go through
> > the periodic boot phase and then switch over to highres or nohz mode
> > was simply wrong.
> >
> > Add the appropriate suspend / resume handling for the non periodic
> > modes.
> >
> > Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
> >   
> 
> Excellent work. Now suspend to disk is working again. But:
> 
> 1.) The quirk added in commit a417a21e10831bca695b4ba9c74f4ddf5a95ac06
> for the appletouch driver doesn't seem to work after resume.
> 
> 2.) The first suspend to disk works with no problems, but the second
> suspend to disk in a row results in an oops:
>  ->resume_device ->
> pci_device_resume->ata_host_resume->ahci_pci_device_resume->ata_pci_device_do_resume->pci_restore_state

Can you please see if this problem is already in Adrian's list of known
regressions?

Thanks,
Rafael


Re: [PATCH] clockevents: Fix suspend/resume to disk hangs

2007-03-17 Thread Thomas Meyer
Thomas Gleixner wrote:
> I finally found a dual core box, which survives suspend/resume without
> crashing in the middle of nowhere. Sigh, I never figured out from the
> code and the bug reports what's going on.
>
> The observed hangs are caused by a stale state transition of the clock
> event devices, which prevents RCU synchronization from completing
> when the non-boot CPU is brought back up.
>
> Suspend/resume in oneshot mode needs the same care as periodic mode
> during suspend to RAM. My assumption that the state transitions
> during the different shutdowns/bringups of s2disk would go through
> the periodic boot phase and then switch over to highres or nohz mode
> was simply wrong.
>
> Add the appropriate suspend / resume handling for the non periodic
> modes.
>
> Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
>   

Excellent work. Now suspend to disk is working again. But:

1.) The quirk added in commit a417a21e10831bca695b4ba9c74f4ddf5a95ac06
for the appletouch driver doesn't seem to work after resume.

2.) The first suspend to disk works with no problems, but the second
suspend to disk in a row results in an oops:
 ->resume_device ->
pci_device_resume->ata_host_resume->ahci_pci_device_resume->ata_pci_device_do_resume->pci_restore_state



Re: [PATCH] clockevents: Fix suspend/resume to disk hangs

2007-03-17 Thread Milan Broz
Thomas Gleixner wrote:
> I finally found a dual core box, which survives suspend/resume without
> crashing in the middle of nowhere. Sigh, I never figured out from the
> code and the bug reports what's going on.
> 
> The observed hangs are caused by a stale state transition of the clock
> event devices, which prevents RCU synchronization from completing
> when the non-boot CPU is brought back up.
> 
> Suspend/resume in oneshot mode needs the same care as periodic mode
> during suspend to RAM. My assumption that the state transitions
> during the different shutdowns/bringups of s2disk would go through
> the periodic boot phase and then switch over to highres or nohz mode
> was simply wrong.
> 
> Add the appropriate suspend / resume handling for the non periodic
> modes.
> 
> Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>

Hi,
I can confirm that this patch fixed the problem on Thinkpad X60s.

Thanks !

Milan
--
[EMAIL PROTECTED]
 


[PATCH] clockevents: Fix suspend/resume to disk hangs

2007-03-16 Thread Thomas Gleixner
I finally found a dual core box, which survives suspend/resume without
crashing in the middle of nowhere. Sigh, I never figured out from the
code and the bug reports what's going on.

The observed hangs are caused by a stale state transition of the clock
event devices, which prevents RCU synchronization from completing
when the non-boot CPU is brought back up.

Suspend/resume in oneshot mode needs the same care as periodic mode
during suspend to RAM. My assumption that the state transitions
during the different shutdowns/bringups of s2disk would go through
the periodic boot phase and then switch over to highres or nohz mode
was simply wrong.

Add the appropriate suspend / resume handling for the non periodic
modes.

Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]>
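
A rough user-space model of the problem described above, for readers who
want to follow the state transitions without the kernel sources at hand.
This is only a sketch and not kernel code: the struct, the enums and the
simulate() helper are made up for illustration, and the rule that a mode
request is skipped when the recorded mode already matches is an assumed
simplification of how clockevents_set_mode() behaves. Under those
assumptions it shows why the shutdown on suspend matters in oneshot mode:
without it the recorded mode stays ONESHOT across the power cycle, so the
request to re-enter ONESHOT on resume is treated as a no-op.

```c
#include <stdbool.h>

/* Hypothetical names; this models the idea only and is not kernel code. */
enum tick_mode { TICK_PERIODIC, TICK_ONESHOT };
enum dev_mode { DEV_SHUTDOWN, DEV_PERIODIC, DEV_ONESHOT };

struct clock_dev {
	enum dev_mode recorded;	/* mode software believes the device is in */
	bool hw_armed;		/* whether the simulated hardware will fire */
};

/* Assumption: a request for the already-recorded mode is skipped. */
static void set_mode(struct clock_dev *d, enum dev_mode m)
{
	if (d->recorded == m)
		return;
	d->recorded = m;
	d->hw_armed = (m != DEV_SHUTDOWN);
}

/*
 * Run one suspend-to-disk cycle and report whether the device ticks again.
 * shutdown_on_suspend: shut the device down in every mode (patched suspend).
 * rearm_oneshot: re-enter oneshot mode on resume (patched resume).
 */
static bool simulate(bool shutdown_on_suspend, bool rearm_oneshot,
		     enum tick_mode mode)
{
	struct clock_dev d = {
		.recorded = (mode == TICK_PERIODIC) ? DEV_PERIODIC : DEV_ONESHOT,
		.hw_armed = true,
	};

	/* Suspend: the old code only shut down periodic devices. */
	if (shutdown_on_suspend || mode == TICK_PERIODIC)
		set_mode(&d, DEV_SHUTDOWN);

	/* Power is cut: the hardware forgets its programming. */
	d.hw_armed = false;

	/* Resume: periodic was always set up again, oneshot only if asked. */
	if (mode == TICK_PERIODIC)
		set_mode(&d, DEV_PERIODIC);
	else if (rearm_oneshot)
		set_mode(&d, DEV_ONESHOT);

	return d.hw_armed;
}
```

In this model simulate(false, true, TICK_ONESHOT) leaves the device
unarmed, while simulate(true, true, TICK_ONESHOT) rearms it, which
corresponds to the two halves of the patch (tick_suspend() and
tick_resume()).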

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 5567745..eadfce2 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -307,12 +307,19 @@ int tick_resume_broadcast(void)
 	spin_lock_irqsave(&tick_broadcast_lock, flags);
 
 	bc = tick_broadcast_device.evtdev;
-	if (bc) {
-		if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC &&
-		    !cpus_empty(tick_broadcast_mask))
-			tick_broadcast_start_periodic(bc);
 
-		broadcast = cpu_isset(smp_processor_id(), tick_broadcast_mask);
+	if (bc) {
+		switch (tick_broadcast_device.mode) {
+		case TICKDEV_MODE_PERIODIC:
+			if(!cpus_empty(tick_broadcast_mask))
+				tick_broadcast_start_periodic(bc);
+			broadcast = cpu_isset(smp_processor_id(),
+					      tick_broadcast_mask);
+			break;
+		case TICKDEV_MODE_ONESHOT:
+			broadcast = tick_resume_broadcast_oneshot(bc);
+			break;
+		}
 	}
 	spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 
@@ -347,6 +354,16 @@ static int tick_broadcast_set_event(ktime_t expires, int force)
 	}
 }
 
+int tick_resume_broadcast_oneshot(struct clock_event_device *bc)
+{
+	clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
+
+	if(!cpus_empty(tick_broadcast_oneshot_mask))
+		tick_broadcast_set_event(ktime_get(), 1);
+
+	return cpu_isset(smp_processor_id(), tick_broadcast_oneshot_mask);
+}
+
 /*
  * Reprogram the broadcast device:
  *
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 43ba1bd..bfda3f7 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -298,18 +298,17 @@ static void tick_shutdown(unsigned int *cpup)
 	spin_unlock_irqrestore(&tick_device_lock, flags);
 }
 
-static void tick_suspend_periodic(void)
+static void tick_suspend(void)
 {
 	struct tick_device *td = &__get_cpu_var(tick_cpu_device);
 	unsigned long flags;
 
 	spin_lock_irqsave(&tick_device_lock, flags);
-	if (td->mode == TICKDEV_MODE_PERIODIC)
-		clockevents_set_mode(td->evtdev, CLOCK_EVT_MODE_SHUTDOWN);
+	clockevents_set_mode(td->evtdev, CLOCK_EVT_MODE_SHUTDOWN);
 	spin_unlock_irqrestore(&tick_device_lock, flags);
 }
 
-static void tick_resume_periodic(void)
+static void tick_resume(void)
 {
 	struct tick_device *td = &__get_cpu_var(tick_cpu_device);
 	unsigned long flags;
@@ -317,6 +316,8 @@ static void tick_resume_periodic(void)
 	spin_lock_irqsave(&tick_device_lock, flags);
 	if (td->mode == TICKDEV_MODE_PERIODIC)
 		tick_setup_periodic(td->evtdev, 0);
+	else
+		tick_resume_oneshot();
 	spin_unlock_irqrestore(&tick_device_lock, flags);
 }
 
@@ -348,13 +349,13 @@ static int tick_notify(struct notifier_block *nb, unsigned long reason,
 		break;
 
 	case CLOCK_EVT_NOTIFY_SUSPEND:
-		tick_suspend_periodic();
+		tick_suspend();
 		tick_suspend_broadcast();
 		break;
 
 	case CLOCK_EVT_NOTIFY_RESUME:
 		if (!tick_resume_broadcast())
-			tick_resume_periodic();
+			tick_resume();
 		break;
 
 	default:
diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h
index 75890ef..c9d203b 100644
--- a/kernel/time/tick-internal.h
+++ b/kernel/time/tick-internal.h
@@ -19,12 +19,13 @@ extern void tick_setup_oneshot(struct clock_event_device *newdev,
 extern int tick_program_event(ktime_t expires, int force);
 extern void tick_oneshot_notify(void);
 extern int tick_switch_to_oneshot(void (*handler)(struct clock_event_device *));
-
+extern void tick_resume_oneshot(void);
 # ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
 extern void tick_broadcast_setup_oneshot(struct clock_event_device *bc);
 extern void tick_broadcast_oneshot_control(unsigned long reason);
 extern void tick_broadcast