Re: 2.6.21rc suspend to ram regression on Lenovo X60

2007-03-16 Thread Kok, Auke

Pavel Machek wrote:

> Hi!
>
> > > Seeing a couple of MSI changes in there, on a hunch I booted latest tree with
> > > pci=nomsi, and it resumed again.
> > >
> > > Any ideas how to further debug this?
> > > I'll try backing out individual changes from that merge tomorrow.
> >
> > Thanks.
> >
> > Of those msi patches you have identified I don't see anything really
> > obvious.  And you actually marked them as good in your bisect so
> > I don't expect it is a core problem.
> >
> > We do have a known e1000 regression, with msi and suspend/resume.

Still? I tested this against rc3 and it's mostly fine, even with MSI
enabled.

> > So it is possible the nomsi avoided a driver problem.  Especially
> > as we have a number of driver changes on Linus's side of
> > that merge.
> >
> > I also know we have some known issues with pci_save_state and
> > pci_restore_state that require them to be paired for correct
> > operation.  For suspend and resume that is not generally a problem.
> >
> > I have fixes for the pci_save_state and pci_restore_state in the -mm
> > and gregkh trees.  Since they also happen to fix the e1000 driver as
> > a side effect they are worth looking at, at least if you have an
> > e1000.

Hey, please include me on those!

> > I don't have a clue which hardware the x60 has so I don't know which
> > drivers it would be using.
>
> x60 indeed has e1000.

Yup.

Auke
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: 2.6.21rc suspend to ram regression on Lenovo X60

2007-03-16 Thread Pavel Machek
Hi!

> > Seeing a couple of MSI changes in there, on a hunch I booted latest tree with
> > pci=nomsi, and it resumed again.
> >
> > Any ideas how to further debug this?
> > I'll try backing out individual changes from that merge tomorrow.
> 
> Thanks.  
> 
> Of those msi patches you have identified I don't see anything really
> obvious.  And you actually marked them as good in your bisect so
> I don't expect it is a core problem.
> 
> We do have a known e1000 regression, with msi and suspend/resume.
> So it is possible the nomsi avoided a driver problem.  Especially
> as we have a number of driver changes on Linus's side of
> that merge.
> 
> I also know we have some known issues with pci_save_state and
> pci_restore_state that require them to be paired for correct
> operation.  For suspend and resume that is not generally a problem.
> 
> I have fixes for the pci_save_state and pci_restore_state in the -mm
> and gregkh trees.  Since they also happen to fix the e1000 driver as
> a side effect they are worth looking at, at least if you have an
> e1000.
> 
> I don't have a clue which hardware the x60 has so I don't know which
> drivers it would be using.

x60 indeed has e1000.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: 2.6.21rc suspend to ram regression on Lenovo X60

2007-03-15 Thread Dave Jones
On Thu, Mar 15, 2007 at 12:45:20PM -0700, Jeremy Fitzhardinge wrote:
 > Dave Jones wrote:
 > > I just did a build of top of tree, including those commits, and
 > > it's still broken.  Booting with pci=nomsi no longer 'fixes' it
 > > though, which may indicate that the MSI changes were a red herring.
 > > (Or that the subsequent changes have regressed it even more,
 > >  which seems unlikely looking at the changes).
 > >   
 > 
 > I just found the same thing on my X60.  Current top-of-tree with
 > pci=nomsi does not improve things.  When it resumes, the CPU is working
 > (capslock toggles, sysrq-b reboots), but the screen is blank.

Yeah, I noticed the capslock works.  Networking doesn't come back up
though, and it doesn't seem to respond to commands that I type blindly.
Even trying something like:

pm-suspend ; dmesg >dmesg.out; /sbin/reboot

doesn't seem to execute the commands on resume.

Switching ttys to X with Alt-F7 seems to lock it up to the point that
even capslock doesn't work any more.


I'll try to hook up a USB serial cable and see if I'm lucky enough
to get something useful out of it in the absence of a serial port.

Dave

-- 
http://www.codemonkey.org.uk


Re: 2.6.21rc suspend to ram regression on Lenovo X60

2007-03-15 Thread Jeremy Fitzhardinge
Dave Jones wrote:
> I just did a build of top of tree, including those commits, and
> it's still broken.  Booting with pci=nomsi no longer 'fixes' it
> though, which may indicate that the MSI changes were a red herring.
> (Or that the subsequent changes have regressed it even more,
>  which seems unlikely looking at the changes).
>   

I just found the same thing on my X60.  Current top-of-tree with
pci=nomsi does not improve things.  When it resumes, the CPU is working
(capslock toggles, sysrq-b reboots), but the screen is blank.

I was about to try 2.6.21-rc3-mm2; I'll see if that's any different.

J


Re: 2.6.21rc suspend to ram regression on Lenovo X60

2007-03-15 Thread Eric W. Biederman
Dave Jones <[EMAIL PROTECTED]> writes:

> I just did a build of top of tree, including those commits, and
> it's still broken.  Booting with pci=nomsi no longer 'fixes' it
> though, which may indicate that the MSI changes were a red herring.
> (Or that the subsequent changes have regressed it even more,
>  which seems unlikely looking at the changes).
>
> .. or it could be something else introduced between rc3 (which is
> what my bisect was based on) and today's tree.
>
> sigh. I'll do more bisecting after lunch.

Thanks.

It is good to know that things are worse, even if that isn't good news.

Eric


Re: 2.6.21rc suspend to ram regression on Lenovo X60

2007-03-15 Thread Dave Jones
On Thu, Mar 15, 2007 at 10:11:01AM -0600, Eric W. Biederman wrote:
 > Dave Jones <[EMAIL PROTECTED]> writes:
 > 
 > > On Tue, Mar 13, 2007 at 10:22:53AM +0100, Rafael J. Wysocki wrote:
 > >  > On Tuesday, 13 March 2007 05:08, Dave Jones wrote:
 > >  > > I spent considerable time over the last day or so bisecting to
 > >  > > find out why an X60 stopped resuming somewhen between 2.6.20 and current -git.
 > >  > > (Total lockup, black screen of death).
 > >  > 
 > >  > Do you have CONFIG_TICK_ONESHOT or CONFIG_NO_HZ set?  If you do, could you
 > >  > please unset them and retest?
 > >
 > > I did try with NO_HZ unset, made no difference, I don't recall TICK_ONESHOT.
 > > I'm in meetings all day, but I'll check when I get home.
 > 
 > I haven't heard anything more on this thread.
 > 
 > I just wanted to double check.  Did the tree that failed include
 > commits: 
 > 392ee1e6dd901db6c4504617476f6442ed91f72d and
 > 9f35575dfc172f0a93fb464761883c8f49599b7a
 > 
 > Mostly I was wondering if any of my later work to sort out msi 
 > suspend/resume actually solved anything.

I just did a build of top of tree, including those commits, and
it's still broken.  Booting with pci=nomsi no longer 'fixes' it
though, which may indicate that the MSI changes were a red herring.
(Or that the subsequent changes have regressed it even more,
 which seems unlikely looking at the changes).

.. or it could be something else introduced between rc3 (which is
what my bisect was based on) and today's tree.

sigh. I'll do more bisecting after lunch.

Dave

-- 
http://www.codemonkey.org.uk


Re: 2.6.21rc suspend to ram regression on Lenovo X60

2007-03-15 Thread Dave Jones
On Thu, Mar 15, 2007 at 10:11:01AM -0600, Eric W. Biederman wrote:

 > I haven't heard anything more on this thread.

Sorry, I've been stuck in meetings the last two days..

 > I just wanted to double check.  Did the tree that failed include
 > commits: 
 > 392ee1e6dd901db6c4504617476f6442ed91f72d and
 > 9f35575dfc172f0a93fb464761883c8f49599b7a
 > 
 > Mostly I was wondering if any of my later work to sort out msi 
 > suspend/resume actually solved anything.

I'll kick off some compiles and find out.

Dave

-- 
http://www.codemonkey.org.uk


Re: 2.6.21rc suspend to ram regression on Lenovo X60

2007-03-15 Thread Eric W. Biederman
Dave Jones <[EMAIL PROTECTED]> writes:

> On Tue, Mar 13, 2007 at 10:22:53AM +0100, Rafael J. Wysocki wrote:
>  > On Tuesday, 13 March 2007 05:08, Dave Jones wrote:
>  > > I spent considerable time over the last day or so bisecting to
>  > > find out why an X60 stopped resuming somewhen between 2.6.20 and current -git.
>  > > (Total lockup, black screen of death).
>  > 
>  > Do you have CONFIG_TICK_ONESHOT or CONFIG_NO_HZ set?  If you do, could you
>  > please unset them and retest?
>
> I did try with NO_HZ unset, made no difference, I don't recall TICK_ONESHOT.
> I'm in meetings all day, but I'll check when I get home.

I haven't heard anything more on this thread.

I just wanted to double check.  Did the tree that failed include
commits: 
392ee1e6dd901db6c4504617476f6442ed91f72d and
9f35575dfc172f0a93fb464761883c8f49599b7a

Mostly I was wondering if any of my later work to sort out msi 
suspend/resume actually solved anything.

Eric


Re: 2.6.21rc suspend to ram regression on Lenovo X60

2007-03-13 Thread Matt Mackall
On Tue, Mar 13, 2007 at 12:08:28AM -0400, Dave Jones wrote:
> I spent considerable time over the last day or so bisecting to
> find out why an X60 stopped resuming somewhen between 2.6.20 and current -git.
> (Total lockup, black screen of death).
> 
> The bisect log looked like this.
> 
...
> Any ideas how to further debug this?
> I'll try backing out individual changes from that merge tomorrow.

If you've got a tree that looks like:

 --a-b-c-d-e-f-g-h->
    \           /
     i-j-k-l-m-n

where h is bad but both g and n are good, you can try testing the
merge of g+k, etc., which will find half the problem. Then you can do
the same on the other side. Tedious.
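A minimal sketch of the merge-testing idea above, as a POSIX shell function driving git. It repeats the "merge of g+k" step automatically, bisecting over the side-branch commits i..n by test-merging each candidate onto g. Everything here is an assumption rather than something from this thread: a linear side branch, merges that apply cleanly, and a test command that exits 0 for good and non-zero for bad (for the X60 that "test" would really be a build plus a manual suspend/resume cycle). The helper `merge_bisect`, the toy files M and S, and the marker strings are hypothetical.

```shell
# merge_bisect MAIN_GOOD SIDE_TIP TEST_CMD   (hypothetical helper)
# Prints the oldest side-branch commit C such that merging C into
# MAIN_GOOD makes TEST_CMD fail. Assumes a linear side branch, clean
# merges, and that merging SIDE_TIP itself (the bad merge h) is bad.
merge_bisect() {
    main_good=$1 side_tip=$2 test_cmd=$3
    # Side-branch commits not reachable from MAIN_GOOD, oldest first.
    set -- $(git rev-list --reverse "$main_good..$side_tip")
    lo=1 hi=$#    # invariant: merging commit number $hi is bad
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        eval "cand=\${$mid}"
        git checkout -q --detach "$main_good"
        if git merge -q --no-edit "$cand" >/dev/null && sh -c "$test_cmd"; then
            lo=$((mid + 1))   # MAIN_GOOD + cand is good: culprit is later
        else
            hi=$mid           # bad (or merge failed): culprit is cand or earlier
        fi
        git merge --abort 2>/dev/null || true
    done
    eval "echo \${$lo}"
}

# Toy history (hypothetical): side commit "k" adds a "bug" line that only
# fails together with the "needs-fix" line added on mainline in "g".
cd "$(mktemp -d)"
git init -q
git config user.email demo@example.com
git config user.name demo
echo base > M; git add M; git commit -q -m a
git checkout -q -b side
for c in i j k l m n; do
    if [ "$c" = k ]; then echo bug >> S; else echo "$c" >> S; fi
    git add S; git commit -q -m "$c"
done
git checkout -q -            # back to the initial branch
echo needs-fix >> M; git commit -q -a -m g
g=$(git rev-parse HEAD)

result=$(merge_bisect "$g" side '! (grep -q bug S && grep -q needs-fix M)')
echo "first bad side commit: $(git log -1 --format=%s "$result")"
# prints: first bad side commit: k
```

Each probe costs one build and test, so a side branch of N commits needs roughly log2(N) test merges. Assuming the mainline stretch is also linear, calling it again with the roles swapped (n as the good base, g as the tip) covers "the other side".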

The best way to debug resume issues directly seems to be to do a fake
suspend, possibly filtering out particular devices:

http://lwn.net/Articles/219033/
http://www.uwsg.iu.edu/hypermail/linux/kernel/0701.3/0397.html

-- 
Mathematics is the supreme nostalgia of our time.


Re: 2.6.21rc suspend to ram regression on Lenovo X60

2007-03-13 Thread Dave Jones
On Tue, Mar 13, 2007 at 10:22:53AM +0100, Rafael J. Wysocki wrote:
 > On Tuesday, 13 March 2007 05:08, Dave Jones wrote:
 > > I spent considerable time over the last day or so bisecting to
 > > find out why an X60 stopped resuming somewhen between 2.6.20 and current -git.
 > > (Total lockup, black screen of death).
 > 
 > Do you have CONFIG_TICK_ONESHOT or CONFIG_NO_HZ set?  If you do, could you
 > please unset them and retest?

I did try with NO_HZ unset, made no difference, I don't recall TICK_ONESHOT.
I'm in meetings all day, but I'll check when I get home.

Dave

-- 
http://www.codemonkey.org.uk


Re: 2.6.21rc suspend to ram regression on Lenovo X60

2007-03-13 Thread Rafael J. Wysocki
On Tuesday, 13 March 2007 05:08, Dave Jones wrote:
> I spent considerable time over the last day or so bisecting to
> find out why an X60 stopped resuming somewhen between 2.6.20 and current -git.
> (Total lockup, black screen of death).

Do you have CONFIG_TICK_ONESHOT or CONFIG_NO_HZ set?  If you do, could you
please unset them and retest?

Thanks,
Rafael


Re: 2.6.21rc suspend to ram regression on Lenovo X60

2007-03-13 Thread Eric W. Biederman
Dave Jones <[EMAIL PROTECTED]> writes:

> I spent considerable time over the last day or so bisecting to
> find out why an X60 stopped resuming somewhen between 2.6.20 and current -git.
> (Total lockup, black screen of death).
>
> The bisect log looked like this.
>
> git-bisect start
> # bad: [c8f71b01a50597e298dc3214a2f2be7b8d31170c] Linux 2.6.21-rc1
> git-bisect bad c8f71b01a50597e298dc3214a2f2be7b8d31170c
> # good: [fa285a3d7924a0e3782926e51f16865c5129a2f7] Linux 2.6.20
> git-bisect good fa285a3d7924a0e3782926e51f16865c5129a2f7
> # bad: [574009c1a895aeeb85eaab29c235d75852b09eb8] Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
> git-bisect bad 574009c1a895aeeb85eaab29c235d75852b09eb8
> # bad: [43187902cbfafe73ede0144166b741fb0f7d04e1] Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6
> git-bisect bad 43187902cbfafe73ede0144166b741fb0f7d04e1
> # good: [1545085a28f226b59c243f88b82ea25393b0d63f] drm: Allow for 44 bit user-tokens (or drm_file offsets)
> git-bisect good 1545085a28f226b59c243f88b82ea25393b0d63f
> # good: [c96e2c92072d3e78954c961f53d8c7352f7abbd7] Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6
> git-bisect good c96e2c92072d3e78954c961f53d8c7352f7abbd7
> # good: [31c56d820e03a2fd47f81d6c826f92caf511f9ee] [POWERPC] pasemi: iommu support
> git-bisect good 31c56d820e03a2fd47f81d6c826f92caf511f9ee
> # bad: [78149df6d565c36675463352d0bfeb02b7a7] Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6
> git-bisect bad 78149df6d565c36675463352d0bfeb02b7a7
> # good: [3d9c18872fa1db5c43ab97d8cbca43775998e49c] shpchp: remove CONFIG_HOTPLUG_PCI_SHPC_POLL_EVENT_MODE
> git-bisect good 3d9c18872fa1db5c43ab97d8cbca43775998e49c
> # good: [88187dfa4d8bb565df762f272511d2c91e427e0d] MSI: Replace pci_msi_quirk with calls to pci_no_msi()
> git-bisect good 88187dfa4d8bb565df762f272511d2c91e427e0d
> # good: [866a8c87c4e51046602387953bbef76992107bcb] msi: Fix msi_remove_pci_irq_vectors.
> git-bisect good 866a8c87c4e51046602387953bbef76992107bcb
> # good: [f7feaca77d6ad6bcfcc88ac54e3188970448d6fe] msi: Make MSI useable more architectures
> git-bisect good f7feaca77d6ad6bcfcc88ac54e3188970448d6fe
> # good: [14719f325e1cd4ff757587e9a221ebaf394563ee] Revert "PCI: remove duplicate device id from ata_piix"
> git-bisect good 14719f325e1cd4ff757587e9a221ebaf394563ee
>
> which led me to a final 'bad' commit of 78149df6d565c36675463352d0bfeb02b7a7
> which is a merge changeset of lots of PCI bits.

Ok.  This is weird.  It looks like you marked the merge bad but
its individual commits as good.

That would indicate a problem on one of the branches it was merged
with, or a problem that only shows up when both groups of changes
are present.

> Seeing a couple of MSI changes in there, on a hunch I booted latest tree with
> pci=nomsi, and it resumed again.
>
> Any ideas how to further debug this?
> I'll try backing out individual changes from that merge tomorrow.

Thanks.  

Of those msi patches you have identified I don't see anything really
obvious.  And you actually marked them as good in your bisect so
I don't expect it is a core problem.

We do have a known e1000 regression, with msi and suspend/resume.
So it is possible the nomsi avoided a driver problem.  Especially
as we have a number of driver changes on Linus's side of
that merge.

I also know we have some known issues with pci_save_state and
pci_restore_state that require them to be paired for correct
operation.  For suspend and resume that is not generally a problem.

I have fixes for the pci_save_state and pci_restore_state in the -mm
and gregkh trees.  Since they also happen to fix the e1000 driver as
a side effect they are worth looking at, at least if you have an
e1000.

I don't have a clue which hardware the x60 has so I don't know which
drivers it would be using.

Eric


Re: 2.6.21rc suspend to ram regression on Lenovo X60

2007-03-13 Thread Eric W. Biederman
Dave Jones [EMAIL PROTECTED] writes:

 I spent considerable time over the last day or so bisecting to
 find out why an X60 stopped resuming somewhen between 2.6.20 and current -git.
 (Total lockup, black screen of death).

 The bisect log looked like this.

 git-bisect start
 # bad: [c8f71b01a50597e298dc3214a2f2be7b8d31170c] Linux 2.6.21-rc1
 git-bisect bad c8f71b01a50597e298dc3214a2f2be7b8d31170c
 # good: [fa285a3d7924a0e3782926e51f16865c5129a2f7] Linux 2.6.20
 git-bisect good fa285a3d7924a0e3782926e51f16865c5129a2f7
 # bad: [574009c1a895aeeb85eaab29c235d75852b09eb8] Merge branch 'upstream' of
 git://ftp.linux-mips.org/pub/scm/upstream-linus
 git-bisect bad 574009c1a895aeeb85eaab29c235d75852b09eb8
 # bad: [43187902cbfafe73ede0144166b741fb0f7d04e1] Merge
 master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6
 git-bisect bad 43187902cbfafe73ede0144166b741fb0f7d04e1
 # good: [1545085a28f226b59c243f88b82ea25393b0d63f] drm: Allow for 44 bit
 user-tokens (or drm_file offsets)
 git-bisect good 1545085a28f226b59c243f88b82ea25393b0d63f
 # good: [c96e2c92072d3e78954c961f53d8c7352f7abbd7] Merge
 master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6
 git-bisect good c96e2c92072d3e78954c961f53d8c7352f7abbd7
 # good: [31c56d820e03a2fd47f81d6c826f92caf511f9ee] [POWERPC] pasemi: iommu
 support
 git-bisect good 31c56d820e03a2fd47f81d6c826f92caf511f9ee
 # bad: [78149df6d565c36675463352d0bfeb02b7a7] Merge
 master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6
 git-bisect bad 78149df6d565c36675463352d0bfeb02b7a7
 # good: [3d9c18872fa1db5c43ab97d8cbca43775998e49c] shpchp: remove
 CONFIG_HOTPLUG_PCI_SHPC_POLL_EVENT_MODE
 git-bisect good 3d9c18872fa1db5c43ab97d8cbca43775998e49c
 # good: [88187dfa4d8bb565df762f272511d2c91e427e0d] MSI: Replace pci_msi_quirk
 with calls to pci_no_msi()
 git-bisect good 88187dfa4d8bb565df762f272511d2c91e427e0d
 # good: [866a8c87c4e51046602387953bbef76992107bcb] msi: Fix
 msi_remove_pci_irq_vectors.
 git-bisect good 866a8c87c4e51046602387953bbef76992107bcb
 # good: [f7feaca77d6ad6bcfcc88ac54e3188970448d6fe] msi: Make MSI useable more
 architectures
 git-bisect good f7feaca77d6ad6bcfcc88ac54e3188970448d6fe
 # good: [14719f325e1cd4ff757587e9a221ebaf394563ee] Revert PCI: remove 
 duplicate
 device id from ata_piix
 git-bisect good 14719f325e1cd4ff757587e9a221ebaf394563ee

 which led me to a final 'bad' commit of 
 78149df6d565c36675463352d0bfeb02b7a7
 which is a merge changeset of lots of PCI bits.

Ok.  This is weird.  It looks like you marked the merge bad but
it's individual commits as good

Which would indicate a problem on one of the branches it was merged
with, or a problem that only shows up when both groups of changes
are present.

 Seeing a couple of MSI changes in there, on a hunch I booted latest tree with
 pci=nomsi, and it resumed again.

 Any ideas how to further debug this?
 I'll try backing out individual changes from that merge tomorrow.

Thanks.  

Of those msi patches you have identified I don't see anything really
obvious.  And you actually marked them as good in your bisect so
I don't expect it is core problem.

We do have a known e1000 regression, with msi and suspend/resume.
So it is possible the nomsi avoided a driver problem.  Especially
as we have a number of driver changes on the on Linus's side of
that merge.

I also know we have some known issues with pci_save_state and
pci_restore_state that require them to be paired for correct
operation.  For suspend and resume that is not generally a problem.

I have fixes for the pci_save_state and pci_restore_state in the -mm
and gregkh trees.  Since they also happen to fix the e1000 driver as
a side effect they are worth looking at, at least if you have an
e1000.

I don't have a clue which hardware the x60 has so I don't know which
drivers it would be using.

Eric


Re: 2.6.21rc suspend to ram regression on Lenovo X60

2007-03-13 Thread Rafael J. Wysocki
On Tuesday, 13 March 2007 05:08, Dave Jones wrote:
 I spent considerable time over the last day or so bisecting to
 find out why an X60 stopped resuming somewhen between 2.6.20 and current -git.
 (Total lockup, black screen of death).

Do you have CONFIG_TICK_ONESHOT or CONFIG_NO_HZ set?  If you do, could you
please unset them and retest?

Thanks,
Rafael


Re: 2.6.21rc suspend to ram regression on Lenovo X60

2007-03-13 Thread Dave Jones
On Tue, Mar 13, 2007 at 10:22:53AM +0100, Rafael J. Wysocki wrote:
  On Tuesday, 13 March 2007 05:08, Dave Jones wrote:
   I spent considerable time over the last day or so bisecting to
   find out why an X60 stopped resuming somewhen between 2.6.20 and current -git.
   (Total lockup, black screen of death).
  
  Do you have CONFIG_TICK_ONESHOT or CONFIG_NO_HZ set?  If you do, could you
  please unset them and retest?

I did try with NO_HZ unset; it made no difference. I don't recall whether TICK_ONESHOT was set.
I'm in meetings all day, but I'll check when I get home.

Dave

-- 
http://www.codemonkey.org.uk


Re: 2.6.21rc suspend to ram regression on Lenovo X60

2007-03-13 Thread Matt Mackall
On Tue, Mar 13, 2007 at 12:08:28AM -0400, Dave Jones wrote:
 I spent considerable time over the last day or so bisecting to
 find out why an X60 stopped resuming somewhen between 2.6.20 and current -git.
 (Total lockup, black screen of death).
 
 The bisect log looked like this.
 
...
 Any ideas how to further debug this?
 I'll try backing out individual changes from that merge tomorrow.

If you've got a tree that looks like:

 --a-b-c-d-e-f-g-h-
    \           /
     i-j-k-l-m-n

where h is bad but both g and n are good, you can try testing the
merge of g+k, etc., which will find half the problem. Then you can do
the same on the other side. Tedious.
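The strategy above is a bisection along one parent of the merge. Below is a minimal Python sketch of that idea, assuming a hypothetical `merge_is_bad(k)` callback that stands for merging g with side-branch commit k (for example `git checkout g` followed by `git merge k`), building, and doing a suspend/resume cycle by hand. It also assumes the failure is monotonic: once the culprit commit is merged, every later merge fails too. The function name and the callback are illustrative, not part of git.

```python
from typing import Callable, Sequence, TypeVar

Commit = TypeVar("Commit")


def first_bad_side_commit(
    side: Sequence[Commit],
    merge_is_bad: Callable[[Commit], bool],
) -> Commit:
    """Find the earliest side-branch commit whose merge with the mainline tip is bad.

    `side` lists the side-branch commits oldest first (i, j, ..., n).
    `merge_is_bad(k)` is a hypothetical stand-in for "merge g+k, build,
    boot, suspend, and report whether resume failed".  Assumes merging the
    whole side branch (n) is bad and that badness is monotonic.
    """
    if not side:
        raise ValueError("side branch is empty")
    lo, hi = 0, len(side) - 1  # merge with side[hi] is assumed bad
    while lo < hi:
        mid = (lo + hi) // 2
        if merge_is_bad(side[mid]):
            hi = mid  # culprit is at mid or earlier
        else:
            lo = mid + 1  # g+side[mid] resumes fine, so the culprit is later
    return side[lo]


if __name__ == "__main__":
    # Hypothetical stand-in: pretend resume breaks once "l" is merged into g.
    side_branch = ["i", "j", "k", "l", "m", "n"]
    print(first_bad_side_commit(side_branch, lambda c: c >= "l"))  # prints "l"
```

Calling the same function over the mainline commits (a..g), with a predicate that merges each one with n, covers the other side. Each side then costs roughly log2 of its commit count in test kernels rather than one build per commit.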

The best way to debug resume issues directly seems to be to do a fake
suspend, possibly with filtering out particular devices:

http://lwn.net/Articles/219033/
http://www.uwsg.iu.edu/hypermail/linux/kernel/0701.3/0397.html

-- 
Mathematics is the supreme nostalgia of our time.


2.6.21rc suspend to ram regression on Lenovo X60

2007-03-12 Thread Dave Jones
I spent considerable time over the last day or so bisecting to
find out why an X60 stopped resuming somewhen between 2.6.20 and current -git.
(Total lockup, black screen of death).

The bisect log looked like this.

git-bisect start
# bad: [c8f71b01a50597e298dc3214a2f2be7b8d31170c] Linux 2.6.21-rc1
git-bisect bad c8f71b01a50597e298dc3214a2f2be7b8d31170c
# good: [fa285a3d7924a0e3782926e51f16865c5129a2f7] Linux 2.6.20
git-bisect good fa285a3d7924a0e3782926e51f16865c5129a2f7
# bad: [574009c1a895aeeb85eaab29c235d75852b09eb8] Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
git-bisect bad 574009c1a895aeeb85eaab29c235d75852b09eb8
# bad: [43187902cbfafe73ede0144166b741fb0f7d04e1] Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6
git-bisect bad 43187902cbfafe73ede0144166b741fb0f7d04e1
# good: [1545085a28f226b59c243f88b82ea25393b0d63f] drm: Allow for 44 bit user-tokens (or drm_file offsets)
git-bisect good 1545085a28f226b59c243f88b82ea25393b0d63f
# good: [c96e2c92072d3e78954c961f53d8c7352f7abbd7] Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/usb-2.6
git-bisect good c96e2c92072d3e78954c961f53d8c7352f7abbd7
# good: [31c56d820e03a2fd47f81d6c826f92caf511f9ee] [POWERPC] pasemi: iommu support
git-bisect good 31c56d820e03a2fd47f81d6c826f92caf511f9ee
# bad: [78149df6d565c36675463352d0bfeb02b7a7] Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6
git-bisect bad 78149df6d565c36675463352d0bfeb02b7a7
# good: [3d9c18872fa1db5c43ab97d8cbca43775998e49c] shpchp: remove CONFIG_HOTPLUG_PCI_SHPC_POLL_EVENT_MODE
git-bisect good 3d9c18872fa1db5c43ab97d8cbca43775998e49c
# good: [88187dfa4d8bb565df762f272511d2c91e427e0d] MSI: Replace pci_msi_quirk with calls to pci_no_msi()
git-bisect good 88187dfa4d8bb565df762f272511d2c91e427e0d
# good: [866a8c87c4e51046602387953bbef76992107bcb] msi: Fix msi_remove_pci_irq_vectors.
git-bisect good 866a8c87c4e51046602387953bbef76992107bcb
# good: [f7feaca77d6ad6bcfcc88ac54e3188970448d6fe] msi: Make MSI useable more architectures
git-bisect good f7feaca77d6ad6bcfcc88ac54e3188970448d6fe
# good: [14719f325e1cd4ff757587e9a221ebaf394563ee] Revert "PCI: remove duplicate device id from ata_piix"
git-bisect good 14719f325e1cd4ff757587e9a221ebaf394563ee

which led me to a final 'bad' commit of 78149df6d565c36675463352d0bfeb02b7a7
which is a merge changeset of lots of PCI bits.
Seeing a couple of MSI changes in there, on a hunch I booted latest tree with
pci=nomsi, and it resumed again.

Any ideas how to further debug this?
I'll try backing out individual changes from that merge tomorrow.

Dave

-- 
http://www.codemonkey.org.uk

