
Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-07 Thread Nish Aravamudan

On 10/5/07, Linas Vepstas <[EMAIL PROTECTED]> wrote:
> On Thu, Oct 04, 2007 at 05:01:47PM -0700, Nish Aravamudan wrote:
> > On 10/2/07, Tony Breeds <[EMAIL PROTECTED]> wrote:
> > > On Wed, Oct 03, 2007 at 10:30:16AM +1000, Michael Ellerman wrote:
> > >
> > > > I realise it'll make the patch bigger, but this doesn't seem like a
> > > > particularly good name for the variable anymore.
> > >
> > > Sure, what about?
> > >
> > > Clarify when RTAS logging is enabled.
> > >
> > > Signed-off-by: Tony Breeds <[EMAIL PROTECTED]>
> >
> > For what it's worth, on a different ppc64 box, this resolves a similar
> > panic for me.
> >
> > Tested-by: Nishanth Aravamudan <[EMAIL PROTECTED]>
>
> For the reasons explained, I'd really like to nack Tony's patch.

I see. Can you reply in this thread with the patch you mentioned in
your other reply? (or point me to a copy of it)

Thanks,
Nish
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-05 Thread Linas Vepstas

On Thu, Oct 04, 2007 at 05:01:47PM -0700, Nish Aravamudan wrote:
> On 10/2/07, Tony Breeds <[EMAIL PROTECTED]> wrote:
> > On Wed, Oct 03, 2007 at 10:30:16AM +1000, Michael Ellerman wrote:
> >
> > > I realise it'll make the patch bigger, but this doesn't seem like a
> > > particularly good name for the variable anymore.
> >
> > Sure, what about?
> >
> > Clarify when RTAS logging is enabled.
> >
> > Signed-off-by: Tony Breeds <[EMAIL PROTECTED]>
> 
> For what it's worth, on a different ppc64 box, this resolves a similar
> panic for me.
> 
> Tested-by: Nishanth Aravamudan <[EMAIL PROTECTED]>

For the reasons explained, I'd really like to nack Tony's patch.

--linas

Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-04 Thread Nish Aravamudan

On 10/2/07, Tony Breeds <[EMAIL PROTECTED]> wrote:
> On Wed, Oct 03, 2007 at 10:30:16AM +1000, Michael Ellerman wrote:
>
> > I realise it'll make the patch bigger, but this doesn't seem like a
> > particularly good name for the variable anymore.
>
> Sure, what about?
>
> Clarify when RTAS logging is enabled.
>
> Signed-off-by: Tony Breeds <[EMAIL PROTECTED]>

For what it's worth, on a different ppc64 box, this resolves a similar
panic for me.

Tested-by: Nishanth Aravamudan <[EMAIL PROTECTED]>

Thanks,
Nish

Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-10-03 Thread Jeff Garzik


Berck E. Nash wrote:
> Greetings,
>
> I get a few million of these on boot-- the system never actually boots.
> Works fine in 2.6.23-rc7.
>
> [   50.456012] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [   50.462484] ata2.00: irq_stat 0x4001
> [   50.466441] ata2.00: cmd e5/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
> [   50.466442]  res 51/04:00:01:01:80/00:00:00:00:00/a0 Emask 0x1 (device error)
> [   50.481914] ata2.00: status: {DRDY ERR }
> [   50.485876] ata2.00: error: {ABRT }
> [   50.489533] ata2.00: configured for UDMA/133
> [   50.493839] ata2: EH complete

FWIW I haven't had time to debug this, so I'm going to simply revert the
patch, and make sure it does not make it into 2.6.24.

Jeff




Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-03 Thread Linas Vepstas

On Wed, Oct 03, 2007 at 02:09:46PM +1000, Michael Ellerman wrote:
> 
> Until we initialise what exactly?

Until we allocate the error log buffer. The original crash was
a null-pointer dereference of the unallocated buffer. I just sent
out a patch to fix this; it's a bit simpler than the one below.

In that email, I remarked:

Andy Whitcroft's crash was apparently due to firmware complaining
about lost power (actually, lost power supply redundancy!), which
occurred very early during boot.

Type0040 (EPOW)
Status: bypassed new
Residual error from previous boot.
EPOW Sensor Value:  0002
EPOW warning due to loss of redundancy.
EPOW general power fault.

I've no clue why firmware thought it was OK to report this
during one of the earliest calls to RTAS; I'm still investigating
that.

--linas
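
The failure mode described above (an error reported before the log buffer exists) can be illustrated with a small userspace sketch. This is not the patch Linas refers to and not the rtasd.c code; the names log_buf, logging_enabled, start_logging and LOG_MAX are hypothetical stand-ins chosen for the illustration. The point is only that the logging path must refuse to touch the buffer until the allocation has happened.

```c
#include <stdlib.h>
#include <string.h>

#define LOG_MAX 64		/* hypothetical buffer size */

static char *log_buf;		/* NULL until start_logging() allocates it */
static int logging_enabled;	/* stays 0 until log_buf is usable */

/* Returns 1 if the message was stored, 0 if it was dropped. */
static int log_error(const char *msg)
{
	if (!logging_enabled || !log_buf)
		return 0;	/* too early: buffer not allocated yet */
	strncpy(log_buf, msg, LOG_MAX - 1);
	log_buf[LOG_MAX - 1] = '\0';
	return 1;
}

/* Stand-in for the daemon start-up path that allocates the buffer. */
static int start_logging(void)
{
	log_buf = malloc(LOG_MAX);
	if (!log_buf)
		return -1;
	logging_enabled = 1;	/* only now may log_error() use log_buf */
	return 0;
}
```

In this sketch an event that arrives before start_logging() is simply dropped instead of dereferencing a null pointer; whether dropping, or logging only to the boot log, is the right policy for early firmware events is a separate question that the thread leaves open.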

Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-02 Thread Michael Ellerman

On Wed, 2007-10-03 at 11:19 +1000, Tony Breeds wrote:
> On Wed, Oct 03, 2007 at 10:30:16AM +1000, Michael Ellerman wrote:
>  
> > I realise it'll make the patch bigger, but this doesn't seem like a
> > particularly good name for the variable anymore.
> 
> Sure, what about?

Better .. but  ..   :D

> diff --git a/arch/powerpc/platforms/pseries/rtasd.c b/arch/powerpc/platforms/pseries/rtasd.c
> index 30925d2..73401c8 100644
> --- a/arch/powerpc/platforms/pseries/rtasd.c
> +++ b/arch/powerpc/platforms/pseries/rtasd.c
> @@ -54,8 +54,9 @@ static unsigned int rtas_event_scan_rate;
>  static int full_rtas_msgs = 0;
>  
>  /* Stop logging to nvram after first fatal error */
> -static int no_more_logging;
> -
> +static int logging_enabled; /* Until we initialize everything,
> + * make sure we don't try logging
> + * anything */

Until we initialise what exactly?

>  static int error_log_cnt;
>  
>  /*
> @@ -217,7 +218,7 @@ void pSeries_log_error(char *buf, unsigned int err_type, int fatal)
>   }
>  
>   /* Write error to NVRAM */
> - if (!no_more_logging && !(err_type & ERR_FLAG_BOOT))
> + if (logging_enabled && !(err_type & ERR_FLAG_BOOT))
>   nvram_write_error_log(buf, len, err_type, error_log_cnt);
>  
>   /*
> @@ -229,8 +230,8 @@ void pSeries_log_error(char *buf, unsigned int err_type, int fatal)
>   printk_log_rtas(buf, len);
>  
>   /* Check to see if we need to or have stopped logging */
> - if (fatal || no_more_logging) {
> - no_more_logging = 1;
> + if (fatal || !logging_enabled) {
> + logging_enabled = 0;
>   spin_unlock_irqrestore(&rtasd_log_lock, s);
>   return;
>   }

Hmmm, this routine has 4 separate lock-dropping exit paths ..

> @@ -302,7 +303,7 @@ static ssize_t rtas_log_read(struct file * file, char __user * buf,
>  
>   spin_lock_irqsave(&rtasd_log_lock, s);
>   /* if it's 0, then we know we got the last one (the one in NVRAM) */
> - if (rtas_log_size == 0 && !no_more_logging)
> + if (rtas_log_size == 0 && logging_enabled)
>   nvram_clear_error_log();
>   spin_unlock_irqrestore(&rtasd_log_lock, s);
>  
> @@ -414,6 +415,8 @@ static int rtasd(void *unused)
>   memset(logdata, 0, rtas_error_log_max);
>   rc = nvram_read_error_log(logdata, rtas_error_log_max,
> &err_type, &error_log_cnt);
> + /* We can use rtas_log_buf now */
> + logging_enabled = 1;
>  
>   if (!rc) {
>   if (err_type != ERR_FLAG_ALREADY_LOGGED) {

What exactly happens that allows us to do logging? I don't see any
ordering between anything else and the setting of the flag, and AFAICT
we're not inside a spinlock or anything here.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
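
The ordering concern raised above can be sketched in userspace as follows. This is only an illustration under stated assumptions, not code from rtasd.c: a pthread mutex stands in for the spinlock, and log_lock, log_buf, logging_enabled, enable_logging and LOG_MAX are hypothetical names. The idea is that the flag is published under the same lock that the logging path takes, so a reader that sees the flag set also sees the allocated buffer, and the logging routine has a single exit path that drops the lock.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define LOG_MAX 64		/* hypothetical buffer size */

/* Hypothetical stand-ins: a mutex plays the role of the spinlock. */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;
static char *log_buf;		/* NULL until allocated */
static int logging_enabled;	/* only read or written under log_lock */

/* Allocate first, then publish buffer and flag together under the lock. */
static int enable_logging(void)
{
	char *buf = calloc(1, LOG_MAX);

	if (!buf)
		return -1;
	pthread_mutex_lock(&log_lock);
	log_buf = buf;
	logging_enabled = 1;
	pthread_mutex_unlock(&log_lock);
	return 0;
}

/* Returns 1 if stored, 0 if logging is not (or no longer) enabled. */
static int log_error(const char *msg, int fatal)
{
	int stored = 0;

	pthread_mutex_lock(&log_lock);
	if (logging_enabled) {
		strncpy(log_buf, msg, LOG_MAX - 1);
		log_buf[LOG_MAX - 1] = '\0';
		stored = 1;
		if (fatal)
			logging_enabled = 0;	/* stop after first fatal error */
	}
	pthread_mutex_unlock(&log_lock);	/* single unlock, single exit */
	return stored;
}
```

Taking the lock around the flag update is one way to get the ordering; in kernel code an explicit memory barrier could serve the same purpose, and which is preferable there is not something this sketch settles.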



Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-02 Thread Tony Breeds

On Wed, Oct 03, 2007 at 10:30:16AM +1000, Michael Ellerman wrote:
 
> I realise it'll make the patch bigger, but this doesn't seem like a
> particularly good name for the variable anymore.

Sure, what about?

Clarify when RTAS logging is enabled.

Signed-off-by: Tony Breeds <[EMAIL PROTECTED]>

---
 arch/powerpc/platforms/pseries/rtasd.c |   15 +++++++++------
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/rtasd.c b/arch/powerpc/platforms/pseries/rtasd.c
index 30925d2..73401c8 100644
--- a/arch/powerpc/platforms/pseries/rtasd.c
+++ b/arch/powerpc/platforms/pseries/rtasd.c
@@ -54,8 +54,9 @@ static unsigned int rtas_event_scan_rate;
 static int full_rtas_msgs = 0;
 
 /* Stop logging to nvram after first fatal error */
-static int no_more_logging;
-
+static int logging_enabled; /* Until we initialize everything,
+ * make sure we don't try logging
+ * anything */
 static int error_log_cnt;
 
 /*
@@ -217,7 +218,7 @@ void pSeries_log_error(char *buf, unsigned int err_type, int fatal)
}
 
/* Write error to NVRAM */
-   if (!no_more_logging && !(err_type & ERR_FLAG_BOOT))
+   if (logging_enabled && !(err_type & ERR_FLAG_BOOT))
nvram_write_error_log(buf, len, err_type, error_log_cnt);
 
/*
@@ -229,8 +230,8 @@ void pSeries_log_error(char *buf, unsigned int err_type, int fatal)
printk_log_rtas(buf, len);
 
/* Check to see if we need to or have stopped logging */
-   if (fatal || no_more_logging) {
-   no_more_logging = 1;
+   if (fatal || !logging_enabled) {
+   logging_enabled = 0;
spin_unlock_irqrestore(&rtasd_log_lock, s);
return;
}
@@ -302,7 +303,7 @@ static ssize_t rtas_log_read(struct file * file, char __user * buf,
 
spin_lock_irqsave(&rtasd_log_lock, s);
/* if it's 0, then we know we got the last one (the one in NVRAM) */
-   if (rtas_log_size == 0 && !no_more_logging)
+   if (rtas_log_size == 0 && logging_enabled)
nvram_clear_error_log();
spin_unlock_irqrestore(&rtasd_log_lock, s);
 
@@ -414,6 +415,8 @@ static int rtasd(void *unused)
memset(logdata, 0, rtas_error_log_max);
rc = nvram_read_error_log(logdata, rtas_error_log_max,
  &err_type, &error_log_cnt);
+   /* We can use rtas_log_buf now */
+   logging_enabled = 1;
 
if (!rc) {
if (err_type != ERR_FLAG_ALREADY_LOGGED) {

Yours Tony

  linux.conf.au    http://linux.conf.au/ || http://lca2008.linux.org.au/
  Jan 28 - Feb 02 2008 The Australian Linux Technical Conference!


Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-02 Thread Michael Ellerman

On Wed, 2007-10-03 at 10:26 +1000, Tony Breeds wrote:
> On Tue, Oct 02, 2007 at 06:28:19PM -0500, Linas Vepstas wrote:
> > On Mon, Sep 24, 2007 at 01:35:31PM +0100, Andy Whitcroft wrote:
> > > Seeing the following from an older power LPAR, pretty sure we had
> > > this in the previous -mm also:
> > 
> > I haven't forgotten about this ... and am looking at it now.
> > Seems that whenever I go to reserve the machine pSeries-102,
> > someone else is using it :-)
> 
> This panic is caused by "[POWERPC] pseries: Fix jumbled no_logging flag."
> (79c0108d1b9db4864ab77b2a95dfa04f2dcf264c), in the powerpc/for-2.6.24
> branch.  It looks to me that we have logging enabled too early now.
> 
> I think the following is a reasonable fix?
> 
> ---
> Explicitly enable RTAS error logging, when it should be ready.
> 
> 
> Signed-off-by: Tony Breeds <[EMAIL PROTECTED]>
> 
> ---
> 
>  arch/powerpc/platforms/pseries/rtasd.c |    7 ++++++-
>  1 files changed, 6 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/rtasd.c b/arch/powerpc/platforms/pseries/rtasd.c
> index 30925d2..0df5d0d 100644
> --- a/arch/powerpc/platforms/pseries/rtasd.c
> +++ b/arch/powerpc/platforms/pseries/rtasd.c
> @@ -54,7 +54,10 @@ static unsigned int rtas_event_scan_rate;
>  static int full_rtas_msgs = 0;
>  
>  /* Stop logging to nvram after first fatal error */
> -static int no_more_logging;
> +static int no_more_logging = 1; /* Until we initialize everything,
> + * make sure we don't try logging
> + * anything */
> +

I realise it'll make the patch bigger, but this doesn't seem like a
particularly good name for the variable anymore.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person



Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-02 Thread Tony Breeds

On Tue, Oct 02, 2007 at 06:28:19PM -0500, Linas Vepstas wrote:
> On Mon, Sep 24, 2007 at 01:35:31PM +0100, Andy Whitcroft wrote:
> > Seeing the following from an older power LPAR, pretty sure we had
> > this in the previous -mm also:
> 
> I haven't forgotten about this ... and am looking at it now.
> Seems that whenever I go to reserve the machine pSeries-102,
> someone else is using it :-)

This panic is caused by "[POWERPC] pseries: Fix jumbled no_logging flag."
(79c0108d1b9db4864ab77b2a95dfa04f2dcf264c), in the powerpc/for-2.6.24
branch.  It looks to me that we have logging enabled too early now.

I think the following is a reasonable fix?

---
Explicitly enable RTAS error logging, when it should be ready.


Signed-off-by: Tony Breeds <[EMAIL PROTECTED]>

---

 arch/powerpc/platforms/pseries/rtasd.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/rtasd.c b/arch/powerpc/platforms/pseries/rtasd.c
index 30925d2..0df5d0d 100644
--- a/arch/powerpc/platforms/pseries/rtasd.c
+++ b/arch/powerpc/platforms/pseries/rtasd.c
@@ -54,7 +54,10 @@ static unsigned int rtas_event_scan_rate;
 static int full_rtas_msgs = 0;
 
 /* Stop logging to nvram after first fatal error */
-static int no_more_logging;
+static int no_more_logging = 1; /* Until we initialize everything,
+ * make sure we don't try logging
+ * anything */
+
 
 static int error_log_cnt;
 
@@ -414,6 +417,8 @@ static int rtasd(void *unused)
memset(logdata, 0, rtas_error_log_max);
rc = nvram_read_error_log(logdata, rtas_error_log_max,
  &err_type, &error_log_cnt);
+   /* We can use rtas_log_buf now */
+   no_more_logging = 0;
 
if (!rc) {
if (err_type != ERR_FLAG_ALREADY_LOGGED) {

Yours Tony

  linux.conf.au    http://linux.conf.au/ || http://lca2008.linux.org.au/
  Jan 28 - Feb 02 2008 The Australian Linux Technical Conference!


Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-02 Thread Linas Vepstas

On Mon, Sep 24, 2007 at 01:35:31PM +0100, Andy Whitcroft wrote:
> Seeing the following from an older power LPAR, pretty sure we had
> this in the previous -mm also:

I haven't forgotten about this ... and am looking at it now.
Seems that whenever I go to reserve the machine pSeries-102,
someone else is using it :-)

--linas

Re: [linux-usb-devel] 2.6.23-rc7-mm1

2007-09-30 Thread Jiri Slaby

On 09/24/2007 09:41 PM, Alan Stern wrote:
> On Mon, 24 Sep 2007, Jiri Slaby wrote:
> 
>> Hmm, I have usb legacy keyboard switched on because of grub and bios to
>> allow me typing.
>>
>> I booted 23-rc7 4 times, and the latest -mm 3 times just now and can't
>> reproduce it, I just wonder by what is this conditioned.
> 
> Warm boot vs. cold boot, maybe.

Hmm, no. I don't know, I can't see it anymore so far (using rc8-mm2). I'll keep
eyes on it, anyways.

thanks,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University

Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-26 Thread Tejun Heo

Berck E. Nash wrote:
> Bernd Schmidt wrote:
>> One of these appears in my system as well (ASUS P5W-DH Deluxe
>> mainboard).  Here's the hdparm output:
> 
> Yup, same mainboard here.
> 
>> Since about 2.6.17 or 2.6.18, it has been causing long delays while
>> booting:
>> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> ata2.00: qc timeout (cmd 0xec)
>> ata2.00: failed to IDENTIFY (I/O error, err_mask=0x5)
>> ata2: port is slow to respond, please be patient (Status 0x80)
>> ata2: COMRESET failed (errno=-16)
>> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> ata2.00: ATA-6: Config  Disk, RGL10364, max UDMA/133
>> ata2.00: 640 sectors, multi 1: LBA
>> ata2.00: configured for UDMA/133
> 
> And yup, same problem with the painful boot delays since 2.6.18.  Tejun
> indicated that a fix would get merged with 2.6.23, but that didn't
> happen.  Here's hoping something makes it into .24!

Yeah, it is the sil4726 virtual device which is really crappy as an ATA
device.  About the fix, I thought PMP support would fix it but the
controller on P5W-DH doesn't support PMP.  It can only talk to the
virtual device or the device attached to the first port depending on how
the PMP chip is configured.  It seems we'll have to blacklist the
mainboard and skip or use modified reset sequence on the affected port,
so that's why the fix was delayed.  I'm currently on the road but I'll
look into it when I get back (next week).

Thanks.

-- 
tejun
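
A board blacklist of the kind described above could be as small as a lookup table. This is a sketch only, not the eventual libata fix: struct board_quirk, find_quirk(), the board name string and the port index are all hypothetical, chosen just to show the shape of "match the mainboard, then skip or alter the reset on one port".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk entry: which port on which board needs special handling. */
struct board_quirk {
	const char *board_name;	/* assumed identification string */
	int port;		/* affected port index */
	bool skip_reset;	/* true: skip the reset; false: use a modified sequence */
};

static const struct board_quirk quirks[] = {
	{ "P5W DH Deluxe", 1, true },	/* illustrative values only */
};

/* Return the matching quirk, or NULL if the port needs no special handling. */
static const struct board_quirk *find_quirk(const char *board_name, int port)
{
	assert(board_name != NULL);
	for (size_t i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
		if (strcmp(quirks[i].board_name, board_name) == 0 &&
		    quirks[i].port == port)
			return &quirks[i];
	}
	return NULL;
}
```

A reset path would consult find_quirk() once per port and fall through to the normal sequence when it returns NULL.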


Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-26 Thread Berck E. Nash

Bernd Schmidt wrote:
> One of these appears in my system as well (ASUS P5W-DH Deluxe
> mainboard).  Here's the hdparm output:

Yup, same mainboard here.

> Since about 2.6.17 or 2.6.18, it has been causing long delays while
> booting:
> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata2.00: qc timeout (cmd 0xec)
> ata2.00: failed to IDENTIFY (I/O error, err_mask=0x5)
> ata2: port is slow to respond, please be patient (Status 0x80)
> ata2: COMRESET failed (errno=-16)
> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata2.00: ATA-6: Config  Disk, RGL10364, max UDMA/133
> ata2.00: 640 sectors, multi 1: LBA
> ata2.00: configured for UDMA/133

And yup, same problem with the painful boot delays since 2.6.18.  Tejun
indicated that a fix would get merged with 2.6.23, but that didn't
happen.  Here's hoping something makes it into .24!

Berck

Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-26 Thread Berck E. Nash

Jeff Garzik wrote:
> Would it also be possible for you to send along 'hdparm --Istdout'
> output for your config disk thingy, /dev/sdd ?

Sure, just don't ask me what it is!  (I've generally assumed that
writing to it would be a bad idea.)

Berck

/dev/sdd:
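0040 3fff c837 0010 0000 0000 003f 0000
0000 0000 3030 3030 3030 315f 5f5f 5f5f
5f5f 5f5f 5f30 5f45 0003 3e00 0004 5247
4c31 3033 3634 436f 6e66 6967 2020 4469
736b 2020 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 8001
0000 2f00 4000 0200 0000 0007 3fff 0010
003f fc10 00fb 0101 0280 0000 0000 0407
0003 0078 0078 0078 0078 0000 0000 0000
0000 0000 0000 0000 0201 0000 0000 0000
007e 001b 0068 5060 4000 0000 1000 4000
407f 0000 0000 0000 fffe 0000 c0fe 0000
0000 0000 0000 0000 0002 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0001 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0017 2040
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 b4a5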
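
The dump above can be read by hand: IDENTIFY string fields pack two ASCII characters per 16-bit word, high byte first, and the 28-bit sector count sits in words 60-61. The helper names below are made up for illustration (they are not libata functions); the sketch only shows how the firmware and model words in the dump turn into the "Config  Disk, RGL10364" and "640 sectors" seen in dmesg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy an IDENTIFY string field (two ASCII chars per word, high byte first)
 * into out and trim trailing spaces. out must hold 2 * nwords + 1 bytes.
 */
static void id_string_sketch(const uint16_t *words, size_t nwords, char *out)
{
	size_t len = 0;

	assert(words != NULL && out != NULL);
	for (size_t i = 0; i < nwords; i++) {
		out[len++] = (char)(words[i] >> 8);
		out[len++] = (char)(words[i] & 0xff);
	}
	while (len > 0 && out[len - 1] == ' ')
		len--;
	out[len] = '\0';
}

/* 28-bit LBA capacity: word 60 is the low half, word 61 the high half. */
static uint32_t id_lba28_sectors_sketch(uint16_t word60, uint16_t word61)
{
	return ((uint32_t)word61 << 16) | word60;
}
```

Feeding it the firmware words (5247 4c31 3033 3634) and the start of the model field (436f 6e66 6967 2020 4469 736b ...) reproduces the strings in the boot log; words 60-61 (0280 0000) give the 640 sectors.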

Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-26 Thread Bernd Schmidt


Jeff Garzik wrote:
> Would it also be possible for you to send along 'hdparm --Istdout'
> output for your config disk thingy, /dev/sdd ?


One of these appears in my system as well (ASUS P5W-DH Deluxe 
mainboard).  Here's the hdparm output:


/dev/sdb:
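0040 3fff c837 0010 0000 0000 003f 0000
0000 0000 3030 3030 3030 305f 5f5f 5f5f
5f5f 5f5f 5f30 5f41 0003 3e00 0004 5247
4c31 3033 3634 436f 6e66 6967 2020 4469
736b 2020 2020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 8001
0000 2f00 4000 0200 0000 0007 3fff 0010
003f fc10 00fb 0101 0280 0000 0000 0407
0003 0078 0078 0078 0078 0000 0000 0000
0000 0000 0000 0000 0201 0000 0000 0000
007e 001b 0068 5060 4000 0000 1000 4000
407f 0000 0000 0000 fffe 0000 c0fe 0000
0000 0000 0000 0000 0001 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0001 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0017 2040
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 baa5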

Since about 2.6.17 or 2.6.18, it has been causing long delays while booting:
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x5)
ata2: port is slow to respond, please be patient (Status 0x80)
ata2: COMRESET failed (errno=-16)
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: ATA-6: Config  Disk, RGL10364, max UDMA/133
ata2.00: 640 sectors, multi 1: LBA
ata2.00: configured for UDMA/133


Bernd

Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Jeff Garzik

Would it also be possible for you to send along 'hdparm --Istdout' 
output for your config disk thingy, /dev/sdd ?


Jeff




Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Jeff Garzik


Berck E. Nash wrote:
> Jeff Garzik wrote:
>> Does the attached patch change behavior at all?  You should be able to
>> apply it on top of libata-dev.git#upstream or -mm.
>
> Still broken, dmesg with ATA_DEBUG defined, attached.


Great, this will be useful output.  It will probably be a couple days 
before my next patch.  In the meantime, you can extract the bad commit 
to a patch


git-diff-tree -p 268fe6f9f15551be9abedd44a237392675d529d5 > \
/tmp/patch

and then revert it locally in your kernel tree

patch -sp1 -R < /tmp/patch

to temporarily work around this.

I will definitely make sure this is either fixed or reverted before it 
goes upstream to Linus.


Thanks,

Jeff



Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Jeff Garzik


Berck E. Nash wrote:
> Jeff Garzik wrote:
>> Once the blame has been squared fixed upon me :) you can use git-bisect
>> to locate the precise change that broke your setup.
>
> Okay, here's the problem:
>
> 268fe6f9f15551be9abedd44a237392675d529d5 is first bad commit
> commit 268fe6f9f15551be9abedd44a237392675d529d5
> Author: Jeff Garzik <[EMAIL PROTECTED]>
> Date:   Fri Sep 21 07:09:36 2007 -0400
>
> [libata] SCSI: simple TEST UNIT READY simulation
>
> It's trivial to ping the device, and that's a much more sane behavior
> than no-op.
>
> Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>
>
> :040000 040000 44d34cdad073bd623545b8239aca9a113652c6d0
> df6d21f7ce56a4e796f8f856c1f647b0395ab4df M  drivers


Does the attached patch change behavior at all?  You should be able to 
apply it on top of libata-dev.git#upstream or -mm.


If there are still problems, an updated dmesg (w/ the attached patch) 
and output from enabling ATA_DEBUG (include/linux/libata.h) would be 
very helpful.


Thanks!

Jeff


diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 3882c72..c9838f1 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2800,7 +2800,9 @@ static inline ata_xlat_func_t ata_get_xlat_func(struct ata_device *dev, u8 cmd)
 		return ata_scsi_start_stop_xlat;
 
 	case TEST_UNIT_READY:
-		return ata_scsi_tur_xlat;
+		if (ata_id_has_pm(dev->id))
+			return ata_scsi_tur_xlat;
+		return NULL;
 	}
 
 	return NULL;
@@ -3021,6 +3023,7 @@ void ata_scsi_simulate(struct ata_device *dev, struct scsi_cmnd *cmd,
 	case REZERO_UNIT:
 	case SEEK_6:
 	case SEEK_10:
+	case TEST_UNIT_READY:	/* only for !PM devices */
 		ata_scsi_rbuf_fill(&args, ata_scsiop_noop);
 		break;
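
The decision the patch makes can be sketched in user space. The names below are hypothetical, and the test used here (IDENTIFY word 82, bit 3, "Power Management feature set supported") is an assumption about what ata_id_has_pm() boils down to; the real helper may check additional validity bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum tur_handling {
	TUR_SIMULATE_NOOP,		/* answer TEST UNIT READY without touching the device */
	TUR_TRANSLATE_CHECK_POWER	/* translate it to an ATA CHECK POWER MODE command */
};

/* Assumption: word 82 bit 3 advertises the Power Management feature set. */
static bool id_has_pm_sketch(const uint16_t *id)
{
	assert(id != NULL);
	return (id[82] & (1u << 3)) != 0;
}

/* Only ping the device when it claims to support power management. */
static enum tur_handling pick_tur_handling(const uint16_t *id)
{
	return id_has_pm_sketch(id) ? TUR_TRANSLATE_CHECK_POWER
				    : TUR_SIMULATE_NOOP;
}
```

A device that advertises the feature set but still aborts the command would not be caught by a check like this.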

Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Jeff Garzik


Robert Hancock wrote:
> ATA spec says "The device shall return command aborted if the device
> does not support the Power Management feature set." Whereas TEST UNIT
> READY is required for SCSI. It seems the SAT authors didn't consider
> this case.



Dumb me -- I misread that as mandatory.

Jeff



Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Berck E. Nash

Berck E. Nash wrote:
> hdparm output attached.

Whoops, it really is this time.


/dev/sde:
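427a 3fff 0000 0010 e100 0258 003f 0000
0000 000e 5744 2d57 4d41 4b48 3131 3235
3131 3700 0000 0000 0003 4000 004a 3331
2e30 3846 3331 5744 4320 5744 3336 3047
442d 3030 464c 4132 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 8010
0000 2f00 4001 0280 0000 0007 3fff 0010
003f fc10 00fb 0110 44e0 044f 0000 0007
0003 0078 0078 0078 0078 0000 0000 0000
0000 0000 0000 001f 0202 0000 0000 0000
007e 0000 74eb 7f63 4003 74e9 3e43 4003
407f 0000 0000 0000 0000 0000 80fe 0000
0000 0000 0000 0000 44e0 044f 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0001 0141 0000 0000 0000 0746 0000 0000
0000 0000 0000 0000 0000 0000 0002 0001
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 001f
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 001f 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 8da5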

Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Robert Hancock


Jeff Garzik wrote:
> Berck E. Nash wrote:
>> Jeff Garzik wrote:
>>> Once the blame has been squared fixed upon me :) you can use git-bisect
>>> to locate the precise change that broke your setup.
>>
>> Okay, here's the problem:
>>
>> 268fe6f9f15551be9abedd44a237392675d529d5 is first bad commit
>> commit 268fe6f9f15551be9abedd44a237392675d529d5
>> Author: Jeff Garzik <[EMAIL PROTECTED]>
>> Date:   Fri Sep 21 07:09:36 2007 -0400
>>
>> [libata] SCSI: simple TEST UNIT READY simulation
>>
>> It's trivial to ping the device, and that's a much more sane behavior
>> than no-op.
>>
>> Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>
>>
>> :040000 040000 44d34cdad073bd623545b8239aca9a113652c6d0
>> df6d21f7ce56a4e796f8f856c1f647b0395ab4df M  drivers
>
> Thanks for debugging!
>
> Can you tell me something about this device?
>
> [   49.045635] ata2.00: ATA-6: Config  Disk, RGL10364, max UDMA/133
> [   49.051677] ata2.00: 640 sectors, multi 1: LBA
> [   49.056321] ata2.00: configured for UDMA/133
>
> It seems like it does not support the 'check power mode' command.
>
> Can you post a text file attachment, containing the output of 'hdparm
> --Istdout' ?


ATA spec says "The device shall return command aborted if the device 
does not support the Power Management feature set." Whereas TEST UNIT 
READY is required for SCSI. It seems the SAT authors didn't consider 
this case.


I assume we can tell from the identify data that the device doesn't 
support power management and just fake success for TEST UNIT READY in 
this case?


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Berck E. Nash

Jeff Garzik wrote:
> Can you tell me something about this device?
> 
> [   49.045635] ata2.00: ATA-6: Config  Disk, RGL10364, max UDMA/133
> [   49.051677] ata2.00: 640 sectors, multi 1: LBA
> [   49.056321] ata2.00: configured for UDMA/133
> 
> It seems like it does not support the 'check power mode' command.
> 
> Can you post a text file attachment, containing the output of 'hdparm
> --Istdout' ?

No problem.  The device in question is a Western Digital Raptor WD360GD
36.7GB 10,000 RPM Serial ATA150 Hard Drive.

hdparm output attached.

Berck

Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Jeff Garzik


Berck E. Nash wrote:
> Jeff Garzik wrote:
>> Once the blame has been squared fixed upon me :) you can use git-bisect
>> to locate the precise change that broke your setup.
>
> Okay, here's the problem:
>
> 268fe6f9f15551be9abedd44a237392675d529d5 is first bad commit
> commit 268fe6f9f15551be9abedd44a237392675d529d5
> Author: Jeff Garzik <[EMAIL PROTECTED]>
> Date:   Fri Sep 21 07:09:36 2007 -0400
>
> [libata] SCSI: simple TEST UNIT READY simulation
>
> It's trivial to ping the device, and that's a much more sane behavior
> than no-op.
>
> Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>
>
> :040000 040000 44d34cdad073bd623545b8239aca9a113652c6d0
> df6d21f7ce56a4e796f8f856c1f647b0395ab4df M  drivers


Thanks for debugging!

Can you tell me something about this device?

[   49.045635] ata2.00: ATA-6: Config  Disk, RGL10364, max UDMA/133
[   49.051677] ata2.00: 640 sectors, multi 1: LBA
[   49.056321] ata2.00: configured for UDMA/133

It seems like it does not support the 'check power mode' command.

Can you post a text file attachment, containing the output of 'hdparm 
--Istdout' ?


Jeff




Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Berck E. Nash

Jeff Garzik wrote:
> Once the blame has been squared fixed upon me :) you can use git-bisect
> to locate the precise change that broke your setup.

Okay, here's the problem:

268fe6f9f15551be9abedd44a237392675d529d5 is first bad commit
commit 268fe6f9f15551be9abedd44a237392675d529d5
Author: Jeff Garzik <[EMAIL PROTECTED]>
Date:   Fri Sep 21 07:09:36 2007 -0400

[libata] SCSI: simple TEST UNIT READY simulation

It's trivial to ping the device, and that's a much more sane behavior
than no-op.

Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>

:040000 040000 44d34cdad073bd623545b8239aca9a113652c6d0
df6d21f7ce56a4e796f8f856c1f647b0395ab4df M  drivers

Berck

Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Jens Axboe

On Tue, Sep 25 2007, Berck E. Nash wrote:
> Jens Axboe wrote:
> > On Tue, Sep 25 2007, Berck E. Nash wrote:
> >> Jeff Garzik wrote:
> >>
> >>> The first step would be to clone the "upstream" branch of
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git
> >>>
> >>> and see if the problem is reproducible there.  If yes, then you have
> >>> narrowed down the problem to something my ATA devel tree has introduced
> >>> into -mm.
> >> Nope, you're off the hook.  The libata tree works great, so it must be
> >> something else in -mm conflicting.
> 
> Whoops, sorry!  I just lied.  I'm a git newbie, and failed to actually
> get the "upstream" branch the first time, so rc8 is clean, but it fails
> when I actually pull the upstream branch.  I'll git bisect and get back
> to you.

OK, you probably realize this, but you can forget about the git-block
testing for now then.

-- 
Jens Axboe


Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Berck E. Nash

Jens Axboe wrote:
> On Tue, Sep 25 2007, Berck E. Nash wrote:
>> Jeff Garzik wrote:
>>
>>> The first step would be to clone the "upstream" branch of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git
>>>
>>> and see if the problem is reproducible there.  If yes, then you have
>>> narrowed down the problem to something my ATA devel tree has introduced
>>> into -mm.
>> Nope, you're off the hook.  The libata tree works great, so it must be
>> something else in -mm conflicting.

Whoops, sorry!  I just lied.  I'm a git newbie, and failed to actually
get the "upstream" branch the first time, so rc8 is clean, but it fails
when I actually pull the upstream branch.  I'll git bisect and get back
to you.

Berck

Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Jens Axboe

On Tue, Sep 25 2007, Berck E. Nash wrote:
> Jeff Garzik wrote:
> 
> > The first step would be to clone the "upstream" branch of
> > git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git
> > 
> > and see if the problem is reproducible there.  If yes, then you have
> > narrowed down the problem to something my ATA devel tree has introduced
> > into -mm.
> 
> Nope, you're off the hook.  The libata tree works great, so it must be
> something else in -mm conflicting.

Can you try 2.6.23-rc8 plus this patch:

http://brick.kernel.dk/git-block.patch.bz2

and see if that works?

-- 
Jens Axboe


Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Berck E. Nash

Jeff Garzik wrote:

> The first step would be to clone the "upstream" branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git
> 
> and see if the problem is reproducible there.  If yes, then you have
> narrowed down the problem to something my ATA devel tree has introduced
> into -mm.

Nope, you're off the hook.  The libata tree works great, so it must be
something else in -mm conflicting.

Re: 2.6.23-rc7-mm1: panic in scheduler

2007-09-25 Thread Lee Schermerhorn

On Tue, 2007-09-25 at 13:32 +0530, Kamalesh Babulal wrote:
> Balbir Singh wrote:
> > On 9/25/07, Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
> >> Exactly same call trace is produced over IA64 Madison (up to 9M cache) 
> >> with 8 cpu's.
> >> --
> > 
> > Hi, Kamalesh,
> > 
> > Could you please reproduce the problem or share the steps to reproduce
> > the problem?
> > 
> > Thanks,
> > Balbir
> > -
> 
> Hi Balbir,
> 
> Yes, i am able to reproduce the problem. The problem can be reproduced
> using the ltprunall.
> 

I see the problem just trying to boot.  I have yet to successfully boot
23-rc7-mm1 on my platform.  [But, I'll try Ingo's dev tree real soon
now...]

Lee


Re: 2.6.23-rc7-mm1

2007-09-25 Thread Kamalesh Babulal

Peter Zijlstra wrote:
> On Mon, 24 Sep 2007 21:20:58 +0200 Peter Zijlstra
> <[EMAIL PROTECTED]> wrote:
> 
>>>> Nope, and the stacktrace is utterly puzzling.
>>>>
>>>> /me goes read the lkml.org link
>>>>
>>>> Kamalesh Babulal: do you still get:
>>>>   BUG: spinlock bad magic on
>>>>
>>>> msgs?
>>>>
>>>> Because those I could reproduce using fsx, and I fixed all that.
>>> Hi Peter,
>>>
>>> I do not get BUG: spinlock bad magic messages any more, but the softlock
>>> message is thrown more than 30 times while running the ltp runall.
>> It would be good to know what function on_each_cpu is executing, could
>> you try something like:
> 
> I've just completed 2 full ltp runs on a dual-core opteron machine but
> could not reproduce this problem.
> 
> Kamalesh, would it be possible for you to reproduce with that patch, so
> we can see what function is holding up the cpu?

Hi Peter,

After running the test with the patch you provided, I observed an oops message
at the top of these soft lockup messages, and the oops is the same as the one
reported at http://lkml.org/lkml/2007/9/24/107.

And when I applied the patch for the oops proposed at
http://lkml.org/lkml/2007/9/25/57, neither the oops nor the soft lockups were
seen.

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

Re: 2.6.23-rc7-mm1

2007-09-25 Thread Jens Axboe

On Tue, Sep 25 2007, Mel Gorman wrote:
> On (25/09/07 12:31), Jens Axboe didst pronounce:
> > On Tue, Sep 25 2007, Mel Gorman wrote:
> > > On (25/09/07 01:11), Kamalesh Babulal didst pronounce:
> > > 
> > > Hi Kamalesh,
> > > 
> > > > The build fails with following error
> > > > 
> > > > CC drivers/block/ps3disk.o
> > > > drivers/block/ps3disk.c: In function 'ps3disk_scatter_gather':
> > > > drivers/block/ps3disk.c:115: error: 'bio' undeclared (first use in
> > > > this
> > > > function)
> > > > drivers/block/ps3disk.c:115: error: (Each undeclared identifier is
> > > > reported only once
> > > > drivers/block/ps3disk.c:115: error: for each function it appears in.)
> > > > drivers/block/ps3disk.c:115: error: 'j' undeclared (first use in
> > > > this
> > > > function)
> > > > drivers/block/ps3disk.c:116: error: implicit declaration of function
> > > > 'bio_kunmap_bvec'
> > > > make[2]: *** [drivers/block/ps3disk.o] Error 1
> > > > make[1]: *** [drivers/block] Error 2
> > > > make: *** [drivers] Error 2
> > > > 
> > > > The function bio_kunmap_bvec is missing.I tried checking the 
> > > > git-block.patch
> > > > as well as the linux/kernel/git/axboe/linux-2.6-block.git and did not
> > > > find this function.
> > > > 
> > > > Previously this function was replaced by __bio_kunmap_atomic();
> > > > This patch does not solves the implicit "declaration of function
> > > > 'bio_kunmap_bvec'"
> > > > 
> > > > Signed-off-by: Kamalesh Babulal <[EMAIL PROTECTED]
> > > > >
> > > 
> > > Your mailer appears to have mangled both your signoff and the whitespace in
> > > the patch and it does not apply. However, fixing it does not solve the
> > > problem because of this mysterious bio_kunmap_bvec() that is only referenced
> > > by this driver. Was it accidentally added during the addition of sg chaining
> > > support?
> > 
> > This should fix things up.
> > 
> 
> This builds although I lack the hardware to really test it. However, in
> 2.6.23-rc8-mm1 it collides with git-block-ps3disk-fix.patch. This is a
> version on top of that stack but I guess the best thing to do is replace
> git-block-ps3disk-fix.patch with Jens patch once it is signed off.
> 
> Not signing off because this is just a rebase. Assuming the other one
> gets signed off, consider it;

Thanks, but I already integrated the fix into the existing patch, so
that bisect will work.

-- 
Jens Axboe


Re: 2.6.23-rc7-mm1

2007-09-25 Thread Mel Gorman

On (25/09/07 12:31), Jens Axboe didst pronounce:
> On Tue, Sep 25 2007, Mel Gorman wrote:
> > On (25/09/07 01:11), Kamalesh Babulal didst pronounce:
> > 
> > Hi Kamalesh,
> > 
> > > The build fails with following error
> > > 
> > > CC drivers/block/ps3disk.o
> > > drivers/block/ps3disk.c: In function 'ps3disk_scatter_gather':
> > > drivers/block/ps3disk.c:115: error: 'bio' undeclared (first use in
> > > this
> > > function)
> > > drivers/block/ps3disk.c:115: error: (Each undeclared identifier is
> > > reported only once
> > > drivers/block/ps3disk.c:115: error: for each function it appears in.)
> > > drivers/block/ps3disk.c:115: error: 'j' undeclared (first use in this
> > > function)
> > > drivers/block/ps3disk.c:116: error: implicit declaration of function
> > > 'bio_kunmap_bvec'
> > > make[2]: *** [drivers/block/ps3disk.o] Error 1
> > > make[1]: *** [drivers/block] Error 2
> > > make: *** [drivers] Error 2
> > > 
> > > The function bio_kunmap_bvec is missing.I tried checking the 
> > > git-block.patch
> > > as well as the linux/kernel/git/axboe/linux-2.6-block.git and did not
> > > find this function.
> > > 
> > > Previously this function was replaced by __bio_kunmap_atomic();
> > > This patch does not solves the implicit "declaration of function
> > > 'bio_kunmap_bvec'"
> > > 
> > > Signed-off-by: Kamalesh Babulal <[EMAIL PROTECTED]
> > > >
> > 
> > Your mailer appears to have mangled both your signoff and the whitespace in
> > the patch and it does not apply. However, fixing it does not solve the
> > problem because of this mysterious bio_kunmap_bvec() that is only referenced
> > by this driver. Was it accidentally added during the addition of sg chaining
> > support?
> 
> This should fix things up.
> 

This builds, although I lack the hardware to really test it. However, in
2.6.23-rc8-mm1 it collides with git-block-ps3disk-fix.patch. This is a
version on top of that stack, but I guess the best thing to do is to replace
git-block-ps3disk-fix.patch with Jens' patch once it is signed off.

Not signing off because this is just a rebase. Assuming the other one
gets signed off, consider it:

Acked-by: Mel Gorman <[EMAIL PROTECTED]>

--- 

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc8-mm1-clean/drivers/block/ps3disk.c linux-2.6.23-rc8-mm1-fix-ps3disk/drivers/block/ps3disk.c
--- linux-2.6.23-rc8-mm1-clean/drivers/block/ps3disk.c	2007-09-25 12:05:40.0 +0100
+++ linux-2.6.23-rc8-mm1-fix-ps3disk/drivers/block/ps3disk.c	2007-09-25 12:09:19.0 +0100
@@ -106,14 +106,14 @@ static void ps3disk_scatter_gather(struc
 			(unsigned long)iter.bio->bi_sector);
 
 		size = bvec->bv_len;
-		buf = bvec_kmap_irq(bvec, flags);
+		buf = bvec_kmap_irq(bvec, &flags);
 		if (gather)
 			memcpy(dev->bounce_buf+offset, buf, size);
 		else
 			memcpy(buf, dev->bounce_buf+offset, size);
 		offset += size;
-		flush_kernel_dcache_page(bio_iovec_idx(iter.bio, iter.i)->bv_page);
-		bio_kunmap_bvec(bvec, flags);
+		flush_kernel_dcache_page(bvec->bv_page);
+		bvec_kunmap_irq(buf, &flags);
 		i++;
 	}
 }

-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab

Re: 2.6.23-rc7-mm1

2007-09-25 Thread Peter Zijlstra

On Mon, 24 Sep 2007 21:20:58 +0200 Peter Zijlstra
<[EMAIL PROTECTED]> wrote:

> > > Nope, and the stacktrace is utterly puzzling.
> > > 
> > > /me goes read the lkml.org link
> > > 
> > > Kamalesh Babulal: do you still get:
> > >   BUG: spinlock bad magic on
> > > 
> > > msgs?
> > > 
> > > Because those I could reproduce using fsx, and I fixed all that.
> > Hi Peter,
> > 
> > I do not get BUG: spinlock bad magic messages any more, but the soft lockup
> > message is thrown more than 30 times while running the ltp runall.
> 
> It would be good to know what function on_each_cpu is executing, could
> you try something like:

I've just completed 2 full ltp runs on a dual-core opteron machine but
could not reproduce this problem.

Kamalesh, would it be possible for you to reproduce with that patch, so
we can see what function is holding up the cpu?

Re: 2.6.23-rc7-mm1

2007-09-25 Thread Jens Axboe

On Tue, Sep 25 2007, Mel Gorman wrote:
> On (25/09/07 01:11), Kamalesh Babulal didst pronounce:
> 
> Hi Kamalesh,
> 
> > The build fails with following error
> > 
> > CC drivers/block/ps3disk.o
> > drivers/block/ps3disk.c: In function 'ps3disk_scatter_gather':
> > drivers/block/ps3disk.c:115: error: 'bio' undeclared (first use in this function)
> > drivers/block/ps3disk.c:115: error: (Each undeclared identifier is reported only once
> > drivers/block/ps3disk.c:115: error: for each function it appears in.)
> > drivers/block/ps3disk.c:115: error: 'j' undeclared (first use in this function)
> > drivers/block/ps3disk.c:116: error: implicit declaration of function 'bio_kunmap_bvec'
> > make[2]: *** [drivers/block/ps3disk.o] Error 1
> > make[1]: *** [drivers/block] Error 2
> > make: *** [drivers] Error 2
> > 
> > The function bio_kunmap_bvec is missing. I tried checking the git-block.patch
> > as well as the linux/kernel/git/axboe/linux-2.6-block.git and did not
> > find this function.
> > 
> > Previously this function was replaced by __bio_kunmap_atomic();
> > This patch does not solve the "implicit declaration of function
> > 'bio_kunmap_bvec'" error.
> > 
> > Signed-off-by: Kamalesh Babulal <[EMAIL PROTECTED]
> > >
> 
> Your mailer appears to have mangled both your signoff and the whitespace in
> the patch and it does not apply. However, fixing it does not solve the problem
> because of this mysterious bio_kunmap_bvec() that is only referenced by this
> driver. Was it accidentally added during the addition of sg chaining support?

This should fix things up.

diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
index 8e05ba7..a7fd66a 100644
--- a/drivers/block/ps3disk.c
+++ b/drivers/block/ps3disk.c
@@ -106,14 +106,14 @@ static void ps3disk_scatter_gather(struct ps3_storage_device *dev,
 			(unsigned long)iter.bio->bi_sector);
 
 		size = bvec->bv_len;
-		buf = bvec_kmap_irq(bvec, flags);
+		buf = bvec_kmap_irq(bvec, &flags);
 		if (gather)
 			memcpy(dev->bounce_buf+offset, buf, size);
 		else
 			memcpy(buf, dev->bounce_buf+offset, size);
 		offset += size;
-		flush_kernel_dcache_page(bio_iovec_idx(bio, j)->bv_page);
-		bio_kunmap_bvec(bvec, flags);
+		flush_kernel_dcache_page(bvec->bv_page);
+		bvec_kunmap_irq(buf, &flags);
 		i++;
 	}
 }

-- 
Jens Axboe
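
[Archive note] The patch above pairs bvec_kmap_irq() with bvec_kunmap_irq() and drops the non-existent bio_kunmap_bvec(). Both helpers appear to take a pointer to the saved flags word rather than the value, so the unmap can restore what the map saved. The following is only a userspace sketch of that save/restore pairing. Every name in it (mock_vec, mock_kmap_irq, mock_kunmap_irq, mock_copy, mock_irq_state) is hypothetical and is not a kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bio_vec: a buffer, an offset and a length. */
struct mock_vec {
	char *page;
	size_t offset;
	size_t len;
};

/* Hypothetical "interrupt state": 1 means enabled, 0 means disabled. */
static unsigned long mock_irq_state = 1;

/* Save the current state through *flags, disable, and return the mapping. */
static char *mock_kmap_irq(struct mock_vec *v, unsigned long *flags)
{
	assert(v != NULL && flags != NULL);
	*flags = mock_irq_state;
	mock_irq_state = 0;
	return v->page + v->offset;
}

/* Restore whatever state the matching mock_kmap_irq() call saved. */
static void mock_kunmap_irq(char *buf, unsigned long *flags)
{
	(void)buf;
	assert(flags != NULL);
	mock_irq_state = *flags;
}

/* Copy one segment to (gather != 0) or from (gather == 0) a bounce buffer. */
static size_t mock_copy(struct mock_vec *v, char *bounce, int gather)
{
	unsigned long flags;
	char *buf = mock_kmap_irq(v, &flags);

	if (gather)
		memcpy(bounce, buf, v->len);
	else
		memcpy(buf, bounce, v->len);
	mock_kunmap_irq(buf, &flags);
	return v->len;
}
```

Passing the flags by value to the unmap side, as the old call did, would leave the helper with nothing to write the saved state into. That is the part of the fix this sketch is meant to illustrate.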


Re: 2.6.23-rc7-mm1

2007-09-25 Thread Mel Gorman

On (25/09/07 01:11), Kamalesh Babulal didst pronounce:

Hi Kamalesh,

> The build fails with following error
> 
> CC drivers/block/ps3disk.o
> drivers/block/ps3disk.c: In function 'ps3disk_scatter_gather':
> drivers/block/ps3disk.c:115: error: 'bio' undeclared (first use in this function)
> drivers/block/ps3disk.c:115: error: (Each undeclared identifier is reported only once
> drivers/block/ps3disk.c:115: error: for each function it appears in.)
> drivers/block/ps3disk.c:115: error: 'j' undeclared (first use in this function)
> drivers/block/ps3disk.c:116: error: implicit declaration of function 'bio_kunmap_bvec'
> make[2]: *** [drivers/block/ps3disk.o] Error 1
> make[1]: *** [drivers/block] Error 2
> make: *** [drivers] Error 2
> 
> The function bio_kunmap_bvec is missing. I tried checking the git-block.patch
> as well as the linux/kernel/git/axboe/linux-2.6-block.git and did not
> find this function.
> 
> Previously this function was replaced by __bio_kunmap_atomic();
> This patch does not solve the "implicit declaration of function
> 'bio_kunmap_bvec'" error.
> 
> Signed-off-by: Kamalesh Babulal <[EMAIL PROTECTED]
> >

Your mailer appears to have mangled both your signoff and the whitespace in
the patch and it does not apply. However, fixing it does not solve the problem
because of this mysterious bio_kunmap_bvec() that is only referenced by this
driver. Was it accidentally added during the addition of sg chaining support?

> ---
> 
> --- linux-2.6.23-rc7/drivers/block/ps3disk.c2007-09-24 20:50:41.0 
> +0530
> +++ linux-2.6.23-rc7/drivers/block/~ps3disk.c   2007-09-24 20:50:59.0 
> +0530
> @@ -112,7 +112,7 @@ static void ps3disk_scatter_gather(struc
> else
> memcpy(buf, dev->bounce_buf+offset, size);
> offset += size;
> -   flush_kernel_dcache_page(bio_iovec_idx(bio, j)->bv_page);
> +  flush_kernel_dcache_page(bio_iovec_idx(iter.bio, 
> iter.i)->bv_page);
> bio_kunmap_bvec(bvec, flags);
> i++;
> }
> 
> -- 
> 
> Thanks & Regards,
> Kamalesh Babulal,
> Linux Technology Center,
> IBM, ISTL.
> 

-- 
Mel Gorman
Part-time Phd Student  Linux Technology Center
University of Limerick IBM Dublin Software Lab

Re: 2.6.23-rc7-mm1: panic in scheduler

2007-09-25 Thread Kamalesh Babulal

Balbir Singh wrote:
> On 9/25/07, Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
>> Exactly same call trace is produced over IA64 Madison (up to 9M cache) with 
>> 8 cpu's.
>> --
> 
> Hi, Kamalesh,
> 
> Could you please reproduce the problem or share the steps to reproduce
> the problem?
> 
> Thanks,
> Balbir
> -

Hi Balbir,

Yes, I am able to reproduce the problem. It can be reproduced by
running the LTP runall.

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

Re: 2.6.23-rc7-mm1

2007-09-25 Thread Thomas Gleixner


On Tue, 2007-09-25 at 09:32 +0200, Torsten Kaiser wrote:
> On 9/24/07, Thomas Gleixner <[EMAIL PROTECTED]> wrote:
> > Can you check whether 2.6.23-rc7 +
> > http://tglx.de/projects/hrtimers/2.6.23-rc7/patch-2.6.23-rc7-hrt1.patch
> >
> > works for you ?
> 
> Yes, powers off normally.

Ok, so it's probably some merge artifact in -mm. We'll get this sorted
out once Len has his new tree available.

tglx



Re: 2.6.23-rc7-mm1

2007-09-25 Thread Torsten Kaiser

On 9/24/07, Thomas Gleixner <[EMAIL PROTECTED]> wrote:
> Can you check whether 2.6.23-rc7 +
> http://tglx.de/projects/hrtimers/2.6.23-rc7/patch-2.6.23-rc7-hrt1.patch
>
> works for you ?

Yes, powers off normally.

Torsten

Re: 2.6.23-rc7-mm1: panic in scheduler

2007-09-25 Thread Balbir Singh

On 9/25/07, Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
> Exactly same call trace is produced over IA64 Madison (up to 9M cache) with 8 
> cpu's.
> --

Hi, Kamalesh,

Could you please reproduce the problem or share the steps to reproduce
the problem?

Thanks,
Balbir

Re: 2.6.23-rc7-mm1

2007-09-25 Thread Kamalesh Babulal

Peter Zijlstra wrote:
> On Mon, 24 Sep 2007 21:20:58 +0200 Peter Zijlstra
> <[EMAIL PROTECTED]> wrote:
> 
> > > > Nope, and the stacktrace is utterly puzzling.
> > > > 
> > > > /me goes read the lkml.org link
> > > > 
> > > > Kamalesh Babulal: do you still get:
> > > >   BUG: spinlock bad magic on
> > > > 
> > > > msgs?
> > > > 
> > > > Because those I could reproduce using fsx, and I fixed all that.
> > > Hi Peter,
> > > 
> > > I do not get BUG: spinlock bad magic messages any more, but the softlock
> > > message is thrown more than 30 time, while running the ltp runall.
> > It would be good to know what function on_each_cpu is executing, could
> > you try something like:
> 
> I've just completed 2 full ltp runs on a dual-core opteron machine but
> could not reproduce this problem.
> 
> Kamalesh, would it be possible for you to reproduce with that patch, so
> we can see what function is holding up the cpu?

Hi Peter,

After running the test with the patch you provided, I observed an oops
message at the top of these soft lockup messages, and the oops is the same
as the one reported at http://lkml.org/lkml/2007/9/24/107.

When I applied the patch for the oops proposed at
http://lkml.org/lkml/2007/9/25/57, neither the oops nor the soft lockups
were seen any more.

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

Re: 2.6.23-rc7-mm1: panic in scheduler

2007-09-25 Thread Lee Schermerhorn

On Tue, 2007-09-25 at 13:32 +0530, Kamalesh Babulal wrote:
> Balbir Singh wrote:
> > On 9/25/07, Kamalesh Babulal <[EMAIL PROTECTED]> wrote:
> > > Exactly same call trace is produced over IA64 Madison (up to 9M cache)
> > > with 8 cpu's.
> > > --
> > 
> > Hi, Kamalesh,
> > 
> > Could you please reproduce the problem or share the steps to reproduce
> > the problem?
> > 
> > Thanks,
> > Balbir
> > -
> 
> Hi Balbir,
> 
> Yes, i am able to reproduce the problem. The problem can be reproduced
> using the ltprunall.
> 

I see the problem just trying to boot.  I have yet to successfully boot
23-rc7-mm1 on my platform.  [But, I'll try Ingo's dev tree real soon
now...]

Lee


Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Berck E. Nash

Jeff Garzik wrote:

> The first step would be to clone the upstream branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git
> 
> and see if the problem is reproducible there.  If yes, then you have
> narrowed down the problem to something my ATA devel tree has introduced
> into -mm.

Nope, you're off the hook.  The libata tree works great, so it must be
something else in -mm conflicting.

Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Jens Axboe

On Tue, Sep 25 2007, Berck E. Nash wrote:
> Jeff Garzik wrote:
> 
> > The first step would be to clone the upstream branch of
> > git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git
> > 
> > and see if the problem is reproducible there.  If yes, then you have
> > narrowed down the problem to something my ATA devel tree has introduced
> > into -mm.
> 
> Nope, you're off the hook.  The libata tree works great, so it must be
> something else in -mm conflicting.

Can you try 2.6.23-rc8 plus this patch:

http://brick.kernel.dk/git-block.patch.bz2

and see if that works?

-- 
Jens Axboe


Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Berck E. Nash

Jens Axboe wrote:
> On Tue, Sep 25 2007, Berck E. Nash wrote:
> > Jeff Garzik wrote:
> > 
> > > The first step would be to clone the upstream branch of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git
> > > 
> > > and see if the problem is reproducible there.  If yes, then you have
> > > narrowed down the problem to something my ATA devel tree has introduced
> > > into -mm.
> > Nope, you're off the hook.  The libata tree works great, so it must be
> > something else in -mm conflicting.

Whoops, sorry!  I just lied.  I'm a git newbie, and failed to actually
get the upstream branch the first time, so rc8 is clean, but it fails
when I actually pull the upstream branch.  I'll git bisect and get back
to you.

Berck

Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Jens Axboe

On Tue, Sep 25 2007, Berck E. Nash wrote:
> Jens Axboe wrote:
> > On Tue, Sep 25 2007, Berck E. Nash wrote:
> > > Jeff Garzik wrote:
> > > 
> > > > The first step would be to clone the upstream branch of
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git
> > > > 
> > > > and see if the problem is reproducible there.  If yes, then you have
> > > > narrowed down the problem to something my ATA devel tree has introduced
> > > > into -mm.
> > > Nope, you're off the hook.  The libata tree works great, so it must be
> > > something else in -mm conflicting.
> 
> Whoops, sorry!  I just lied.  I'm a git newbie, and failed to actually
> get the upstream branch the first time, so rc8 is clean, but it fails
> when I actually pull the upstream branch.  I'll git bisect and get back
> to you.

OK, you probably realize this, but you can forget about the git-block
testing for now then.

-- 
Jens Axboe


Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Berck E. Nash

Jeff Garzik wrote:
> Once the blame has been squarely fixed upon me :) you can use git-bisect
> to locate the precise change that broke your setup.

Okay, here's the problem:

268fe6f9f15551be9abedd44a237392675d529d5 is first bad commit
commit 268fe6f9f15551be9abedd44a237392675d529d5
Author: Jeff Garzik [EMAIL PROTECTED]
Date:   Fri Sep 21 07:09:36 2007 -0400

[libata] SCSI: simple TEST UNIT READY simulation

It's trivial to ping the device, and that's a much more sane behavior
than no-op.

Signed-off-by: Jeff Garzik [EMAIL PROTECTED]

:040000 040000 44d34cdad073bd623545b8239aca9a113652c6d0 df6d21f7ce56a4e796f8f856c1f647b0395ab4df M	drivers

Berck

Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Jeff Garzik


Berck E. Nash wrote:

Jeff Garzik wrote:

Once the blame has been squarely fixed upon me :) you can use git-bisect
to locate the precise change that broke your setup.


Okay, here's the problem:

268fe6f9f15551be9abedd44a237392675d529d5 is first bad commit
commit 268fe6f9f15551be9abedd44a237392675d529d5
Author: Jeff Garzik [EMAIL PROTECTED]
Date:   Fri Sep 21 07:09:36 2007 -0400

[libata] SCSI: simple TEST UNIT READY simulation

It's trivial to ping the device, and that's a much more sane behavior
than no-op.

Signed-off-by: Jeff Garzik [EMAIL PROTECTED]

:040000 040000 44d34cdad073bd623545b8239aca9a113652c6d0 df6d21f7ce56a4e796f8f856c1f647b0395ab4df M	drivers


Thanks for debugging!

Can you tell me something about this device?

[   49.045635] ata2.00: ATA-6: Config  Disk, RGL10364, max UDMA/133
[   49.051677] ata2.00: 640 sectors, multi 1: LBA
[   49.056321] ata2.00: configured for UDMA/133

It seems like it does not support the 'check power mode' command.

Can you post a text file attachment, containing the output of 'hdparm 
--Istdout' ?


Jeff




Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Berck E. Nash

Jeff Garzik wrote:
> Can you tell me something about this device?
>
> [   49.045635] ata2.00: ATA-6: Config  Disk, RGL10364, max UDMA/133
> [   49.051677] ata2.00: 640 sectors, multi 1: LBA
> [   49.056321] ata2.00: configured for UDMA/133
>
> It seems like it does not support the 'check power mode' command.
>
> Can you post a text file attachment, containing the output of 'hdparm
> --Istdout' ?

No problem.  The device in question is a Western Digital Raptor WD360GD
36.7GB 10,000 RPM Serial ATA150 Hard Drive.

hdparm output attached.

Berck

Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Robert Hancock


Jeff Garzik wrote:

Berck E. Nash wrote:

Jeff Garzik wrote:

Once the blame has been squarely fixed upon me :) you can use git-bisect
to locate the precise change that broke your setup.


Okay, here's the problem:

268fe6f9f15551be9abedd44a237392675d529d5 is first bad commit
commit 268fe6f9f15551be9abedd44a237392675d529d5
Author: Jeff Garzik [EMAIL PROTECTED]
Date:   Fri Sep 21 07:09:36 2007 -0400

[libata] SCSI: simple TEST UNIT READY simulation

It's trivial to ping the device, and that's a much more sane behavior
than no-op.

Signed-off-by: Jeff Garzik [EMAIL PROTECTED]

:040000 040000 44d34cdad073bd623545b8239aca9a113652c6d0 df6d21f7ce56a4e796f8f856c1f647b0395ab4df M	drivers


Thanks for debugging!

Can you tell me something about this device?

[   49.045635] ata2.00: ATA-6: Config  Disk, RGL10364, max UDMA/133
[   49.051677] ata2.00: 640 sectors, multi 1: LBA
[   49.056321] ata2.00: configured for UDMA/133

It seems like it does not support the 'check power mode' command.

Can you post a text file attachment, containing the output of 'hdparm 
--Istdout' ?


ATA spec says "The device shall return command aborted if the device 
does not support the Power Management feature set." Whereas TEST UNIT 
READY is required for SCSI. It seems the SAT authors didn't consider 
this case.


I assume we can tell from the identify data that the device doesn't 
support power management and just fake success for TEST UNIT READY in 
this case?
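
A rough user-space sketch of that check, for illustration only. It assumes the IDENTIFY DEVICE layout from ATA/ATAPI-6 (word 82 bit 3 advertises the Power Management feature set, and words 82-83 are valid only when word 83 has bit 15 clear and bit 14 set). The names id_has_power_mgmt() and tur_action_for() are made up for this example and are not libata functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IDENTIFY DEVICE word indexes (assumed ATA/ATAPI-6 numbering). */
enum {
	ID_WORD_CMD_SET_1 = 82,	/* command sets supported, part 1 */
	ID_WORD_CMD_SET_2 = 83	/* command sets supported, part 2 */
};

/*
 * Hypothetical helper, not the libata implementation: returns true when
 * the 256-word IDENTIFY DEVICE block advertises Power Management.
 */
static bool id_has_power_mgmt(const uint16_t *id)
{
	assert(id != NULL);

	/* Words 82-83 are valid only if word 83 has bit 15 = 0, bit 14 = 1. */
	if ((id[ID_WORD_CMD_SET_2] & 0xC000) != 0x4000)
		return false;

	/* Word 82 bit 3: Power Management feature set supported. */
	return (id[ID_WORD_CMD_SET_1] & (1u << 3)) != 0;
}

/* What to do when the SCSI layer sends TEST UNIT READY. */
enum tur_action {
	TUR_FAKE_SUCCESS,	/* device lacks PM: report success, send nothing */
	TUR_CHECK_POWER_MODE	/* device has PM: translate to CHECK POWER MODE */
};

static enum tur_action tur_action_for(const uint16_t *id)
{
	return id_has_power_mgmt(id) ? TUR_CHECK_POWER_MODE : TUR_FAKE_SUCCESS;
}
```

With an all-zero IDENTIFY block the helper picks TUR_FAKE_SUCCESS, which corresponds to treating TEST UNIT READY as a no-op for such devices.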


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove nospam from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Berck E. Nash

Berck E. Nash wrote:
> hdparm output attached.

Whoops, it really is this time.


/dev/sde:
427a 3fff  0010 e100 0258 003f 
 000e 5744 2d57 4d41 4b48 3131 3235
3131 3700   0003 4000 004a 3331
2e30 3846 3331 5744 4320 5744 3336 3047
442d 3030 464c 4132 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 8010
 2f00 4001 0280  0007 3fff 0010
003f fc10 00fb 0110 44e0 044f  0007
0003 0078 0078 0078 0078   
   001f 0202   
007e  74eb 7f63 4003 74e9 3e43 4003
407f      80fe 
    44e0 044f  
       
       
       
0001 0141    0746  
      0002 0001
       
       001f
       
       
       
       
       
      001f 
       
       
       
       
       
       8da5

Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Jeff Garzik


Robert Hancock wrote:
ATA spec says "The device shall return command aborted if the device 
does not support the Power Management feature set." Whereas TEST UNIT 
READY is required for SCSI. It seems the SAT authors didn't consider 
this case.



Dumb me -- I misread that as mandatory.

Jeff



Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Jeff Garzik


Berck E. Nash wrote:

Jeff Garzik wrote:

Once the blame has been squarely fixed upon me :) you can use git-bisect
to locate the precise change that broke your setup.


Okay, here's the problem:

268fe6f9f15551be9abedd44a237392675d529d5 is first bad commit
commit 268fe6f9f15551be9abedd44a237392675d529d5
Author: Jeff Garzik [EMAIL PROTECTED]
Date:   Fri Sep 21 07:09:36 2007 -0400

[libata] SCSI: simple TEST UNIT READY simulation

It's trivial to ping the device, and that's a much more sane behavior
than no-op.

Signed-off-by: Jeff Garzik [EMAIL PROTECTED]

:040000 040000 44d34cdad073bd623545b8239aca9a113652c6d0 df6d21f7ce56a4e796f8f856c1f647b0395ab4df M	drivers


Does the attached patch change behavior at all?  You should be able to 
apply it on top of libata-dev.git#upstream or -mm.


If there are still problems, an updated dmesg (w/ the attached patch) 
and output from enabling ATA_DEBUG (include/linux/libata.h) would be 
very helpful.


Thanks!

Jeff


diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 3882c72..c9838f1 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2800,7 +2800,9 @@ static inline ata_xlat_func_t ata_get_xlat_func(struct ata_device *dev, u8 cmd)
 		return ata_scsi_start_stop_xlat;
 
 	case TEST_UNIT_READY:
-		return ata_scsi_tur_xlat;
+		if (ata_id_has_pm(dev->id))
+			return ata_scsi_tur_xlat;
+		return NULL;
 	}
 
 	return NULL;
@@ -3021,6 +3023,7 @@ void ata_scsi_simulate(struct ata_device *dev, struct scsi_cmnd *cmd,
 	case REZERO_UNIT:
 	case SEEK_6:
 	case SEEK_10:
+	case TEST_UNIT_READY:   /* only for !PM devices */
 		ata_scsi_rbuf_fill(args, ata_scsiop_noop);
 		break;

Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Jeff Garzik


Berck E. Nash wrote:

Jeff Garzik wrote:

Does the attached patch change behavior at all?  You should be able to
apply it on top of libata-dev.git#upstream or -mm.


Still broken, dmesg with ATA_DEBUG defined, attached.


Great, this will be useful output.  It will probably be a couple days 
before my next patch.  In the meantime, you can extract the bad commit 
to a patch


	git-diff-tree -p 268fe6f9f15551be9abedd44a237392675d529d5 > /tmp/patch

and then revert it locally in your kernel tree

	patch -sp1 -R < /tmp/patch

to temporarily work around this.

I will definitely make sure this is either fixed or reverted before it 
goes upstream to Linus.


Thanks,

Jeff



Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-25 Thread Jeff Garzik

Would it also be possible for you to send along 'hdparm --Istdout' 
output for your config disk thingy, /dev/sdd ?


Jeff




Re: 2.6.23-rc7-mm1 AHCI ATA errors -- won't boot

2007-09-24 Thread Jeff Garzik


Berck E. Nash wrote:

Greetings,

I get a few million of these on boot-- the system never actually boots.
Works fine in 2.6.23-rc7.

[   50.456012] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[   50.462484] ata2.00: irq_stat 0x4001
[   50.466441] ata2.00: cmd e5/00:00:00:00:00/00:00:00:00:00/a0 tag 0
cdb 0x0 data 0
[   50.466442]  res 51/04:00:01:01:80/00:00:00:00:00/a0 Emask
0x1 (device error)
[   50.481914] ata2.00: status: {DRDY ERR }
[   50.485876] ata2.00: error: {ABRT }
[   50.489533] ata2.00: configured for UDMA/133
[   50.493839] ata2: EH complete

I've attached the entire dmesg and lspci.


Are you "git-friendly"?  A few quick kernel compiles and reboots would 
help us narrow down the problem, given that it's a reproducible regression.


The first step would be to clone the "upstream" branch of 
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev.git


and see if the problem is reproducible there.  If yes, then you have 
narrowed down the problem to something my ATA devel tree has introduced 
into -mm.


Once the blame has been squarely fixed upon me :) you can use git-bisect 
to locate the precise change that broke your setup.


Info at http://kerneltrap.org/node/11753 or 
http://www.kernel.org/pub/software/scm/git/docs/v1.3.3/howto/isolate-bugs-with-bisect.txt

or "man git-bisect"

Jeff



Re: 2.6.23-rc7-mm1: build error with CONFIG_KEXEC=y and CONFIG_NOHIGHMEM=y

2007-09-24 Thread Randy Dunlap

[adding kexec m-l]

On Mon, 24 Sep 2007 22:10:36 +0200 Laurent Riffard wrote:

> Le 24.09.2007 11:17, Andrew Morton a écrit :
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc7/2.6.23-rc7-mm1/
> > 
> 
> I've got this compilation error when CONFIG_KEXEC=y and CONFIG_NOHIGHMEM=y:
> 
> linux-2.6-mm$ LANG=C make 
>   CHK include/linux/version.h
>   CHK include/linux/utsrelease.h
>   CALL scripts/checksyscalls.sh
>   CHK include/linux/compile.h
>   CC  arch/i386/kernel/setup.o
> arch/i386/kernel/setup.c: In function 'reserve_crashkernel':
> arch/i386/kernel/setup.c:391: error: 'highend_pfn' undeclared (first use in 
> this function)
> arch/i386/kernel/setup.c:391: error: (Each undeclared identifier is reported 
> only once
> arch/i386/kernel/setup.c:391: error: for each function it appears in.)
> arch/i386/kernel/setup.c:391: error: 'highstart_pfn' undeclared (first use in 
> this function)
> make[1]: *** [arch/i386/kernel/setup.o] Error 1
> make: *** [arch/i386/kernel] Error 2
> 
> 
> .config attached.
> ~~
> laurent
> 


---
~Randy
Phaedrus says that Quality is about caring.

Re: 2.6.23-rc7-mm1: build error with CONFIG_KEXEC=y and CONFIG_NOHIGHMEM=y

2007-09-24 Thread Laurent Riffard

Le 24.09.2007 11:17, Andrew Morton a écrit :
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc7/2.6.23-rc7-mm1/
> 

I've got this compilation error when CONFIG_KEXEC=y and CONFIG_NOHIGHMEM=y:

linux-2.6-mm$ LANG=C make 
  CHK include/linux/version.h
  CHK include/linux/utsrelease.h
  CALL scripts/checksyscalls.sh
  CHK include/linux/compile.h
  CC  arch/i386/kernel/setup.o
arch/i386/kernel/setup.c: In function 'reserve_crashkernel':
arch/i386/kernel/setup.c:391: error: 'highend_pfn' undeclared (first use in 
this function)
arch/i386/kernel/setup.c:391: error: (Each undeclared identifier is reported 
only once
arch/i386/kernel/setup.c:391: error: for each function it appears in.)
arch/i386/kernel/setup.c:391: error: 'highstart_pfn' undeclared (first use in 
this function)
make[1]: *** [arch/i386/kernel/setup.o] Error 1
make: *** [arch/i386/kernel] Error 2


.config attached.
~~
laurent
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.23-rc7-mm1
# Mon Sep 24 20:25:04 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_USER_NS is not set
# CONFIG_AUDIT is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=15
# CONFIG_CGROUPS is not set
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_PROC_KPAGEMAP=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_BLOCK=y
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set
# CONFIG_BLK_DEV_BSG is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
# CONFIG_SMP is not set
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y
# CONFIG_PARAVIRT is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MCORE2 is not set
# CONFIG_MK6 is not set
CONFIG_MK7=y
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_XADD=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_USE_3DNOW=y
CONFIG_X86_TSC=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=4
CONFIG_HPET_TIMER=y
# CONFIG_PREE

Re: 2.6.23-rc7-mm1: panic in scheduler

2007-09-24 Thread Ingo Molnar


* Lee Schermerhorn <[EMAIL PROTECTED]> wrote:

> Taking a quick look at [__]{en|de|queue_entity() and the functions 
> they call, I see something suspicious in set_leftmost() in 
> sched_fair.c:
> 
> static inline void
> set_leftmost(struct cfs_rq *cfs_rq, struct rb_node *leftmost)
> {
> struct sched_entity *se;
> 
> cfs_rq->rb_leftmost = leftmost;
> if (leftmost)
> se = rb_entry(leftmost, struct sched_entity, run_node);
> }
> 
> Missing code?  corrupt patch?

could you pull this git tree on top of a -rc7 (or later) upstream tree:

  git-pull git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel.git

does that solve the crash?

the above set_leftmost() code used to be larger and now indeed those 
bits are mostly dead code. I've queued up a clean-up patch for that - 
see the patch below. It should not impact correctness though, so if you 
can still trigger the crash with the latest sched-devel.git tree we'd 
like to know about it.
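
To illustrate why dropping the helper should be behaviour-neutral, here is a small user-space sketch. The struct definitions are cut-down stand-ins, not the kernel's, and rb_entry() is re-implemented with offsetof(); set_leftmost_old() mirrors the function quoted above and set_leftmost_new() is the plain assignment that replaces it.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures (assumptions, not the real layouts). */
struct rb_node { struct rb_node *left, *right; };
struct sched_entity { long key; struct rb_node run_node; };
struct cfs_rq { struct rb_node *rb_leftmost; };

/* Same idea as the kernel's rb_entry()/container_of(). */
#define rb_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Old helper: 'se' is computed but never read afterwards. */
static void set_leftmost_old(struct cfs_rq *cfs_rq, struct rb_node *leftmost)
{
	struct sched_entity *se;

	assert(cfs_rq != NULL);
	cfs_rq->rb_leftmost = leftmost;
	if (leftmost) {
		se = rb_entry(leftmost, struct sched_entity, run_node);
		(void)se;	/* only silences the compiler warning */
	}
}

/* Replacement: the single assignment that had any effect. */
static void set_leftmost_new(struct cfs_rq *cfs_rq, struct rb_node *leftmost)
{
	assert(cfs_rq != NULL);
	cfs_rq->rb_leftmost = leftmost;
}
```

Both variants leave rb_leftmost in the same state for a NULL and a non-NULL node, which is all the callers observe.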

Ingo

--->
Subject: sched: remove set_leftmost()
From: Ingo Molnar <[EMAIL PROTECTED]>

Lee Schermerhorn noticed that set_leftmost() contains dead code,
remove this.

Reported-by: Lee Schermerhorn <[EMAIL PROTECTED]>
Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
---
 kernel/sched_fair.c |   14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -124,16 +124,6 @@ max_vruntime(u64 min_vruntime, u64 vrunt
 	return min_vruntime;
 }
 
-static inline void
-set_leftmost(struct cfs_rq *cfs_rq, struct rb_node *leftmost)
-{
-	struct sched_entity *se;
-
-	cfs_rq->rb_leftmost = leftmost;
-	if (leftmost)
-		se = rb_entry(leftmost, struct sched_entity, run_node);
-}
-
 static inline s64
 entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
@@ -175,7 +165,7 @@ __enqueue_entity(struct cfs_rq *cfs_rq, 
 	 * used):
 	 */
 	if (leftmost)
-		set_leftmost(cfs_rq, &se->run_node);
+		cfs_rq->rb_leftmost = &se->run_node;
 
 	rb_link_node(&se->run_node, parent, link);
 	rb_insert_color(&se->run_node, &cfs_rq->tasks_timeline);
@@ -185,7 +175,7 @@ static void
 __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	if (cfs_rq->rb_leftmost == &se->run_node)
-		set_leftmost(cfs_rq, rb_next(&se->run_node));
+		cfs_rq->rb_leftmost = rb_next(&se->run_node);
 
 	rb_erase(&se->run_node, &cfs_rq->tasks_timeline);
 }

Re: 2.6.23-rc7-mm1

2007-09-24 Thread Kamalesh Babulal

Hi Andrew,

The drivers/net/pasemi_mac seems to be broken and build fails with

CC [M] drivers/net/pasemi_mac.o
drivers/net/pasemi_mac.c: In function ‘pasemi_mac_probe’:
drivers/net/pasemi_mac.c:1153: error: conflicting types for ‘mac’
drivers/net/pasemi_mac.c:1151: error: previous declaration of ‘mac’ was here
drivers/net/pasemi_mac.c:1170: error: incompatible types in assignment
drivers/net/pasemi_mac.c:1172: error: request for member ‘pdev’ in
something not a structure or union
drivers/net/pasemi_mac.c:1173: error: request for member ‘netdev’ in
something not a structure or union
drivers/net/pasemi_mac.c:1175: error: request for member ‘napi’ in
something not a structure or union
drivers/net/pasemi_mac.c:1180: error: request for member ‘dma_txch’ in
something not a structure or union
drivers/net/pasemi_mac.c:1181: error: request for member ‘dma_rxch’ in
something not a structure or union
drivers/net/pasemi_mac.c:1187: error: request for member ‘dma_if’ in
something not a structure or union
drivers/net/pasemi_mac.c:1189: error: request for member ‘dma_if’ in
something not a structure or union
drivers/net/pasemi_mac.c:1194: error: request for member ‘type’ in
something not a structure or union
drivers/net/pasemi_mac.c:1197: error: request for member ‘type’ in
something not a structure or union
drivers/net/pasemi_mac.c:1205: warning: passing argument 1 of
‘pasemi_get_mac_addr’ from incompatible pointer type
drivers/net/pasemi_mac.c:1205: error: request for member ‘mac_addr’ in
something not a structure or union
drivers/net/pasemi_mac.c:1209: error: request for member ‘mac_addr’ in
something not a structure or union
drivers/net/pasemi_mac.c:1209: error: request for member ‘mac_addr’ in
something not a structure or union
drivers/net/pasemi_mac.c:1216: warning: passing argument 1 of
‘pasemi_mac_map_regs’ from incompatible pointer type
drivers/net/pasemi_mac.c:1220: error: request for member ‘rx_status’ in
something not a structure or union
drivers/net/pasemi_mac.c:1220: error: request for member ‘dma_rxch’ in
something not a structure or union
drivers/net/pasemi_mac.c:1221: error: request for member ‘tx_status’ in
something not a structure or union
drivers/net/pasemi_mac.c:1221: error: request for member ‘dma_txch’ in
something not a structure or union
drivers/net/pasemi_mac.c:1223: error: request for member ‘msg_enable’ in
something not a structure or union
drivers/net/pasemi_mac.c:1226: error: request for member ‘msg_enable’ in
something not a structure or union
drivers/net/pasemi_mac.c:1231: error: request for member ‘pdev’ in
something not a structure or union
drivers/net/pasemi_mac.c:1231: error: request for member ‘pdev’ in
something not a structure or union
drivers/net/pasemi_mac.c:1237: error: request for member ‘type’ in
something not a structure or union
drivers/net/pasemi_mac.c:1238: error: request for member ‘dma_if’ in
something not a structure or union
drivers/net/pasemi_mac.c:1238: error: request for member ‘dma_txch’ in
something not a structure or union
drivers/net/pasemi_mac.c:1238: error: request for member ‘dma_rxch’ in
something not a structure or union
drivers/net/pasemi_mac.c:1244: error: request for member ‘iob_pdev’ in
something not a structure or union
drivers/net/pasemi_mac.c:1245: error: request for member ‘iob_pdev’ in
something not a structure or union
drivers/net/pasemi_mac.c:1246: error: request for member ‘dma_pdev’ in
something not a structure or union
drivers/net/pasemi_mac.c:1247: error: request for member ‘dma_pdev’ in
something not a structure or union
drivers/net/pasemi_mac.c:1248: error: request for member ‘dma_regs’ in
something not a structure or union
drivers/net/pasemi_mac.c:1249: error: request for member ‘dma_regs’ in
something not a structure or union
drivers/net/pasemi_mac.c:1250: error: request for member ‘iob_regs’ in
something not a structure or union
drivers/net/pasemi_mac.c:1251: error: request for member ‘iob_regs’ in
something not a structure or union
drivers/net/pasemi_mac.c:1252: error: request for member ‘regs’ in
something not a structure or union
drivers/net/pasemi_mac.c:1253: error: request for member ‘regs’ in
something not a structure or union
make[2]: *** [drivers/net/pasemi_mac.o] Error 1
make[1]: *** [drivers/net] Error 2
make: *** [drivers] Error 2

In the function
static int __devinit
pasemi_mac_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
{

struct pasemi_mac *mac;
int err;
DECLARE_MAC_BUF(mac);

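DECLARE_MAC_BUF(mac) declares a second local variable named mac (a
char[18] buffer), which conflicts with the struct pasemi_mac *mac
declared just above it and triggers the build failure. The patch below
renames the buffer to mac_buf; it is only used to print the MAC address
via the newly introduced print_mac function.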

Signed-off-by: Kamalesh Babulal <[EMAIL PROTECTED]>
---

--- linux-2.6.23-rc7/drivers/net/pasemi_mac.c	2007-09-25 03:27:45.000000000 +0530
+++ linux-2.6.23-rc7/drivers/net/~pasemi_mac.c	2007-09-25 03:27:27.000000000 +0530
@@ -1150,7 +1150,7 @@ pasemi_mac_probe(struct pci_dev *pdev, c
struct net_device *dev;
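
The name clash can be reproduced outside the kernel. The sketch below assumes, based on the description above, that DECLARE_MAC_BUF(var) expands to an 18-byte char array named var; struct demo_mac, format_mac() and probe_demo() are hypothetical stand-ins for the driver's types and for print_mac().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed expansion, per the description above: an 18-byte local buffer. */
#define DECLARE_MAC_BUF(var) char var[18]

/* Hypothetical cut-down stand-in for struct pasemi_mac. */
struct demo_mac {
	unsigned char mac_addr[6];
};

/* Hypothetical stand-in for print_mac(): formats addr into buf, returns buf. */
static char *format_mac(char *buf, const unsigned char *addr)
{
	snprintf(buf, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
		 (unsigned)addr[0], (unsigned)addr[1], (unsigned)addr[2],
		 (unsigned)addr[3], (unsigned)addr[4], (unsigned)addr[5]);
	return buf;
}

/* Copies the formatted address into out (at least 18 bytes) and returns it. */
static char *probe_demo(struct demo_mac *dev, char *out)
{
	struct demo_mac *mac = dev;
	/*
	 * DECLARE_MAC_BUF(mac) here would expand to "char mac[18]" and clash
	 * with the pointer declared on the previous line. A distinct name
	 * avoids the redeclaration.
	 */
	DECLARE_MAC_BUF(mac_buf);

	assert(dev != NULL && out != NULL);
	strcpy(out, format_mac(mac_buf, mac->mac_addr));
	return out;
}
```

Swapping mac_buf back to mac in probe_demo() should bring back the same kind of "conflicting types" diagnostic as in the build log.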

Re: 2.6.23-rc7-mm1

2007-09-24 Thread Sam Ravnborg

On Mon, Sep 24, 2007 at 11:02:22PM +0200, Sam Ravnborg wrote:
> Hi Kamalesh.
> 
> > The link error for a PowerMac G5 (powerpc) is still seen with 
> > 2.6.23-rc7-mm1,
> > and was reported for 2.6.23-rc6-mm1 (http://lkml.org/lkml/2007/9/19/62).
> > 
> >  KSYM .tmp_kallsyms1.S
> >  AS  .tmp_kallsyms1.o
> >  LD  .tmp_vmlinux2
> >  KSYM .tmp_kallsyms2.S
> >  AS  .tmp_kallsyms2.o
> >  LD  vmlinux.o
> > ld: dynreloc miscount for fs/built-in.o, section .opd
> > ld: can not edit opd Bad value
> > make: *** [vmlinux.o] Error 1
> 
> Can you try to narrow it down a bit further...
> As this happens when building fs/built-in.o it should be
> straightforward to do so.
> 
> First step would be to do:
> rm fs/built-in.o
> make fs/ V=1
> 
> Then copy the ld invocation and try to remove the input .o files one-by-one.
> This should tell you which .o file is causing the bug.
> 
> Next step is to try to squeeze down the offending file until only the
> erroneous part remains.
> 
> Last time I did a
> make fs/file.i
> 
> And then I used gcc & ld to compile and link.
> Gradually removing stuff from file.i made me spot the problem
> with the weak prototype in a header file.
> 
> I guess something else is making ld hit this error now.
> 
> PS. Just reinstalled my dev box so no crosscompiler atm.

Got powerpc toolchain running now but cannot reproduce.
What config do you use (I used g5_defconfig)?
And what ld version?

Sam

Re: 2.6.23-rc7-mm1

2007-09-24 Thread Reuben Farrelly




On 25/09/2007 3:12 AM, J. Bruce Fields wrote:

On Mon, Sep 24, 2007 at 09:59:29AM -0700, Andrew Morton wrote:

On Tue, 25 Sep 2007 00:52:30 +1000 Reuben Farrelly <[EMAIL PROTECTED]> wrote:



On 24/09/2007 7:17 PM, Andrew Morton wrote:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc7/2.6.23-rc7-mm1/

- New git tree git-powerpc-galak.patch added to the -mm lineup: ppc32
  things, mainly (Kumar Gala <[EMAIL PROTECTED]>)
I'm observing a problem with this kernel (as well as 2.6.23-rc6-mm1) which 
manifests itself only in my Postfix/application mail.logs:


Sep 25 00:25:40 tornado postfix/smtp[12520]: fatal: select lock: Cannot allocate 
memory
Sep 25 00:25:41 tornado postfix/master[8002]: warning: process 
/usr/lib64/postfix/smtp pid 12520 exit status 1


This is happening frequently with processes started via 'master' (smtp, smtpd 
and cleanup), but it does not appear to have any noticeable operational impact 
apart from logging a lot of copies of this message.


The corresponding code in Postfix which triggers this is (choice of 3 files in 
src/master are all possibilities which all have much the same code)


Oog.  Looks like it's the "Memory shortage can result in inconsistent
flocks state" patch--the error variable is being set in some cases when
it shouldn't be.  Does the following fix it?

That's in my git tree, not in mainline.  I'll fix up my copy.

And I'll spend some time today figuring out what to do about regression
testing for the posix lock, flock, and lease code.

Thanks for the bug report!

--b.

diff --git a/fs/locks.c b/fs/locks.c
index a6c5917..3e8bfd2 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -740,6 +740,7 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
 		new_fl = locks_alloc_lock();
 		if (new_fl == NULL)
 			goto out;
+		error = 0;
 	}
 
 	for_each_lock(inode, before) {


Yes that has fixed it, thanks!

Reuben
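
For anyone following along, the failure mode can be modelled in a few lines of user-space C. This is a simplified sketch of the control flow only, not the code in fs/locks.c: alloc_lock() stands in for locks_alloc_lock(), and the reset_error flag models whether the one-line fix is applied.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct lock { int type; };

/* Hypothetical stand-in for locks_alloc_lock(); 'fail' forces a failure. */
static struct lock *alloc_lock(int fail)
{
	return fail ? NULL : calloc(1, sizeof(struct lock));
}

/*
 * Simplified model of the locking path. Returns 0 on success or a
 * negative errno. 'reset_error' selects the fixed (1) or buggy (0) flow.
 */
static int set_lock(int need_alloc, int alloc_fails, int reset_error)
{
	struct lock *new_fl = NULL;
	int error = 0;

	if (need_alloc) {
		error = -ENOMEM;	/* pessimistic default before allocating */
		new_fl = alloc_lock(alloc_fails);
		if (new_fl == NULL)
			goto out;
		if (reset_error)
			error = 0;	/* the fix: allocation succeeded */
	}

	/* The rest of the path would run here without touching 'error'. */

out:
	free(new_fl);
	assert(error <= 0);
	return error;
}
```

Without the reset, a successful allocation still returns -ENOMEM, which would match the "Cannot allocate memory" message Postfix logs even though nothing actually ran out of memory.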

Re: 2.6.23-rc7-mm1: panic in scheduler

2007-09-24 Thread Kamalesh Babulal

Lee Schermerhorn wrote:
> I looked around on the MLs for mention of this, but didn't find anything
> that appeared to match.
> 
> Platform:  HP rx8620 - 16-cpu/32GB/4-node ia64 [Madison]
> 
> 2.6.23-rc7-mm1 broken out -- panic occurs when git-sched.patch pushed:
> 
> Unable to handle kernel NULL pointer dereference (address )
> swapper[0]: Oops 8813272891392 [1]
> Modules linked in: scsi_wait_scan ehci_hcd ohci_hcd uhci_hcd usbcore
> 
> Pid: 0, CPU 14, comm:  swapper
> psr : 101008522030 ifs : 8002 ip  : []
> Not tainted
> ip is at rb_next+0x0/0x140
> unat:  pfs : 0308 rsc : 0003
> rnat: 8012 bsps: 0001003e pr  : 6609a840599519a5
> ldrs:  ccv : 0002 fpsr: 0009804c8a70433f
> csd :  ssd : 
> b0  : a00100078dc0 b6  : a00100074a40 b7  : a00100078e00
> f6  : 1003e f7  : 1003e0040
> f8  : 1003e2aab f9  : 1003e000d43798a2b
> f10 : 1003e35e9970b967dd8b9 f11 : 1003e0002
> r1  : a00100bc0920 r2  : e76577f0 r3  : e7657f10
> r8  : fff0 r9  : 0002 r10 : e7657780
> r11 :  r12 : e7004160fe10 r13 : e70041608000
> r14 :  r15 : 000e r16 : 0007f6c30a22
> r17 : e70041608040 r18 : a001008383a8 r19 : a00100078e00
> r20 : e7655bb8 r21 : e7655bb0 r22 : e7657ed0
> r23 : 000f4240 r24 : a001009e0440 r25 : e70041608bb4
> r26 :  r27 :  r28 : e7657f80
> r29 : 02e7 r30 :  r31 : e7657780
> 
> Call Trace:
>  [] show_stack+0x80/0xa0
> sp=e7004160f9e0 bsp=e70041609008
>  [] show_regs+0x870/0x8a0
> sp=e7004160fbb0 bsp=e70041608fa8
>  [] die+0x190/0x300
> sp=e7004160fbb0 bsp=e70041608f60
>  [] ia64_do_page_fault+0x780/0xa80
> sp=e7004160fbb0 bsp=e70041608f08
>  [] ia64_leave_kernel+0x0/0x270
> sp=e7004160fc40 bsp=e70041608f08
>  [] rb_next+0x0/0x140
> sp=e7004160fe10 bsp=e70041608ef8
>  [] __dequeue_entity+0x80/0xc0
> sp=e7004160fe10 bsp=e70041608ec8
>  [] pick_next_task_fair+0x60/0x180
> sp=e7004160fe10 bsp=e70041608e98
>  [] schedule+0x340/0x19c0
> sp=e7004160fe10 bsp=e70041608cc0
>  [] cpu_idle+0x290/0x3e0
> sp=e7004160fe30 bsp=e70041608c50
>  [] start_secondary+0x380/0x5a0
> sp=e7004160fe30 bsp=e70041608c00
>  [] __kprobes_text_end+0x6c0/0x6f0
> sp=e7004160fe30 bsp=e70041608c00
> 
> 
> Taking a quick look at [__]{en|de|queue_entity() and the functions they call,
> I see something suspicious in set_leftmost() in sched_fair.c:
> 
> static inline void
> set_leftmost(struct cfs_rq *cfs_rq, struct rb_node *leftmost)
> {
> struct sched_entity *se;
> 
> cfs_rq->rb_leftmost = leftmost;
> if (leftmost)
> se = rb_entry(leftmost, struct sched_entity, run_node);
> }
> 
> Missing code?  corrupt patch?
> 
> config available on request, but there doesn't seem to be much in the way
> of scheduler config option.  A few that might apply:
> 
> SCHED_SMT is not set
> SCHED_DEBUG=y
> SCHEDSTATS=y
> 
> 
> Regards,
> Lee Schermerhorn
> 

Exactly the same call trace is produced on an IA64 Madison (up to 9M cache)
box with 8 CPUs.
-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

2.6.23-rc7-mm1: panic in scheduler

2007-09-24 Thread Lee Schermerhorn

I looked around on the MLs for mention of this, but didn't find anything
that appeared to match.

Platform:  HP rx8620 - 16-cpu/32GB/4-node ia64 [Madison]

2.6.23-rc7-mm1 broken out -- panic occurs when git-sched.patch pushed:

Unable to handle kernel NULL pointer dereference (address )
swapper[0]: Oops 8813272891392 [1]
Modules linked in: scsi_wait_scan ehci_hcd ohci_hcd uhci_hcd usbcore

Pid: 0, CPU 14, comm:  swapper
psr : 101008522030 ifs : 8002 ip  : []    Not tainted
ip is at rb_next+0x0/0x140
unat:  pfs : 0308 rsc : 0003
rnat: 8012 bsps: 0001003e pr  : 6609a840599519a5
ldrs:  ccv : 0002 fpsr: 0009804c8a70433f
csd :  ssd : 
b0  : a00100078dc0 b6  : a00100074a40 b7  : a00100078e00
f6  : 1003e f7  : 1003e0040
f8  : 1003e2aab f9  : 1003e000d43798a2b
f10 : 1003e35e9970b967dd8b9 f11 : 1003e0002
r1  : a00100bc0920 r2  : e76577f0 r3  : e7657f10
r8  : fff0 r9  : 0002 r10 : e7657780
r11 :  r12 : e7004160fe10 r13 : e70041608000
r14 :  r15 : 000e r16 : 0007f6c30a22
r17 : e70041608040 r18 : a001008383a8 r19 : a00100078e00
r20 : e7655bb8 r21 : e7655bb0 r22 : e7657ed0
r23 : 000f4240 r24 : a001009e0440 r25 : e70041608bb4
r26 :  r27 :  r28 : e7657f80
r29 : 02e7 r30 :  r31 : e7657780

Call Trace:
 [] show_stack+0x80/0xa0
sp=e7004160f9e0 bsp=e70041609008
 [] show_regs+0x870/0x8a0
sp=e7004160fbb0 bsp=e70041608fa8
 [] die+0x190/0x300
sp=e7004160fbb0 bsp=e70041608f60
 [] ia64_do_page_fault+0x780/0xa80
sp=e7004160fbb0 bsp=e70041608f08
 [] ia64_leave_kernel+0x0/0x270
sp=e7004160fc40 bsp=e70041608f08
 [] rb_next+0x0/0x140
sp=e7004160fe10 bsp=e70041608ef8
 [] __dequeue_entity+0x80/0xc0
sp=e7004160fe10 bsp=e70041608ec8
 [] pick_next_task_fair+0x60/0x180
sp=e7004160fe10 bsp=e70041608e98
 [] schedule+0x340/0x19c0
sp=e7004160fe10 bsp=e70041608cc0
 [] cpu_idle+0x290/0x3e0
sp=e7004160fe30 bsp=e70041608c50
 [] start_secondary+0x380/0x5a0
sp=e7004160fe30 bsp=e70041608c00
 [] __kprobes_text_end+0x6c0/0x6f0
sp=e7004160fe30 bsp=e70041608c00


Taking a quick look at [__]{en|de}queue_entity() and the functions they call,
I see something suspicious in set_leftmost() in sched_fair.c:

static inline void
set_leftmost(struct cfs_rq *cfs_rq, struct rb_node *leftmost)
{
struct sched_entity *se;

cfs_rq->rb_leftmost = leftmost;
if (leftmost)
se = rb_entry(leftmost, struct sched_entity, run_node);
}

Missing code?  corrupt patch?
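
For readers unfamiliar with rb_entry(): it is the kernel's container_of()
wrapper, which recovers the enclosing structure from a pointer to an
embedded rb_node. The user-space sketch below uses simplified stand-in
types (struct node, struct entity and entity_of() are hypothetical, not the
kernel's definitions). It illustrates that the assignment to se in the
snippet quoted above only computes a pointer and has no side effect, which
is presumably why the function as quoted looks like it is missing code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins, NOT the kernel definitions. */
struct node { struct node *left, *right; };

struct entity {
	long key;
	struct node run_node;	/* embedded, like sched_entity.run_node */
};

/* Same idea as the kernel's container_of()/rb_entry(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical helper: map an embedded node back to its entity. */
static struct entity *entity_of(struct node *n)
{
	assert(n != NULL);	/* a NULL node would yield a bogus pointer */
	return container_of(n, struct entity, run_node);
}
```

Calling entity_of(&e.run_node) on some struct entity e gives back &e; nothing
else is read or written.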

config available on request, but there doesn't seem to be much in the way
of scheduler config options.  A few that might apply:

SCHED_SMT is not set
SCHED_DEBUG=y
SCHEDSTATS=y


Regards,
Lee Schermerhorn



Re: 2.6.23-rc7-mm1

2007-09-24 Thread Sam Ravnborg

Hi Kamalesh.

> The link error for a PowerMac G5 (powerpc) is still seen with 
> 2.6.23-rc7-mm1,
> and was reported for 2.6.23-rc6-mm1 (http://lkml.org/lkml/2007/9/19/62).
> 
>  KSYM.tmp_kallsyms1.S
>  AS  .tmp_kallsyms1.o
>  LD  .tmp_vmlinux2
>  KSYM.tmp_kallsyms2.S
>  AS  .tmp_kallsyms2.o
>  LD  vmlinux.o
> ld: dynreloc miscount for fs/built-in.o, section .opd
> ld: can not edit opd Bad value
> make: *** [vmlinux.o] Error 1

Can you try to narrow it down a bit further...
As this happens when building fs/built-in.o it should be
straightforward to do so.

First step would be to do:
rm fs/built-in.o
make fs/ V=1

Then copy the ld invocation and try to remove the input .o files one-by-one.
This should tell you which .o file is causing the bug.

Next step is to try to squeeze down the offending file until only the
erroneous part remains.

Last time I did a
make fs/file.i

And then I used gcc & ld to compile and link.
Gradually removing stuff from file.i made me spot the problem
with the weak prototype in a header file.

I guess something else is making ld hit this error now.

PS. Just reinstalled my dev box so no crosscompiler atm.

Sam

Re: 2.6.23-rc7-mm1

2007-09-24 Thread Thomas Gleixner

On Mon, 2007-09-24 at 12:34 -0700, Andrew Morton wrote:
> > It prints twice 'System halted' and blinks the keyboard leds, but does
> > not switch off. On all other kernel version I only see one keyboard
> > blink before the power goes out.
> 
> ok...
> 
> > I compared its dmesg to vanilla-rc7 and -rc4-mm1, but apart from -rc4-mm1
> > assigning different IRQs I can't see any differences except the normal
> > variation in BogoMips etc.

Can you check whether 2.6.23-rc7 +
http://tglx.de/projects/hrtimers/2.6.23-rc7/patch-2.6.23-rc7-hrt1.patch

works for you?

> hm, dunno.  The only substantial patch which touches
> arch/x86_64/kernel/process.c (which is where cpu_idle lives) is
> x86_64-prep-idle-loop-for-dynticks.patch.
> 
> The problem is, 2.6.23-rc6-mm1's git-acpi patch had all the new cpuidle
> code in it.  Len dropped all that code over the weekend (which is when I
> picked this copy of his tree), so 2.6.23-rc7-mm1 doesn't have the cpuidle
> code.  Len will be reapplying the cpuidle patches today(ish) so next -mm
> _will_ have the cpuidle code.
> 
> So what we have in rc7-mm1 is this transient no-cpuidle state.  It could be
> that the x86_64 dynticks code (which was developed and previously tested in
> conjunction with the cpuidle patches) has some dependency on cpuidle.

It should not. cpuidle makes use of dynticks not the other way round.

tglx



Re: [linux-usb-devel] 2.6.23-rc7-mm1

2007-09-24 Thread Alan Stern

On Mon, 24 Sep 2007, Jiri Slaby wrote:

> Hmm, I have USB legacy keyboard support switched on so that I can type in
> GRUB and the BIOS.
> 
> I booted 23-rc7 4 times, and the latest -mm 3 times just now and can't
> reproduce it; I just wonder what it depends on.

Warm boot vs. cold boot, maybe.

Alan Stern


Re: 2.6.23-rc7-mm1

2007-09-24 Thread Kamalesh Babulal

Hi Andrew,

The build fails with the following error:

CC drivers/block/ps3disk.o
drivers/block/ps3disk.c: In function ‘ps3disk_scatter_gather’:
drivers/block/ps3disk.c:115: error: ‘bio’ undeclared (first use in this
function)
drivers/block/ps3disk.c:115: error: (Each undeclared identifier is
reported only once
drivers/block/ps3disk.c:115: error: for each function it appears in.)
drivers/block/ps3disk.c:115: error: ‘j’ undeclared (first use in this
function)
drivers/block/ps3disk.c:116: error: implicit declaration of function
‘bio_kunmap_bvec’
make[2]: *** [drivers/block/ps3disk.o] Error 1
make[1]: *** [drivers/block] Error 2
make: *** [drivers] Error 2

The function bio_kunmap_bvec is missing. I checked git-block.patch
as well as linux/kernel/git/axboe/linux-2.6-block.git and did not
find this function.

Previously this function was replaced by __bio_kunmap_atomic().
This patch does not solve the "implicit declaration of function
‘bio_kunmap_bvec’" error; it only addresses the undeclared ‘bio’ and ‘j’.

Signed-off-by: Kamalesh Babulal <[EMAIL PROTECTED]>
---

--- linux-2.6.23-rc7/drivers/block/ps3disk.c	2007-09-24 20:50:41.0 +0530
+++ linux-2.6.23-rc7/drivers/block/~ps3disk.c	2007-09-24 20:50:59.0 +0530
@@ -112,7 +112,7 @@ static void ps3disk_scatter_gather(struc
 		else
 			memcpy(buf, dev->bounce_buf+offset, size);
 		offset += size;
-		flush_kernel_dcache_page(bio_iovec_idx(bio, j)->bv_page);
+		flush_kernel_dcache_page(bio_iovec_idx(iter.bio, iter.i)->bv_page);
 		bio_kunmap_bvec(bvec, flags);
 		i++;
 	}

-- 

Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


Re: 2.6.23-rc7-mm1

2007-09-24 Thread Andrew Morton

On Mon, 24 Sep 2007 21:07:19 +0200
"Torsten Kaiser" <[EMAIL PROTECTED]> wrote:

> On 9/24/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc7/2.6.23-rc7-mm1/
> 
> With the five hotfixes applied it works for me.
> 
> But it fails to power down my system when shutting down.
> 
> It prints twice 'System halted' and blinks the keyboard leds, but does
> not switch off. On all other kernel version I only see one keyboard
> blink before the power goes out.

ok...

> I compared its dmesg to vanilla-rc7 and -rc4-mm1, but apart from -rc4-mm1
> assigning different IRQs I can't see any differences except the normal
> variation in BogoMips etc.
> 
> As the system still responded to SysRq I got the following information:

good move.

> [  415.77] SysRq : Show Regs
> [  415.77] CPU 3:
> [  415.78] Modules linked in: radeon drm nfsd exportfs ipv6 tuner
> tea5767 tda8290 tuner_simple mt20xx tvaudio msp3400 bttv video_buf
> ir_common compat_ioctl32 btcx_risc tveeprom videodev v4l2_common
> v4l1_compat pata_amd usbhid hid sg
> [  415.78] Pid: 0, comm: swapper Not tainted 2.6.23-rc7-mm1 #1
> [  415.78] RIP: 0010:[]  [] default_idle+0x29/0x40
> [  415.78] RSP: 0018:81010038bf30  EFLAGS: 0246
> [  415.78] RAX: 0400 RBX: 80810040 RCX: 
> [  415.78] RDX:  RSI: 0001 RDI: 0005
> [  415.78] RBP: 00030400 R08:  R09: 81010038be68
> [  415.95] R10: 012c R11: 80219be0 R12: 
> [  415.95] R13:  R14:  R15: 
> [  415.95] FS:  7f35c69726f0() GS:810100319700() knlGS:
> [  415.95] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
> [  415.95] CR2: 7fe432928c40 CR3: 00201000 CR4: 06e0
> [  416.07] DR0:  DR1:  DR2: 
> [  416.07] DR3:  DR6: 0ff0 DR7: 0400
> [  416.07]
> [  416.07] Call Trace:
> [  416.07]  [] cpu_idle+0x5a/0x90
> [  416.07]
> 
> No blocked tasks were shown with SysRq+W.
> Last lines before I used SysRq+B (That worked, a normal reboot started):
> 
> [  450.78] SysRq : Emergency Remount R/O
> [  450.79] Emergency Remount complete
> [  453.65] SysRq : Emergency Sync
> [  453.66] Emergency Sync complete
> [  455.91] SysRq : Power Off
> [  455.92] md: stopping all md devices.
> [  455.93] md: md1 still in use.
> [  456.94] sd 8:0:1:0: [sdd] Synchronizing SCSI cache
> [  456.96] sd 8:0:1:0: [sdd] Stopping disk
> [  457.48] sd 2:0:0:0: [sdc] Synchronizing SCSI cache
> [  457.49] sd 2:0:0:0: [sdc] Stopping disk
> [  457.50] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
> [  457.52] sd 1:0:0:0: [sdb] Stopping disk
> [  457.53] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> [  457.55] sd 0:0:0:0: [sda] Stopping disk
> [  457.56] Power down.
> [  479.09] SysRq : Power Off
> [  479.10] md: stopping all md devices.
> [  479.11] md: md1 still in use.
> [  480.12] sd 8:0:1:0: [sdd] Synchronizing SCSI cache
> [  480.14] sd 8:0:1:0: [sdd] Stopping disk
> [  480.66] sd 2:0:0:0: [sdc] Synchronizing SCSI cache
> [  480.67] sd 2:0:0:0: [sdc] Stopping disk
> [  480.68] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
> [  480.70] sd 1:0:0:0: [sdb] Stopping disk
> [  480.71] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> [  480.73] sd 0:0:0:0: [sda] Stopping disk
> [  480.74] Power down.
> [  489.03] SysRq : Resetting
> 

hm, dunno.  The only substantial patch which touches
arch/x86_64/kernel/process.c (which is where cpu_idle lives) is
x86_64-prep-idle-loop-for-dynticks.patch.

The problem is, 2.6.23-rc6-mm1's git-acpi patch had all the new cpuidle
code in it.  Len dropped all that code over the weekend (which is when I
picked this copy of his tree), so 2.6.23-rc7-mm1 doesn't have the cpuidle
code.  Len will be reapplying the cpuidle patches today(ish) so next -mm
_will_ have the cpuidle code.

So what we have in rc7-mm1 is this transient no-cpuidle state.  It could be
that the x86_64 dynticks code (which was developed and previously tested in
conjunction with the cpuidle patches) has some dependency on cpuidle.

So it's all a bit of a mess :(

I think I'll basically stop applying things which don't look like bugfixes
for a while and try to get more -mm's out, as we seriously need to get this
lot stabilised asap.

Len, would it be possible to restore cpuidle sometime today please?

Re: 2.6.23-rc7-mm1

2007-09-24 Thread Peter Zijlstra

On Mon, 24 Sep 2007 22:38:03 +0530 Kamalesh Babulal
<[EMAIL PROTECTED]> wrote:

> Peter Zijlstra wrote:
> > On Mon, 24 Sep 2007 09:44:48 -0700 Andrew Morton
> > <[EMAIL PROTECTED]> wrote:
> > 
> >> On Mon, 24 Sep 2007 18:43:33 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> 
> >> wrote:
> >>
> >>> Hi Andrew,
> >>>
> >>> Kernel BUG over x86_64 (AMD Opteron(tm) Processor 844).
> >>>
> >>> A similar kernel BUG was reported for 2.6.23-rc2-mm1
> >>> at http://lkml.org/lkml/2007/8/10/20 and
> >>> mm-dirty-balancing-for-tasks.patch was dropped from 2.6.23-rc2-mm2.
> >>> The same patch is in this -mm version, so I suspect it is the
> >>> same patch triggering this BUG.
> >>>
> >>> BUG: soft lockup - CPU#0 stuck for 11s! [events/0:15]
> >>> CPU 0:
> >>> Modules linked in:
> >>> Pid: 15, comm: events/0 Tainted: G  D 2.6.23-rc7-mm1-autokern1 #1
> >>> RIP: 0010:[]  [] 
> >>> __smp_call_function_mask+0x9a/0xc4
> >>> RSP: :8100017add80  EFLAGS: 0297
> >>> RAX: 00fc RBX: 8100017adde0 RCX: 0001
> >>> RDX: 08fc RSI: 00fc RDI: 000e
> >>> RBP: c20002d11000 R08: 8100017ac000 R09: 80675e38
> >>> R10:  R11:  R12: 000f
> >>> R13: 8021bcfe R14:  R15: 0001
> >>> FS:  () GS:8065a000() 
> >>> knlGS:556aa2a0
> >>> CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
> >>> CR2: c20002d11008 CR3: 00201000 CR4: 06e0
> >>> DR0:  DR1:  DR2: 
> >>> DR3:  DR6: 0ff0 DR7: 0400
> >>>
> >>> Call Trace:
> >>> Inexact backtrace:
> >>>  [] mcheck_check_cpu+0x0/0x31
> >>>  [] mcheck_check_cpu+0x0/0x31
> >>>  [] smp_call_function_mask+0x5f/0x72
> >>>  [] mcheck_check_cpu+0x0/0x31
> >>>  [] smp_call_function+0x19/0x1b
> >>>  [] on_each_cpu+0x16/0x2b
> >>>  [] mcheck_timer+0x0/0x7c
> >>>  [] mcheck_timer+0x1e/0x7c
> >>>  [] run_workqueue+0x88/0x109
> >>>  [] worker_thread+0x0/0xf4
> >>>  [] worker_thread+0xe9/0xf4
> >>>  [] autoremove_wake_function+0x0/0x37
> >>>  [] autoremove_wake_function+0x0/0x37
> >>>  [] kthread+0x44/0x6d
> >>>  [] child_rip+0xa/0x12
> >>>  [] kthread+0x0/0x6d
> >>>  [] child_rip+0x0/0x12
> >> hm, I thought we'd fixed the problems in that patchset.  Peter, were
> >> you aware of this one?
> > 
> > Nope, and the stacktrace is utterly puzzling.
> > 
> > /me goes read the lkml.org link
> > 
> > Kamalesh Babulal: do you still get:
> >   BUG: spinlock bad magic on
> > 
> > msgs?
> > 
> > Because those I could reproduce using fsx, and I fixed all that.
> Hi Peter,
> 
> I no longer get the "BUG: spinlock bad magic" messages, but the soft lockup
> message is thrown more than 30 times while running the LTP runall.

It would be good to know what function on_each_cpu is executing, could
you try something like:

---
 kernel/softirq.c|5 +
 kernel/softlockup.c |7 +++
 2 files changed, 12 insertions(+)

Index: linux-2.6/kernel/softirq.c
===
--- linux-2.6.orig/kernel/softirq.c
+++ linux-2.6/kernel/softirq.c
@@ -645,6 +645,8 @@ __init int spawn_ksoftirqd(void)
 }
 
 #ifdef CONFIG_SMP
+
+DEFINE_PER_CPU(void (*)(void *info), last_on_each_cpu);
 /*
  * Call a function on all processors
  */
@@ -653,6 +655,9 @@ int on_each_cpu(void (*func) (void *info
int ret = 0;
 
preempt_disable();
+
+   per_cpu(last_on_each_cpu, smp_processor_id()) = func;
+
ret = smp_call_function(func, info, retry, wait);
local_irq_disable();
func(info);
Index: linux-2.6/kernel/softlockup.c
===
--- linux-2.6.orig/kernel/softlockup.c
+++ linux-2.6/kernel/softlockup.c
@@ -15,6 +15,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 
@@ -71,6 +73,8 @@ void touch_all_softlockup_watchdogs(void
 }
 EXPORT_SYMBOL(touch_all_softlockup_watchdogs);
 
+DECLARE_PER_CPU(void (*)(void *), last_on_each_cpu);
+
 /*
  * This callback runs from the timer interrupt, and checks
  * whether the watchdog thread has hung or not:
@@ -122,6 +126,9 @@ void softlockup_tick(void)
printk(KERN_ERR "BUG: soft lockup - CPU#%d stuck for %lus! [%s:%d]\n",
this_cpu, now - touch_timestamp,
current->comm, task_pid_nr(current));
+   printk(KERN_ERR " last_on_each_cpu: [<%p>] ",
+   per_cpu(last_on_each_cpu, this_cpu));
+   print_symbol("%s\n", (unsigned long)per_cpu(last_on_each_cpu, this_cpu));
if (regs)
show_regs(regs);
else
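
The idea of the debugging patch above can be tried out in user space:
remember the last callback handed to a dispatcher in a per-CPU slot so that
a watchdog can report it later. This is only a sketch; NR_CPUS_SKETCH,
record_and_call(), last_recorded() and bump() are hypothetical names, and
the real patch uses DEFINE_PER_CPU and print_symbol() inside the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4	/* hypothetical fixed CPU count */

typedef void (*cpu_func_t)(void *info);

/* One slot per CPU, standing in for a per-CPU variable. */
static cpu_func_t last_on_each_cpu[NR_CPUS_SKETCH];

/* Record func for this cpu, then run it (as on_each_cpu would). */
static void record_and_call(int cpu, cpu_func_t func, void *info)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	last_on_each_cpu[cpu] = func;
	func(info);
}

/* What a watchdog would read when it detects a stuck CPU. */
static cpu_func_t last_recorded(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	return last_on_each_cpu[cpu];
}

/* Example callback: increments the counter passed via info. */
static void bump(void *info)
{
	++*(int *)info;
}
```

Recording before the call matters: if func never returns, the slot already
names the culprit when the watchdog fires.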

Re: [linux-usb-devel] 2.6.23-rc7-mm1

2007-09-24 Thread Jiri Slaby

On 09/24/2007 09:06 PM, Alan Stern wrote:
> On Mon, 24 Sep 2007, Jiri Slaby wrote:
> 
>> On 09/24/2007 04:41 PM, Alan Stern wrote:
>>> On Mon, 24 Sep 2007, Jiri Slaby wrote:
>>>
>>>> On 09/24/2007 11:17 AM, Andrew Morton wrote:
>>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc7/2.6.23-rc7-mm1/
>>>> Fine, but on some boots (I noticed this on rc6-mm1 too, but not before):
>>>> :00:1a.7 EHCI: BIOS handoff failed (BIOS bug ?) 01010001
>>>> :00:1d.7 EHCI: BIOS handoff failed (BIOS bug ?) 01010001
>>> Any changes in your BIOS setup?
>> Unlikely, but still possible -- looking back, I did make some changes in the
>> BIOS recently. Which specific changes would cause such behaviour?
> 
> USB Legacy Support is about the only change which springs to mind.  But 
> who knows...  A buggy BIOS could do almost anything.

Hmm, I have USB legacy keyboard support switched on so that I can type in
GRUB and the BIOS.

I booted 23-rc7 4 times, and the latest -mm 3 times just now and can't
reproduce it; I just wonder what it depends on.

regards,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University

Re: 2.6.23-rc7-mm1

2007-09-24 Thread Torsten Kaiser

On 9/24/07, Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc7/2.6.23-rc7-mm1/

With the five hotfixes applied it works for me.

But it fails to power down my system when shutting down.

It prints twice 'System halted' and blinks the keyboard leds, but does
not switch off. On all other kernel version I only see one keyboard
blink before the power goes out.

I compared its dmesg to vanilla-rc7 and -rc4-mm1, but apart from -rc4-mm1
assigning different IRQs I can't see any differences except the normal
variation in BogoMips etc.

As the system still responded to SysRq I got the following information:
[  415.77] SysRq : Show Regs
[  415.77] CPU 3:
[  415.78] Modules linked in: radeon drm nfsd exportfs ipv6 tuner
tea5767 tda8290 tuner_simple mt20xx tvaudio msp3400 bttv video_buf
ir_common compat_ioctl32 btcx_risc tveeprom videodev v4l2_common
v4l1_compat pata_amd usbhid hid sg
[  415.78] Pid: 0, comm: swapper Not tainted 2.6.23-rc7-mm1 #1
[  415.78] RIP: 0010:[]  [] default_idle+0x29/0x40
[  415.78] RSP: 0018:81010038bf30  EFLAGS: 0246
[  415.78] RAX: 0400 RBX: 80810040 RCX: 
[  415.78] RDX:  RSI: 0001 RDI: 0005
[  415.78] RBP: 00030400 R08:  R09: 81010038be68
[  415.95] R10: 012c R11: 80219be0 R12: 
[  415.95] R13:  R14:  R15: 
[  415.95] FS:  7f35c69726f0() GS:810100319700() knlGS:
[  415.95] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
[  415.95] CR2: 7fe432928c40 CR3: 00201000 CR4: 06e0
[  416.07] DR0:  DR1:  DR2: 
[  416.07] DR3:  DR6: 0ff0 DR7: 0400
[  416.07]
[  416.07] Call Trace:
[  416.07]  [] cpu_idle+0x5a/0x90
[  416.07]

No blocked tasks were shown with SysRq+W.
Last lines before I used SysRq+B (That worked, a normal reboot started):

[  450.78] SysRq : Emergency Remount R/O
[  450.79] Emergency Remount complete
[  453.65] SysRq : Emergency Sync
[  453.66] Emergency Sync complete
[  455.91] SysRq : Power Off
[  455.92] md: stopping all md devices.
[  455.93] md: md1 still in use.
[  456.94] sd 8:0:1:0: [sdd] Synchronizing SCSI cache
[  456.96] sd 8:0:1:0: [sdd] Stopping disk
[  457.48] sd 2:0:0:0: [sdc] Synchronizing SCSI cache
[  457.49] sd 2:0:0:0: [sdc] Stopping disk
[  457.50] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
[  457.52] sd 1:0:0:0: [sdb] Stopping disk
[  457.53] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[  457.55] sd 0:0:0:0: [sda] Stopping disk
[  457.56] Power down.
[  479.09] SysRq : Power Off
[  479.10] md: stopping all md devices.
[  479.11] md: md1 still in use.
[  480.12] sd 8:0:1:0: [sdd] Synchronizing SCSI cache
[  480.14] sd 8:0:1:0: [sdd] Stopping disk
[  480.66] sd 2:0:0:0: [sdc] Synchronizing SCSI cache
[  480.67] sd 2:0:0:0: [sdc] Stopping disk
[  480.68] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
[  480.70] sd 1:0:0:0: [sdb] Stopping disk
[  480.71] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[  480.73] sd 0:0:0:0: [sda] Stopping disk
[  480.74] Power down.
[  489.03] SysRq : Resetting

Torsten

Re: [linux-usb-devel] 2.6.23-rc7-mm1

2007-09-24 Thread Alan Stern

On Mon, 24 Sep 2007, Jiri Slaby wrote:

> On 09/24/2007 04:41 PM, Alan Stern wrote:
> > On Mon, 24 Sep 2007, Jiri Slaby wrote:
> > 
> >> On 09/24/2007 11:17 AM, Andrew Morton wrote:
> >>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc7/2.6.23-rc7-mm1/
> >> Fine, but on some boots (I noticed this on rc6-mm1 too, but not before):
> >> :00:1a.7 EHCI: BIOS handoff failed (BIOS bug ?) 01010001
> >> :00:1d.7 EHCI: BIOS handoff failed (BIOS bug ?) 01010001
> > 
> > Any changes in your BIOS setup?
> 
> Unlikely, but still possible -- looking back, I did make some changes in the
> BIOS recently. Which specific changes would cause such behaviour?

USB Legacy Support is about the only change which springs to mind.  But 
who knows...  A buggy BIOS could do almost anything.

Alan Stern


Re: [linux-usb-devel] 2.6.23-rc7-mm1

2007-09-24 Thread Jiri Slaby

On 09/24/2007 04:41 PM, Alan Stern wrote:
> On Mon, 24 Sep 2007, Jiri Slaby wrote:
> 
>> On 09/24/2007 11:17 AM, Andrew Morton wrote:
>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc7/2.6.23-rc7-mm1/
>> Fine, but on some boots (I noticed this on rc6-mm1 too, but not before):
>> :00:1a.7 EHCI: BIOS handoff failed (BIOS bug ?) 01010001
>> :00:1d.7 EHCI: BIOS handoff failed (BIOS bug ?) 01010001
> 
> Any changes in your BIOS setup?

Unlikely, but still possible -- looking back, I did make some changes in the
BIOS recently. Which specific changes would cause such behaviour?

> What about with vanilla 2.6.23-rc6?  Or vanilla 2.6.23-rc7?
> 
> The USB part of the code here hasn't changed in quite a while.  Any 
> difference in behavior must be the result of changes in some other part 
> of the kernel.  Possibly ACPI.
> 
> This might be a good job for git-bisect.

Ok, I'll play with that little bit.

thanks,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.23-rc7-mm1

2007-09-24 Thread J. Bruce Fields

On Mon, Sep 24, 2007 at 09:59:29AM -0700, Andrew Morton wrote:
> On Tue, 25 Sep 2007 00:52:30 +1000 Reuben Farrelly <[EMAIL PROTECTED]> wrote:
> 
> > 
> > 
> > On 24/09/2007 7:17 PM, Andrew Morton wrote:
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc7/2.6.23-rc7-mm1/
> > > 
> > > - New git tree git-powerpc-galak.patch added to the -mm lineup: ppc32
> > >   things, mainly (Kumar Gala <[EMAIL PROTECTED]>)
> > 
> > > I'm observing a problem with this kernel (as well as 2.6.23-rc6-mm1) which
> > > manifests itself only in my Postfix/application mail.logs:
> > > 
> > > Sep 25 00:25:40 tornado postfix/smtp[12520]: fatal: select lock: Cannot allocate memory
> > > Sep 25 00:25:41 tornado postfix/master[8002]: warning: process /usr/lib64/postfix/smtp pid 12520 exit status 1
> > > 
> > > This is happening frequently with processes started via 'master' (smtp, smtpd
> > > and cleanup), but it does not appear to have any noticeable operational impact
> > > apart from logging a lot of copies of this message.
> > > 
> > > The corresponding code in Postfix which triggers this is (choice of 3 files in
> > > src/master are all possibilities which all have much the same code)

Oog.  Looks like it's the "Memory shortage can result in inconsistent
flocks state" patch--the error variable is being set in some cases when
it shouldn't be.  Does the following fix it?

That's in my git tree, not in mainline.  I'll fix up my copy.

And I'll spend some time today figuring out what to do about regression
testing for the posix lock, flock, and lease code.

Thanks for the bug report!

--b.

diff --git a/fs/locks.c b/fs/locks.c
index a6c5917..3e8bfd2 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -740,6 +740,7 @@ static int flock_lock_file(struct file *filp, struct file_lock *request)
new_fl = locks_alloc_lock();
if (new_fl == NULL)
goto out;
+   error = 0;
}
 
for_each_lock(inode, before) {
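
The pattern behind this one-line fix, as a stand-alone sketch: the error
code is presumably preset to -ENOMEM before the allocation so that the
failure path can jump straight to out, which means the success path has to
clear it again. take_lock_sketch() and its fix_applied flag are hypothetical
and only mimic the control flow; this is not the fs/locks.c code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for the control flow discussed above.
 * Returns 0 on success or a negative errno value.
 */
static int take_lock_sketch(int need_alloc, int fix_applied)
{
	int error = 0;
	void *new_fl = NULL;

	if (need_alloc) {
		error = -ENOMEM;	/* preset for the failure path */
		new_fl = malloc(16);
		if (new_fl == NULL)
			goto out;
		if (fix_applied)
			error = 0;	/* allocation worked: clear it */
	}

	/* ... the lock list would be walked and updated here ... */

out:
	free(new_fl);
	return error;
}
```

Without the reset, a successful call that needed an allocation still returns
-ENOMEM, which would match the "Cannot allocate memory" that Postfix logs
even though memory is not actually short.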

Re: 2.6.23-rc7-mm1

2007-09-24 Thread Kamalesh Babulal

Peter Zijlstra wrote:
> On Mon, 24 Sep 2007 09:44:48 -0700 Andrew Morton
> <[EMAIL PROTECTED]> wrote:
> 
>> On Mon, 24 Sep 2007 18:43:33 +0530 Kamalesh Babulal <[EMAIL PROTECTED]> 
>> wrote:
>>
>>> Hi Andrew,
>>>
>>> Kernel BUG over x86_64 (AMD Opteron(tm) Processor 844).
>>>
>>> A similar kernel BUG was reported for 2.6.23-rc2-mm1
>>> at http://lkml.org/lkml/2007/8/10/20 and
>>> mm-dirty-balancing-for-tasks.patch was dropped from 2.6.23-rc2-mm2.
>>> The same patch is in this -mm version, so I suspect it is the
>>> same patch triggering this BUG.
>>>
>>> BUG: soft lockup - CPU#0 stuck for 11s! [events/0:15]
>>> CPU 0:
>>> Modules linked in:
>>> Pid: 15, comm: events/0 Tainted: G  D 2.6.23-rc7-mm1-autokern1 #1
>>> RIP: 0010:[]  [] 
>>> __smp_call_function_mask+0x9a/0xc4
>>> RSP: :8100017add80  EFLAGS: 0297
>>> RAX: 00fc RBX: 8100017adde0 RCX: 0001
>>> RDX: 08fc RSI: 00fc RDI: 000e
>>> RBP: c20002d11000 R08: 8100017ac000 R09: 80675e38
>>> R10:  R11:  R12: 000f
>>> R13: 8021bcfe R14:  R15: 0001
>>> FS:  () GS:8065a000() knlGS:556aa2a0
>>> CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
>>> CR2: c20002d11008 CR3: 00201000 CR4: 06e0
>>> DR0:  DR1:  DR2: 
>>> DR3:  DR6: 0ff0 DR7: 0400
>>>
>>> Call Trace:
>>> Inexact backtrace:
>>>  [] mcheck_check_cpu+0x0/0x31
>>>  [] mcheck_check_cpu+0x0/0x31
>>>  [] smp_call_function_mask+0x5f/0x72
>>>  [] mcheck_check_cpu+0x0/0x31
>>>  [] smp_call_function+0x19/0x1b
>>>  [] on_each_cpu+0x16/0x2b
>>>  [] mcheck_timer+0x0/0x7c
>>>  [] mcheck_timer+0x1e/0x7c
>>>  [] run_workqueue+0x88/0x109
>>>  [] worker_thread+0x0/0xf4
>>>  [] worker_thread+0xe9/0xf4
>>>  [] autoremove_wake_function+0x0/0x37
>>>  [] autoremove_wake_function+0x0/0x37
>>>  [] kthread+0x44/0x6d
>>>  [] child_rip+0xa/0x12
>>>  [] kthread+0x0/0x6d
>>>  [] child_rip+0x0/0x12
>> hm, I thought we'd fixed the problems in that patchset.  Peter, were
>> you aware of this one?
> 
> Nope, and the stacktrace is utterly puzzling.
> 
> /me goes read the lkml.org link
> 
> Kamalesh Babulal: do you still get:
>   BUG: spinlock bad magic on
> 
> msgs?
> 
> Because those I could reproduce using fsx, and I fixed all that.
Hi Peter,

I no longer get the "BUG: spinlock bad magic" messages, but the soft lockup
message is thrown more than 30 times while running the LTP runall.

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.

Re: 2.6.23-rc7-mm1

2007-09-24 Thread Andrew Morton

On Tue, 25 Sep 2007 00:52:30 +1000 Reuben Farrelly <[EMAIL PROTECTED]> wrote:

> 
> 
> On 24/09/2007 7:17 PM, Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.23-rc7/2.6.23-rc7-mm1/
> > 
> > - New git tree git-powerpc-galak.patch added to the -mm lineup: ppc32
> >   things, mainly (Kumar Gala <[EMAIL PROTECTED]>)
> 
> I'm observing a problem with this kernel (as well as 2.6.23-rc6-mm1) which
> manifests itself only in my Postfix/application mail.logs:
> 
> Sep 25 00:25:40 tornado postfix/smtp[12520]: fatal: select lock: Cannot allocate memory
> Sep 25 00:25:41 tornado postfix/master[8002]: warning: process /usr/lib64/postfix/smtp pid 12520 exit status 1
> 
> This is happening frequently with processes started via 'master' (smtp, smtpd
> and cleanup), but it does not appear to have any noticeable operational impact
> apart from logging a lot of copies of this message.
> 
> The corresponding code in Postfix which triggers this is (choice of 3 files in
> src/master are all possibilities which all have much the same code)
> 
>     /*
>      * The event loop, at last.
>      */
>     while (var_use_limit == 0 || use_count < var_use_limit || client_count > 0) {
>         if (multi_server_lock != 0) {
>             watchdog_stop(watchdog);
>             if (myflock(vstream_fileno(multi_server_lock), INTERNAL_LOCK,
>                         MYFLOCK_OP_EXCLUSIVE) < 0)
>                 msg_fatal("select lock: %m");
>         }
>         watchdog_start(watchdog);
>         delay = loop ? loop(multi_server_name, multi_server_argv) : -1;
>         event_loop(delay);
>     }
>     multi_server_exit();
> }
> 
> 
> Now I'm not convinced this is an application problem, because I'm only seeing
> this after running up kernel 2.6.23-rc6-mm1 or 2.6.23-rc7-mm1 and with NO
> changes to the application itself.  Using the same application binaries it does
> not occur with 2.6.22 mainline.  [I didn't get a lot of testing with the -mm
> release prior to that unfortunately due to some other breakage.]
> 
> Is there anything new in the last two or so -mm kernels which could have caused
> this?
> 
> I've put my .config up at
> http://www.reub.net/files/kernel/2.6.23-rc7-mm1.config

ug.

Lots of people have been futzing with the fs/locks.c code:

cleanup-macros-for-distinguishing-mandatory-locks.patch
fix-potential-oops-in-generic_setlease-v2.patch
fix-potential-oops-in-generic_setlease.patch
fs-locksc-use-list_for_each_entry-instead-of-list_for_each.patch
git-nfs.patch
git-nfsd.pc
rework-proc-locks-via-seq_files-and-seq_list-helpers-fix-2.patch
rework-proc-locks-via-seq_files-and-seq_list-helpers.patch
slab-api-remove-useless-ctor-parameter-and-reorder-parameters.patch



