Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-07 Thread Nish Aravamudan
On 10/5/07, Linas Vepstas [EMAIL PROTECTED] wrote:
 On Thu, Oct 04, 2007 at 05:01:47PM -0700, Nish Aravamudan wrote:
  On 10/2/07, Tony Breeds [EMAIL PROTECTED] wrote:
   On Wed, Oct 03, 2007 at 10:30:16AM +1000, Michael Ellerman wrote:
  
I realise it'll make the patch bigger, but this doesn't seem like a
particularly good name for the variable anymore.
  
   Sure, what about?
  
   Clarify when RTAS logging is enabled.
  
   Signed-off-by: Tony Breeds [EMAIL PROTECTED]
 
  For what it's worth, on a different ppc64 box, this resolves a similar
  panic for me.
 
  Tested-by: Nishanth Aravamudan [EMAIL PROTECTED]

 For the reasons explained, I'd really like to nack Tony's patch.

I see. Can you reply in this thread with the patch you mentioned in
your other reply? (or point me to a copy of it)

Thanks,
Nish
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-05 Thread Linas Vepstas
On Thu, Oct 04, 2007 at 05:01:47PM -0700, Nish Aravamudan wrote:
 On 10/2/07, Tony Breeds [EMAIL PROTECTED] wrote:
  On Wed, Oct 03, 2007 at 10:30:16AM +1000, Michael Ellerman wrote:
 
   I realise it'll make the patch bigger, but this doesn't seem like a
   particularly good name for the variable anymore.
 
  Sure, what about?
 
  Clarify when RTAS logging is enabled.
 
  Signed-off-by: Tony Breeds [EMAIL PROTECTED]
 
 For what it's worth, on a different ppc64 box, this resolves a similar
 panic for me.
 
 Tested-by: Nishanth Aravamudan [EMAIL PROTECTED]

For the reasons explained, I'd really like to nack Tony's patch.

--linas
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-04 Thread Nish Aravamudan
On 10/2/07, Tony Breeds [EMAIL PROTECTED] wrote:
 On Wed, Oct 03, 2007 at 10:30:16AM +1000, Michael Ellerman wrote:

  I realise it'll make the patch bigger, but this doesn't seem like a
  particularly good name for the variable anymore.

 Sure, what about?

 Clarify when RTAS logging is enabled.

 Signed-off-by: Tony Breeds [EMAIL PROTECTED]

For what it's worth, on a different ppc64 box, this resolves a similar
panic for me.

Tested-by: Nishanth Aravamudan [EMAIL PROTECTED]

Thanks,
Nish
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-03 Thread Linas Vepstas
On Wed, Oct 03, 2007 at 02:09:46PM +1000, Michael Ellerman wrote:
 
 Until we initialise what exactly?

Until we allocate the error log buffer. The original crash was 
for a null-pointer deref of the unallocated buffer. I just sent 
out a patch to fix this; its a bit simpler than the below.

In that email, I remarked:

Andy Whitcroft's crash was appearently due to firmware complaining
about lost power, (actually, lost power supply redundancy!), which
occurred very early during boot.

Type0040 (EPOW)
Status: bypassed new
Residual error from previous boot.
EPOW Sensor Value:  0002
EPOW warning due to loss of redundancy.
EPOW general power fault.

I've no clue why firmware thought it was OK to report this
during one of the earliest calls to RTAS; I'm still investiigating
that.

--linas
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-02 Thread Linas Vepstas
On Mon, Sep 24, 2007 at 01:35:31PM +0100, Andy Whitcroft wrote:
 Seeing the following from an older power LPAR, pretty sure we had
 this in the previous -mm also:

I haven't forgetten about this ... and am looking at it now.
Seems that whenever I go to reserve the machine pSeries-102,
someone else is using it :-)

--linas
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-02 Thread Tony Breeds
On Tue, Oct 02, 2007 at 06:28:19PM -0500, Linas Vepstas wrote:
 On Mon, Sep 24, 2007 at 01:35:31PM +0100, Andy Whitcroft wrote:
  Seeing the following from an older power LPAR, pretty sure we had
  this in the previous -mm also:
 
 I haven't forgetten about this ... and am looking at it now.
 Seems that whenever I go to reserve the machine pSeries-102,
 someone else is using it :-)

This panic is caused by [POWERPC] pseries: Fix jumbled no_logging flag.
(79c0108d1b9db4864ab77b2a95dfa04f2dcf264c), in the powerpc/for-2.6.24
branch.  It looks to me that we have logging enabled too early now.

I think the following is a reasonable fix?

---
Explicitly enable RTAS error logging, when it should be ready.


Signed-off-by: Tony Breeds [EMAIL PROTECTED]

---

 arch/powerpc/platforms/pseries/rtasd.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/rtasd.c 
b/arch/powerpc/platforms/pseries/rtasd.c
index 30925d2..0df5d0d 100644
--- a/arch/powerpc/platforms/pseries/rtasd.c
+++ b/arch/powerpc/platforms/pseries/rtasd.c
@@ -54,7 +54,10 @@ static unsigned int rtas_event_scan_rate;
 static int full_rtas_msgs = 0;
 
 /* Stop logging to nvram after first fatal error */
-static int no_more_logging;
+static int no_more_logging = 1; /* Until we initialize everything,
+ * make sure we don't try logging
+ * anything */
+
 
 static int error_log_cnt;
 
@@ -414,6 +417,8 @@ static int rtasd(void *unused)
memset(logdata, 0, rtas_error_log_max);
rc = nvram_read_error_log(logdata, rtas_error_log_max,
  err_type, error_log_cnt);
+   /* We can use rtas_log_buf now */
+   no_more_logging = 0;
 
if (!rc) {
if (err_type != ERR_FLAG_ALREADY_LOGGED) {

Yours Tony

  linux.conf.auhttp://linux.conf.au/ || http://lca2008.linux.org.au/
  Jan 28 - Feb 02 2008 The Australian Linux Technical Conference!

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-02 Thread Michael Ellerman
On Wed, 2007-10-03 at 10:26 +1000, Tony Breeds wrote:
 On Tue, Oct 02, 2007 at 06:28:19PM -0500, Linas Vepstas wrote:
  On Mon, Sep 24, 2007 at 01:35:31PM +0100, Andy Whitcroft wrote:
   Seeing the following from an older power LPAR, pretty sure we had
   this in the previous -mm also:
  
  I haven't forgetten about this ... and am looking at it now.
  Seems that whenever I go to reserve the machine pSeries-102,
  someone else is using it :-)
 
 This panic is caused by [POWERPC] pseries: Fix jumbled no_logging flag.
 (79c0108d1b9db4864ab77b2a95dfa04f2dcf264c), in the powerpc/for-2.6.24
 branch.  It looks to me that we have logging enabled too early now.
 
 I think the following is a reasonable fix?
 
 ---
 Explicitly enable RTAS error logging, when it should be ready.
 
 
 Signed-off-by: Tony Breeds [EMAIL PROTECTED]
 
 ---
 
  arch/powerpc/platforms/pseries/rtasd.c |7 ++-
  1 files changed, 6 insertions(+), 1 deletions(-)
 
 diff --git a/arch/powerpc/platforms/pseries/rtasd.c 
 b/arch/powerpc/platforms/pseries/rtasd.c
 index 30925d2..0df5d0d 100644
 --- a/arch/powerpc/platforms/pseries/rtasd.c
 +++ b/arch/powerpc/platforms/pseries/rtasd.c
 @@ -54,7 +54,10 @@ static unsigned int rtas_event_scan_rate;
  static int full_rtas_msgs = 0;
  
  /* Stop logging to nvram after first fatal error */
 -static int no_more_logging;
 +static int no_more_logging = 1; /* Until we initialize everything,
 + * make sure we don't try logging
 + * anything */
 +

I realise it'll make the patch bigger, but this doesn't seem like a
particularly good name for the variable anymore.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person


signature.asc
Description: This is a digitally signed message part
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-02 Thread Tony Breeds
On Wed, Oct 03, 2007 at 10:30:16AM +1000, Michael Ellerman wrote:
 
 I realise it'll make the patch bigger, but this doesn't seem like a
 particularly good name for the variable anymore.

Sure, what about?

Clarify when RTAS logging is enabled.

Signed-off-by: Tony Breeds [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/rtasd.c |   15 +--
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/rtasd.c 
b/arch/powerpc/platforms/pseries/rtasd.c
index 30925d2..73401c8 100644
--- a/arch/powerpc/platforms/pseries/rtasd.c
+++ b/arch/powerpc/platforms/pseries/rtasd.c
@@ -54,8 +54,9 @@ static unsigned int rtas_event_scan_rate;
 static int full_rtas_msgs = 0;
 
 /* Stop logging to nvram after first fatal error */
-static int no_more_logging;
-
+static int logging_enabled; /* Until we initialize everything,
+ * make sure we don't try logging
+ * anything */
 static int error_log_cnt;
 
 /*
@@ -217,7 +218,7 @@ void pSeries_log_error(char *buf, unsigned int err_type, 
int fatal)
}
 
/* Write error to NVRAM */
-   if (!no_more_logging  !(err_type  ERR_FLAG_BOOT))
+   if (logging_enabled  !(err_type  ERR_FLAG_BOOT))
nvram_write_error_log(buf, len, err_type, error_log_cnt);
 
/*
@@ -229,8 +230,8 @@ void pSeries_log_error(char *buf, unsigned int err_type, 
int fatal)
printk_log_rtas(buf, len);
 
/* Check to see if we need to or have stopped logging */
-   if (fatal || no_more_logging) {
-   no_more_logging = 1;
+   if (fatal || !logging_enabled) {
+   logging_enabled = 0;
spin_unlock_irqrestore(rtasd_log_lock, s);
return;
}
@@ -302,7 +303,7 @@ static ssize_t rtas_log_read(struct file * file, char 
__user * buf,
 
spin_lock_irqsave(rtasd_log_lock, s);
/* if it's 0, then we know we got the last one (the one in NVRAM) */
-   if (rtas_log_size == 0  !no_more_logging)
+   if (rtas_log_size == 0  logging_enabled)
nvram_clear_error_log();
spin_unlock_irqrestore(rtasd_log_lock, s);
 
@@ -414,6 +415,8 @@ static int rtasd(void *unused)
memset(logdata, 0, rtas_error_log_max);
rc = nvram_read_error_log(logdata, rtas_error_log_max,
  err_type, error_log_cnt);
+   /* We can use rtas_log_buf now */
+   logging_enabled = 1;
 
if (!rc) {
if (err_type != ERR_FLAG_ALREADY_LOGGED) {

Yours Tony

  linux.conf.auhttp://linux.conf.au/ || http://lca2008.linux.org.au/
  Jan 28 - Feb 02 2008 The Australian Linux Technical Conference!

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: 2.6.23-rc7-mm1 -- powerpc rtas panic

2007-10-02 Thread Michael Ellerman
On Wed, 2007-10-03 at 11:19 +1000, Tony Breeds wrote:
 On Wed, Oct 03, 2007 at 10:30:16AM +1000, Michael Ellerman wrote:
  
  I realise it'll make the patch bigger, but this doesn't seem like a
  particularly good name for the variable anymore.
 
 Sure, what about?

Better .. but  ..   :D

 diff --git a/arch/powerpc/platforms/pseries/rtasd.c 
 b/arch/powerpc/platforms/pseries/rtasd.c
 index 30925d2..73401c8 100644
 --- a/arch/powerpc/platforms/pseries/rtasd.c
 +++ b/arch/powerpc/platforms/pseries/rtasd.c
 @@ -54,8 +54,9 @@ static unsigned int rtas_event_scan_rate;
  static int full_rtas_msgs = 0;
  
  /* Stop logging to nvram after first fatal error */
 -static int no_more_logging;
 -
 +static int logging_enabled; /* Until we initialize everything,
 + * make sure we don't try logging
 + * anything */

Until we initialise what exactly?

  static int error_log_cnt;
  
  /*
 @@ -217,7 +218,7 @@ void pSeries_log_error(char *buf, unsigned int err_type, 
 int fatal)
   }
  
   /* Write error to NVRAM */
 - if (!no_more_logging  !(err_type  ERR_FLAG_BOOT))
 + if (logging_enabled  !(err_type  ERR_FLAG_BOOT))
   nvram_write_error_log(buf, len, err_type, error_log_cnt);
  
   /*
 @@ -229,8 +230,8 @@ void pSeries_log_error(char *buf, unsigned int err_type, 
 int fatal)
   printk_log_rtas(buf, len);
  
   /* Check to see if we need to or have stopped logging */
 - if (fatal || no_more_logging) {
 - no_more_logging = 1;
 + if (fatal || !logging_enabled) {
 + logging_enabled = 0;
   spin_unlock_irqrestore(rtasd_log_lock, s);
   return;
   }

Hmmm, this routine has 4 separate lock-dropping exit paths ..

 @@ -302,7 +303,7 @@ static ssize_t rtas_log_read(struct file * file, char 
 __user * buf,
  
   spin_lock_irqsave(rtasd_log_lock, s);
   /* if it's 0, then we know we got the last one (the one in NVRAM) */
 - if (rtas_log_size == 0  !no_more_logging)
 + if (rtas_log_size == 0  logging_enabled)
   nvram_clear_error_log();
   spin_unlock_irqrestore(rtasd_log_lock, s);
  
 @@ -414,6 +415,8 @@ static int rtasd(void *unused)
   memset(logdata, 0, rtas_error_log_max);
   rc = nvram_read_error_log(logdata, rtas_error_log_max,
 err_type, error_log_cnt);
 + /* We can use rtas_log_buf now */
 + logging_enabled = 1;
  
   if (!rc) {
   if (err_type != ERR_FLAG_ALREADY_LOGGED) {

What exactly happens that allows us to do logging? I don't see any
ordering between anything else and the setting of the flag, and AFAICT
we're not inside a spinlock or anything here.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person


signature.asc
Description: This is a digitally signed message part
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev