Re: [BUG] 2.6.25-rc2-git4 - Regression Kernel oops while running kernbench and tbench on powerpc

2008-05-10 Thread Kamalesh Babulal
Paul Mackerras wrote:
 Kamalesh Babulal writes:
 
 Thanks, after applying the patch the oops is not reproducible on the
 machine. The console log had no message starting with SLB: or FWNMI:.
 I have updated the bugzilla also.

 Tested-by: Kamalesh Babulal [EMAIL PROTECTED]
 
 Could you test Linus' current git tree and see if you can reproduce
 the same problem now?  The patch I sent upstream was a little
 different from the one you tested, though it should have the same
 effect, and I would like to be sure that it is just as effective at
 fixing the bug.
 
 Thanks,
 Paul.

Hi Paul,

The patch has the same effect. I tested the 2.6.26-rc1-git7 kernel and
the oops is not reproducible.

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [BUG] 2.6.25-rc2-git4 - Regression Kernel oops while running kernbench and tbench on powerpc

2008-05-08 Thread Paul Mackerras
Kamalesh Babulal writes:

 Thanks, after applying the patch the oops is not reproducible on the
 machine. The console log had no message starting with SLB: or FWNMI:.
 I have updated the bugzilla also.
 
 Tested-by: Kamalesh Babulal [EMAIL PROTECTED]

Could you test Linus' current git tree and see if you can reproduce
the same problem now?  The patch I sent upstream was a little
different from the one you tested, though it should have the same
effect, and I would like to be sure that it is just as effective at
fixing the bug.

Thanks,
Paul.


Re: [BUG] 2.6.25-rc2-git4 - Regression Kernel oops while running kernbench and tbench on powerpc

2008-04-24 Thread Kamalesh Babulal
Paul Mackerras wrote:
 Kamalesh Babulal writes:
 
 After applying the patch above and the patch posted on
 http://lkml.org/lkml/2008/4/8/42
 the bug had the following information,
 
 Thanks.  The patch below, against Linus' current git tree, fixes one
 bug that might be the cause of the problem, and also attempts to
 detect the erroneous situation earlier and fix it up, and also print
 some debug information.  Please try to reproduce the problem with this
 patch applied, and if there are any console log messages starting with
 SLB: or FWNMI:, please send me the console log.
 
 Paul.
 

Hi Paul,

Thanks, after applying the patch the oops is not reproducible on the machine.
The console log had no message starting with SLB: or FWNMI:. I have updated
the bugzilla also.

Tested-by: Kamalesh Babulal [EMAIL PROTECTED]

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


Re: [BUG] 2.6.25-rc2-git4 - Regression Kernel oops while running kernbench and tbench on powerpc

2008-04-23 Thread Paul Mackerras
Kamalesh Babulal writes:

 After applying the patch above and the patch posted on
 http://lkml.org/lkml/2008/4/8/42
 the bug had the following information,

Thanks.  The patch below, against Linus' current git tree, fixes one
bug that might be the cause of the problem, and also attempts to
detect the erroneous situation earlier and fix it up, and also print
some debug information.  Please try to reproduce the problem with this
patch applied, and if there are any console log messages starting with
SLB: or FWNMI:, please send me the console log.

Paul.

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index c0db5b7..f7f0962 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -439,6 +439,19 @@ END_FTR_SECTION_IFSET(CPU_FTR_1T_SEGMENT)
mr  r1,r8   /* start using new stack pointer */
std r7,PACAKSAVE(r13)
 
+   /* check that SLB entry 2 contains the right thing */
+   clrrdi  r6,r1,28
+   clrldi. r0,r6,2
+   beq     3f
+   li      r0,2
+   slbmfee r7,r0
+   oris    r6,r6,[EMAIL PROTECTED]
+   cmpd    r6,r7
+   beq     3f
+   bl      bad_slb_switch
+   ld      r3,PACACURRENT(r13)
+   addi    r3,r3,THREAD
+3:
ld  r6,_CCR(r1)
mtcrf   0xFF,r6
 
@@ -540,6 +553,19 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_ISERIES)
ld  r4,_XER(r1)
mtspr   SPRN_XER,r4
 
+   /* check that SLB entry 2 contains the right thing */
+   clrrdi  r6,r1,28        /* stack ESID */
+   clrldi. r0,r6,2
+   beq     57f
+   li      r0,2
+   slbmfee r7,r0
+   oris    r6,r6,[EMAIL PROTECTED]
+   cmpd    r6,r7
+   beq     57f
+   addi    r3,r1,STACK_FRAME_OVERHEAD
+   bl      bad_slb_exc
+   ld      r3,_MSR(r1)
+57:
REST_8GPRS(5, r1)
 
andi.   r0,r3,MSR_RI
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index be35ffa..c938134 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -45,6 +45,7 @@
 #include <asm/system.h>
 #include <asm/mpic.h>
 #include <asm/vdso_datapage.h>
+#include <asm/mmu.h>
 #ifdef CONFIG_PPC64
 #include <asm/paca.h>
 #endif
@@ -580,6 +581,10 @@ int __devinit start_secondary(void *unused)
    atomic_inc(&init_mm.mm_count);
    current->active_mm = &init_mm;
 
+   /* Bolt in the entry for the kernel stack now */
+   if (cpu_has_feature(CPU_FTR_SLB))
+   slb_flush_and_rebolt();
+
smp_store_cpu_info(cpu);
set_dec(tb_ticks_per_jiffy);
preempt_disable();
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 906daed..bb7765b 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -309,3 +309,34 @@ void slb_initialize(void)
 * one. */
    asm volatile("isync":::"memory");
 }
+
+static void dump_slb(void)
+{
+   long entry;
+   unsigned long esid, vsid;
+
+   printk(KERN_EMERG "SLB contents now:\n");
+   for (entry = 0; entry < 64; ++entry) {
+       asm volatile("slbmfee  %0,%1" : "=r" (esid) : "r" (entry));
+       if (esid == 0)
+           /* valid bit is clear along with everything else */
+           continue;
+       asm volatile("slbmfev  %0,%1" : "=r" (vsid) : "r" (entry));
+       printk(KERN_EMERG "%d: %.16lx %.16lx\n", entry, esid, vsid);
+   }
+}
+
+void bad_slb_exc(struct pt_regs *regs)
+{
+   printk(KERN_EMERG "SLB: stack not bolted on exception return\n");
+   dump_slb();
+   slb_flush_and_rebolt();
+   show_regs(regs);
+}
+
+void bad_slb_switch(void)
+{
+   printk(KERN_EMERG "SLB: stack not bolted on context switch\n");
+   dump_slb();
+   slb_flush_and_rebolt();
+}
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index a1ab25c..ed68083 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -325,6 +325,8 @@ static int recover_mce(struct pt_regs *regs, struct rtas_error_log * err)
 
    if (err->disposition == RTAS_DISP_FULLY_RECOVERED) {
        /* Platform corrected itself */
+       printk(KERN_ALERT "FWNMI: platform corrected error %.16lx\n",
+              *(unsigned long *)err);
        nonfatal = 1;
    } else if ((regs->msr & MSR_RI) &&
               user_mode(regs) &&
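
As a reading aid for the entry_64.S hunks above, the added check can be
modelled in plain C. This is only a sketch under stated assumptions, not the
kernel's code: MODEL_SLB_ESID_V is a hypothetical stand-in for the operand
that the archive elided in the oris line (assumed here to be the valid bit of
an SLB ESID word), and the function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: valid bit of an SLB ESID word. It stands in for the operand
 * that is elided in the archived oris instruction. */
#define MODEL_SLB_ESID_V 0x0000000008000000ULL

/* Mirror of "clrrdi r6,r1,28": clear the low 28 bits of the stack pointer,
 * leaving the base of its 256MB segment. */
static uint64_t stack_esid(uint64_t sp)
{
    return sp & ~((1ULL << 28) - 1);
}

/* Mirror of "clrldi. r0,r6,2; beq": clear the top two bits. A zero result
 * means the stack sits in the first segment of its region, in which case
 * the patch skips the check. */
static bool needs_stack_entry(uint64_t sp)
{
    return ((stack_esid(sp) << 2) >> 2) != 0;
}

/* True when the modelled contents of SLB entry 2 match the stack segment,
 * or when no separate stack entry is expected. */
static bool stack_slb_ok(uint64_t sp, uint64_t slb2_esid_word)
{
    assert(sp != 0);
    if (!needs_stack_entry(sp))
        return true;
    return (stack_esid(sp) | MODEL_SLB_ESID_V) == slb2_esid_word;
}
```

In the patch, a mismatch leads to bad_slb_switch or bad_slb_exc, which dump
the SLB and call slb_flush_and_rebolt(); in this model it is simply a false
return value.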


Re: [BUG] 2.6.25-rc2-git4 - Regression Kernel oops while running kernbench and tbench on powerpc

2008-04-14 Thread Paul Mackerras
Kamalesh Babulal writes:

 The SHA1 ID of the kernel is 0e81a8ae37687845f7cdfa2adce14ea6a5f1dd34
 (2.6.25-rc8), and the source seems to include the patch
 44387e9ff25267c78a99229aca55ed750e9174c7.
 
 The kernel was patched only with the patch you gave me
 (http://lkml.org/lkml/2008/4/8/42).

Please try again with both that patch and the one below.  Once again
it won't fix the bug but will give us more information.  When the oops
occurs, the kernel will print a lot of debug information that should
help locate the problem.

Paul.
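
As a reading aid for the patch that follows: it records each SLB update in a
small per-CPU log whose index is masked to 6 bits (clrldi r4,r4,64-6), and
dumps that log on an unrecoverable exception. The C sketch below models such
a wraparound log. The struct layout, the 64-slot size and the names
slb_log_add and slb_log_walk are assumptions made for illustration; the
corresponding paca definitions do not appear in the archived copy of the
diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_SLOTS 64    /* assumption: matches the 6-bit index mask */

struct slb_log {
    uint64_t slot[LOG_SLOTS][4];  /* tag, timebase, two payload words */
    uint64_t ix;                  /* index of the most recent record */
};

/* Advance the index with wraparound, then store one record there. */
static void slb_log_add(struct slb_log *log, uint64_t tag, uint64_t tb,
                        uint64_t a, uint64_t b)
{
    assert(log != NULL);
    log->ix = (log->ix + 1) & (LOG_SLOTS - 1);
    log->slot[log->ix][0] = tag;
    log->slot[log->ix][1] = tb;
    log->slot[log->ix][2] = a;
    log->slot[log->ix][3] = b;
}

/* Visit every slot once, starting at the most recent index.
 * Returns the number of slots visited. visit may be NULL. */
static size_t slb_log_walk(const struct slb_log *log,
                           void (*visit)(uint64_t ix, const uint64_t rec[4]))
{
    assert(log != NULL);
    size_t n = 0;
    uint64_t start = log->ix, i = start;

    do {
        if (visit)
            visit(i, log->slot[i]);
        i = (i + 1) & (LOG_SLOTS - 1);
        n++;
    } while (i != start);
    return n;
}
```

The sketch wraps the read index with the same 6-bit mask as the write index,
so the walk covers all 64 slots exactly once; the posted dump loop wraps with
"% 63" instead.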

diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index e932b43..f16db50 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -144,6 +144,9 @@ int main(void)
DEFINE(PACA_SLBSHADOWPTR, offsetof(struct paca_struct, slb_shadow_ptr));
DEFINE(PACA_DATA_OFFSET, offsetof(struct paca_struct, data_offset));
DEFINE(PACA_TRAP_SAVE, offsetof(struct paca_struct, trap_save));
+   DEFINE(PACASLBLOG, offsetof(struct paca_struct, slblog));
+   DEFINE(PACASLBLOGIX, offsetof(struct paca_struct, slblog_ix));
+   DEFINE(PACALASTSLB, offsetof(struct paca_struct, last_slb));
 
DEFINE(SLBSHADOW_STACKVSID,
   offsetof(struct slb_shadow, save_area[SLB_NUM_BOLTED - 1].vsid));
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 148a354..663df17 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -419,6 +419,18 @@ END_FTR_SECTION_IFSET(CPU_FTR_1T_SEGMENT)
slbmte  r7,r0
isync
 
+   ld      r4,PACASLBLOGIX(r13)
+   addi    r4,r4,1
+   clrldi  r4,r4,64-6
+   std     r4,PACASLBLOGIX(r13)
+   add     r4,r4,r13
+   addi    r4,r4,PACASLBLOG
+   li      r5,4
+   std     r5,0(r4)
+   mftb    r5
+   std     r5,8(r4)
+   std     r6,16(r4)
+   std     r0,24(r4)
 2:
clrrdi  r7,r8,THREAD_SHIFT  /* base of new stack */
/* Note: this uses SWITCH_FRAME_SIZE rather than INT_FRAME_SIZE
@@ -533,6 +545,17 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_ISERIES)
 
stdcx.  r0,0,r1 /* to clear the reservation */
 
+   li  r4,0
+   slbmfee r2,r4
+   std r2,PACALASTSLB(r13)
+   slbmfev r2,r4
+   std r2,PACALASTSLB+8(r13)
+   li  r4,1
+   slbmfee r2,r4
+   std r2,PACALASTSLB+16(r13)
+   slbmfev r2,r4
+   std r2,PACALASTSLB+24(r13)
+
/*
 * Clear RI before restoring r13.  If we are returning to
 * userspace and we take an exception after restoring r13,
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 4b5b7ff..c918f33 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1141,6 +1141,40 @@ void SPEFloatingPointException(struct pt_regs *regs)
 }
 #endif
 
+static void dump_unrecov_slb(void)
+{
+#ifdef CONFIG_PPC64
+   long entry, rstart;
+   unsigned long esid, vsid;
+
+   printk(KERN_EMERG "SLB contents now:\n");
+   for (entry = 0; entry < 64; ++entry) {
+       asm volatile("slbmfee  %0,%1" : "=r" (esid) : "r" (entry));
+       if (esid == 0)
+           /* valid bit is clear along with everything else */
+           continue;
+       asm volatile("slbmfev  %0,%1" : "=r" (vsid) : "r" (entry));
+       printk(KERN_EMERG "%d: %.16lx %.16lx\n", entry, esid, vsid);
+   }
+
+   printk(KERN_EMERG "SLB 0-1 at last exception exit:\n");
+   printk(KERN_EMERG "0: %.16lx %.16lx\n", get_paca()->last_slb[0][0],
+          get_paca()->last_slb[0][1]);
+   printk(KERN_EMERG "1: %.16lx %.16lx\n", get_paca()->last_slb[1][0],
+          get_paca()->last_slb[1][1]);
+   printk(KERN_EMERG "SLB update log:\n");
+   rstart = entry = get_paca()->slblog_ix;
+   do {
+       printk(KERN_EMERG "%d: %lx %lx %.16lx %.16lx\n", entry,
+              get_paca()->slblog[entry][0],
+              get_paca()->slblog[entry][1],
+              get_paca()->slblog[entry][2],
+              get_paca()->slblog[entry][3]);
+       entry = (entry + 1) % 63;
+   } while (entry != rstart);
+#endif
+}
+
 /*
  * We enter here if we get an unrecoverable exception, that is, one
  * that happened at a point where the RI (recoverable interrupt) bit
@@ -1151,6 +1185,8 @@ void unrecoverable_exception(struct pt_regs *regs)
 {
    printk(KERN_EMERG "Unrecoverable exception %lx at %lx\n",
           regs->trap, regs->nip);
+   if (regs->trap == 0x4100)
+       dump_unrecov_slb();
    die("Unrecoverable exception", regs, SIGABRT);
 }
 
diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
index 906daed..235edf7 100644
--- a/arch/powerpc/mm/slb.c
+++ b/arch/powerpc/mm/slb.c
@@ -105,6 +105,7 @@ void slb_flush_and_rebolt(void)
 * appropriately too. */
unsigned long linear_llp, vmalloc_llp, lflags, vflags;
unsigned 

Re: [BUG] 2.6.25-rc2-git4 - Regression Kernel oops while running kernbench and tbench on powerpc

2008-04-14 Thread Kamalesh Babulal
Paul Mackerras wrote:
 Kamalesh Babulal writes:
 
 The SHA1 ID of the kernel is 0e81a8ae37687845f7cdfa2adce14ea6a5f1dd34
 (2.6.25-rc8), and the source seems to include the patch
 44387e9ff25267c78a99229aca55ed750e9174c7.

 The kernel was patched only with the patch you gave me
 (http://lkml.org/lkml/2008/4/8/42).
 
 Please try again with both that patch and the one below.  Once again
 it won't fix the bug but will give us more information.  When the oops
 occurs, the kernel will print a lot of debug information that should
 help locate the problem.
 
 Paul.
 

Re: [BUG] 2.6.25-rc2-git4 - Regression Kernel oops while running kernbench and tbench on powerpc

2008-04-08 Thread Paul Mackerras
Kamalesh Babulal writes:

 A kernel oops is seen while running kernbench followed by tbench with the
 2.6.25-rc2-git4 kernel on powerpc. This oops was reported for the
 2.6.24-rc8-mm1 kernel (http://lkml.org/lkml/2008/1/18/71) and has been
 visible with almost all of the mainline -rc and -git releases since then.
 
 This oops is also visible in the linux-next-20080220 kernel. The machine is
 a POWER4+ box with four CPUs and 30 GB of RAM.

Please try to replicate the oops with the patch below applied.  It
doesn't solve the cause of the oops but it should mean the kernel
prints out more useful information about the cause of the oops.

I assume you can replicate the oops easily on this machine - is that
right?

Paul.

diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 11b4f6d..a3ac72a 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -621,7 +621,7 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_ISERIES)
    mtlr    r10
 
    andi.   r10,r12,MSR_RI  /* check for unrecoverable exception */
-   beq-    unrecov_slb
+   beq-    2f
 
 .machine   push
 .machine   power4
@@ -643,6 +643,22 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_ISERIES)
rfid
b   .   /* prevent speculative execution */
 
+2:
+#ifdef CONFIG_PPC_ISERIES
+BEGIN_FW_FTR_SECTION
+   b   unrecov_slb
+END_FW_FTR_SECTION_IFSET(FW_FEATURE_ISERIES)
+#endif /* CONFIG_PPC_ISERIES */
+   mfspr   r11,SPRN_SRR0
+   clrrdi  r10,r13,32
+   LOAD_HANDLER(r10,unrecov_slb)
+   mtspr   SPRN_SRR0,r10
+   mfmsr   r10
+   ori r10,r10,MSR_IR|MSR_DR|MSR_RI
+   mtspr   SPRN_SRR1,r10
+   rfid
+   b   .
+
 unrecov_slb:
EXCEPTION_PROLOG_COMMON(0x4100, PACA_EXSLB)
DISABLE_INTS
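
The two MSR manipulations in the patch above (the "andi. r10,r12,MSR_RI" test
and the "ori r10,r10,MSR_IR|MSR_DR|MSR_RI" before rfid) can be sketched in C
as follows. This is an illustrative model only: the bit values are the usual
64-bit PowerPC MSR definitions as far as I know them, and the helper names
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low-order MSR bits; assumed to match the kernel's 64-bit definitions. */
#define MSR_ME 0x1000ULL  /* machine check enable */
#define MSR_IR 0x0020ULL  /* instruction relocation */
#define MSR_DR 0x0010ULL  /* data relocation */
#define MSR_RI 0x0002ULL  /* recoverable interrupt */

/* Mirror of "andi. r10,r12,MSR_RI; beq- 2f": the exception is treated as
 * unrecoverable when RI was clear in the interrupted context. */
static bool is_recoverable(uint64_t msr)
{
    return (msr & MSR_RI) != 0;
}

/* Mirror of "ori r10,r10,MSR_IR|MSR_DR|MSR_RI": the MSR that rfid installs
 * when branching to unrecov_slb, with translation and RI switched on. */
static uint64_t handler_msr(uint64_t msr)
{
    uint64_t out = msr | MSR_IR | MSR_DR | MSR_RI;

    assert(is_recoverable(out));
    return out;
}
```

For example, an MSR whose low bits are 0x1030 (ME, IR and DR set, RI clear)
is classified as unrecoverable by is_recoverable().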


Re: [BUG] 2.6.25-rc2-git4 - Regression Kernel oops while running kernbench and tbench on powerpc

2008-04-08 Thread Kamalesh Babulal
Paul Mackerras wrote:
 Kamalesh Babulal writes:
 
 A kernel oops is seen while running kernbench followed by tbench with the
 2.6.25-rc2-git4 kernel on powerpc. This oops was reported for the
 2.6.24-rc8-mm1 kernel (http://lkml.org/lkml/2008/1/18/71) and has been
 visible with almost all of the mainline -rc and -git releases since then.

 This oops is also visible in the linux-next-20080220 kernel. The machine is
 a POWER4+ box with four CPUs and 30 GB of RAM.
 
 Please try to replicate the oops with the patch below applied.  It
 doesn't solve the cause of the oops but it should mean the kernel
 prints out more useful information about the cause of the oops.
 
 I assume you can replicate the oops easily on this machine - is that
 right?
 
 Paul.
 
Hi Paul,

The kernel oopses after applying the patch. Sometimes it takes more than
one run to reproduce it; this time it was reproduced on the second run.

 Unrecoverable exception 4100 at c0008c8c
Oops: Unrecoverable exception, sig: 6 [#1]
SMP NR_CPUS=128 NUMA pSeries
Modules linked in:
NIP: c0008c8c LR: 0ff0135c CTR: 0ff012f0
REGS: c00772343bb0 TRAP: 4100   Not tainted  (2.6.25-rc8-autotest)
MSR: 80001030 ME,IR,DR  CR: 44044228  XER: 
TASK = c0077cfa0900[13437] 'cc1' THREAD: c0077234 CPU: 2
GPR00: 4000 c00772343e30 00bb d032 
GPR04: 00bb 0400 000a 0002 
GPR08:     
GPR12:  c0734000 0064 ffe6df08 
GPR16: 105b 105b 1044 105b 
GPR20: ffe6e008 105b 105b 000a 
GPR24: 0ffec408 0001 ffe6ddca 0400 
GPR28: 0ffec408 f7ff8000 0ffebff4 0400 
NIP [c0008c8c] restore+0x8c/0xc0
LR [0ff0135c] 0xff0135c
Call Trace:
[c00772343e30] [c0008cd4] do_work+0x14/0x2c (unreliable)
Instruction dump:
7c840078 7c810164 70604000 41820028 6000 7c4c42e6 e88d01f0 f84d01f0 
7c841050 e84d01e8 7c422214 f84d01e8 e9a100d8 7c7b03a6 e84101a0 7c4ff120 

(gdb) l *0xc0008cdc
0xc0008cdc is at arch/powerpc/kernel/entry_64.S:608.
603 mtmsrd  r10,1
604
605 andi.   r0,r4,_TIF_NEED_RESCHED
606 beq 1f
607 bl  .schedule
608 b   .ret_from_except_lite
609
610 1:  bl  .save_nvgprs
611 li  r3,0
612 addir4,r1,STACK_FRAME_OVERHEAD

Please let me know if you need more information.
-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


Re: [BUG] 2.6.25-rc2-git4 - Regression Kernel oops while running kernbench and tbench on powerpc

2008-04-08 Thread Paul Mackerras
Kamalesh Babulal writes:

 The kernel oopses after applying the patch. Sometimes it takes more than
 one run to reproduce it; this time it was reproduced on the second run.
 
  Unrecoverable exception 4100 at c0008c8c
 Oops: Unrecoverable exception, sig: 6 [#1]
 SMP NR_CPUS=128 NUMA pSeries
 Modules linked in:
 NIP: c0008c8c LR: 0ff0135c CTR: 0ff012f0
 REGS: c00772343bb0 TRAP: 4100   Not tainted  (2.6.25-rc8-autotest)
 MSR: 80001030 ME,IR,DR  CR: 44044228  XER: 
 TASK = c0077cfa0900[13437] 'cc1' THREAD: c0077234 CPU: 2
 GPR00: 4000 c00772343e30 00bb d032 
 GPR04: 00bb 0400 000a 0002 
 GPR08:     
 GPR12:  c0734000 0064 ffe6df08 
 GPR16: 105b 105b 1044 105b 
 GPR20: ffe6e008 105b 105b 000a 
 GPR24: 0ffec408 0001 ffe6ddca 0400 
 GPR28: 0ffec408 f7ff8000 0ffebff4 0400 
 NIP [c0008c8c] restore+0x8c/0xc0
 LR [0ff0135c] 0xff0135c
 Call Trace:
 [c00772343e30] [c0008cd4] do_work+0x14/0x2c (unreliable)
 Instruction dump:
 7c840078 7c810164 70604000 41820028 6000 7c4c42e6 e88d01f0 f84d01f0 
 7c841050 e84d01e8 7c422214 f84d01e8 e9a100d8 7c7b03a6 e84101a0 7c4ff120 
 
 (gdb) l *0xc0008cdc
 0xc0008cdc is at arch/powerpc/kernel/entry_64.S:608.
 603 mtmsrd  r10,1
 604
 605 andi.   r0,r4,_TIF_NEED_RESCHED
 606 beq 1f
 607 bl  .schedule
 608 b   .ret_from_except_lite
 609
 610 1:  bl  .save_nvgprs
 611 li  r3,0
 612 addir4,r1,STACK_FRAME_OVERHEAD

The exception happened at c...8c8c but you looked at c...8cdc with
gdb.  What's at c...8c8c?

 please let me know if you need more information.

The .config would be useful, but don't spam everyone on cc with it,
just send it to me privately.

Paul.
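
The address mismatch pointed out above can be checked against the
symbol+offset annotations in the oops itself. The C sketch below is a
hypothetical helper, not a kernel or gdb facility: the symbol start addresses
are derived from "restore+0x8c/0xc0" at NIP ...8c8c and "do_work+0x14/0x2c"
at ...8cd4, using the addresses as shortened in the archived oops text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sym {
    const char *name;
    uint64_t start;
    uint64_t size;
};

/* Derived from the oops: NIP - 0x8c and (call trace address) - 0x14. */
static const struct sym syms[] = {
    { "restore", 0xc0008c00ULL, 0xc0 },
    { "do_work", 0xc0008cc0ULL, 0x2c },
};

/* Return the symbol containing addr and store the offset into *off,
 * or return NULL when no listed symbol covers the address. */
static const struct sym *resolve(uint64_t addr, uint64_t *off)
{
    assert(off != NULL);
    for (size_t i = 0; i < sizeof(syms) / sizeof(syms[0]); i++) {
        if (addr >= syms[i].start && addr - syms[i].start < syms[i].size) {
            *off = addr - syms[i].start;
            return &syms[i];
        }
    }
    return NULL;
}
```

With these values the faulting address ...8c8c resolves inside restore, while
the address that was listed in gdb, ...8cdc, falls inside do_work.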


Re: [BUG] 2.6.25-rc2-git4 - Regression Kernel oops while running kernbench and tbench on powerpc

2008-04-08 Thread Kamalesh Babulal
Paul Mackerras wrote:
 Kamalesh Babulal writes:
 
 The kernel oopses after applying the patch. Sometimes it takes more than
 one run to reproduce it; this time it was reproduced on the second run.

  Unrecoverable exception 4100 at c0008c8c
 Oops: Unrecoverable exception, sig: 6 [#1]
 SMP NR_CPUS=128 NUMA pSeries
 Modules linked in:
 NIP: c0008c8c LR: 0ff0135c CTR: 0ff012f0
 REGS: c00772343bb0 TRAP: 4100   Not tainted  (2.6.25-rc8-autotest)
 MSR: 80001030 ME,IR,DR  CR: 44044228  XER: 
 TASK = c0077cfa0900[13437] 'cc1' THREAD: c0077234 CPU: 2
 GPR00: 4000 c00772343e30 00bb d032 
 GPR04: 00bb 0400 000a 0002 
 GPR08:     
 GPR12:  c0734000 0064 ffe6df08 
 GPR16: 105b 105b 1044 105b 
 GPR20: ffe6e008 105b 105b 000a 
 GPR24: 0ffec408 0001 ffe6ddca 0400 
 GPR28: 0ffec408 f7ff8000 0ffebff4 0400 
 NIP [c0008c8c] restore+0x8c/0xc0
 LR [0ff0135c] 0xff0135c
 Call Trace:
 [c00772343e30] [c0008cd4] do_work+0x14/0x2c (unreliable)
 Instruction dump:
 7c840078 7c810164 70604000 41820028 6000 7c4c42e6 e88d01f0 f84d01f0 
 7c841050 e84d01e8 7c422214 f84d01e8 e9a100d8 7c7b03a6 e84101a0 7c4ff120 

<snip>
 The exception happened at c...8c8c but you looked at c...8cdc with
 gdb.  What's at c...8c8c?
 
 please let me know if you need more information.
 
 The .config would be useful, but don't spam everyone on cc with it,
 just send it to me privately.
 
 Paul.

Hi Paul,

A similar call trace was seen with the 2.6.24-rc3-git2 kernel during bootup. I
have attached the boot log to bugzilla
(http://bugzilla.kernel.org/attachment.cgi?id=15666&action=view).
When looking for the last good kernel, we found that the oops seems to be
reproducible from the 2.6.24-rc8-git3 kernel onwards.

Thanks to Nishanth for pointing it out. Please let me know if you need more
information.

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.


Re: [BUG] 2.6.25-rc2-git4 - Regression Kernel oops while running kernbench and tbench on powerpc

2008-04-08 Thread Kamalesh Babulal
Paul Mackerras wrote:
 Kamalesh Babulal writes:
 
 The kernel oopses after applying the patch. Sometimes it takes more than
 one run to reproduce it; this time it was reproduced on the second run.

  Unrecoverable exception 4100 at c0008c8c
 Oops: Unrecoverable exception, sig: 6 [#1]
 SMP NR_CPUS=128 NUMA pSeries
 Modules linked in:
 NIP: c0008c8c LR: 0ff0135c CTR: 0ff012f0
 REGS: c00772343bb0 TRAP: 4100   Not tainted  (2.6.25-rc8-autotest)
 MSR: 80001030 ME,IR,DR  CR: 44044228  XER: 
 TASK = c0077cfa0900[13437] 'cc1' THREAD: c0077234 CPU: 2
 GPR00: 4000 c00772343e30 00bb d032 
 GPR04: 00bb 0400 000a 0002 
 GPR08:     
 GPR12:  c0734000 0064 ffe6df08 
 GPR16: 105b 105b 1044 105b 
 GPR20: ffe6e008 105b 105b 000a 
 GPR24: 0ffec408 0001 ffe6ddca 0400 
 GPR28: 0ffec408 f7ff8000 0ffebff4 0400 
 NIP [c0008c8c] restore+0x8c/0xc0
 LR [0ff0135c] 0xff0135c
 Call Trace:
 [c00772343e30] [c0008cd4] do_work+0x14/0x2c (unreliable)
 Instruction dump:
 7c840078 7c810164 70604000 41820028 6000 7c4c42e6 e88d01f0 f84d01f0 
 7c841050 e84d01e8 7c422214 f84d01e8 e9a100d8 7c7b03a6 e84101a0 7c4ff120 
 
 That looks like the bug that was supposed to be fixed by commit
 44387e9ff25267c78a99229aca55ed750e9174c7, which is in 2.6.25-rc7 and
 later.  
 
 What was the SHA1 ID of the head commit for the kernel source that
 gave you this oops?  Did you have any other patches besides the one I
 sent you applied?
 
 Paul.

The SHA1 ID of the kernel is 0e81a8ae37687845f7cdfa2adce14ea6a5f1dd34
(2.6.25-rc8), and the source seems to include the patch
44387e9ff25267c78a99229aca55ed750e9174c7.

The kernel was patched only with the patch you gave me
(http://lkml.org/lkml/2008/4/8/42).

-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.