Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-26 Thread Geert Uytterhoeven
On Thu, 26 Feb 2009, Mark Nelson wrote:
 On Thu, 26 Feb 2009 09:45:41 am Mark Nelson wrote:
  On Thu, 26 Feb 2009 12:31:20 am Geert Uytterhoeven wrote:
   On Wed, 25 Feb 2009, Mark Nelson wrote:
Does the following patch fix the errors you're seeing? (it applies the
same fix as the previous patch but this time to copy_tofrom_user, which
I updated in a4e22f02f5b6518c1484faea1f88d81802b9feac)
   
   Thanks, but I still get crashes in copy_page_range().
  
  Hmmm... I'm out of ideas for the moment, but thanks for testing anyway!
 
 If you revert both 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556 and
 a4e22f02f5b6518c1484faea1f88d81802b9feac, does it help? You could also
 try to revert 57dda6ef5bd5b9e60410477ad29e654097e2cca1 just in case I
 need to keep wearing the brown paper bag for a bit longer :)

Still doesn't help.

However, I noticed I never enabled CONFIG_DEBUG_PAGEALLOC before 2.6.29-rc5.
So far I tried 2.6.2[5-8], and they all crash with CONFIG_DEBUG_PAGEALLOC.
I guess it never actually worked on PS3.

With kind regards,

Geert Uytterhoeven
Software Architect

Sony Techsoft Centre Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium

Phone:+32 (0)2 700 8453
Fax:  +32 (0)2 700 8622
E-mail:   geert.uytterhoe...@sonycom.com
Internet: http://www.sony-europe.com/

A division of Sony Europe (Belgium) N.V.
VAT BE 0413.825.160 · RPR Brussels
Fortis · BIC GEBABEBB · IBAN BE41293037680010
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-25 Thread Geert Uytterhoeven
On Wed, 25 Feb 2009, Mark Nelson wrote:
 On Tue, 24 Feb 2009 05:38:37 pm Sachin P. Sant wrote:
  Jan Kara wrote:
   Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
   somehow got beyond end of the page referenced by bh->b_data. So it means
   that le16_to_cpu(entry->e_value_offs) + size > page_size. But
   ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
   particular checks whether e_value_offs + e_value_size isn't greater than
   bh->b_size. So I see no way how memcpy can get beyond end of the page.
   Sachin, is the problem reproducible? If yes, can you send us contents
 
  Yes, I am able to recreate this problem easily. As I had mentioned, if the
  earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted
  I get this crash. But if I specify selinux=0 at command line, 2.6.29-rc6
  boots without any problem.
 
 Hi Sachin and Geert,
 
 Does the patch below fix the problems you're seeing? If it does I'll send
 a properly written up and formatted patch to linuxppc-dev (as well as
 another one to fix the same problem in copy_tofrom_user()).

Unfortunately not, now it crashes while accessing the memory pointed to by
GPR16, in

NIP: copy_page_range+0x608/0x628
LR:  dup_mm+0x2e4/0x428
Trace: debug_table+0xcc70/0x1afe0 (unreliable)
dup_mm+0x2e4/0x428
copy_process+0x86c/0xf9c
do_fork+0x188/0x39c
sys_clone+0x58/0x70
ppc_clone+0x8/0xc

However, after reverting 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, I still see
similar problems as above (crash in copy_page_range()).
Which makes me think that
  1. Your new patch fixes the problem introduced by 25d6e2d7,
  2. There's still another issue besides the one introduced by 25d6e2d7.

With kind regards,

Geert Uytterhoeven
Software Architect



Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-25 Thread Geert Uytterhoeven
On Wed, 25 Feb 2009, Mark Nelson wrote:
 On Wed, 25 Feb 2009 05:01:59 am Geert Uytterhoeven wrote:
  On Mon, 23 Feb 2009, Paul Mackerras wrote:
   Andrew Morton writes:
It looks like we died in ext3_xattr_block_get():

memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
       size);

Perhaps entry->e_value_offs is no good.  I wonder if the filesystem is
corrupted and this snuck through the defenses.

I also wonder if there is enough info in that trace for a ppc person to
be able to determine whether the faulting address is in the source or
destination of the memcpy() (please)?
   
   It appears to have faulted on a load, implicating the source.  The
   address being referenced (0xc0003f38) doesn't look
   outlandish.  I wonder if this kernel has CONFIG_DEBUG_PAGEALLOC turned
   on, and what page size is selected?
  
  I'm seeing a similar thing on PS3, but not in ext3. During early userspace
  setup (udevd), it crashes accessing a 0xc00* address in:
  
  | NIP setup+0x20/0x130
  | LR copy_user_page+0x18/0x6c
  | Call trace:
  | do_wp_page+0x5b4/0x89c
  | do_page_fault+0x3a8/0x58c
  | handle_page_fault+0x20/0x5c
  
  I have CONFIG_DEBUG_PAGEALLOC=y. If I disable it, the system boots fine.
  
  If needed, I can probably bisect this tomorrow. It definitely didn't happen
  in 2.6.29-rc5.
 
 No need to bisect - it was 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, my
 commit that optimised 64bit memcpy() for Power6 and Cell.
 
 The bug was in -rc1, but if your copies were 8-byte aligned with respect
 to the source the problem wouldn't have been seen... Could this have
 been why you didn't see it in -rc5?

Hmm... I just started seeing it on older kernels (-rc5+), too...

With kind regards,

Geert Uytterhoeven
Software Architect



Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-25 Thread Sachin P. Sant

Mark Nelson wrote:

Hi Sachin and Geert,

Does the patch below fix the problems you're seeing? If it does I'll send
a properly written up and formatted patch to linuxppc-dev (as well as
another one to fix the same problem in copy_tofrom_user()).
  

This patch fixes the issue at my side. I tried booting the system a few times
and every single time it came up clean.

Thanks
-Sachin

--

-
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
-



Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-25 Thread Mark Nelson
On Wed, 25 Feb 2009 08:50:46 pm Geert Uytterhoeven wrote:
 On Wed, 25 Feb 2009, Mark Nelson wrote:
  Hi Sachin and Geert,
  
  Does the patch below fix the problems you're seeing? If it does I'll send
  a properly written up and formatted patch to linuxppc-dev (as well as
  another one to fix the same problem in copy_tofrom_user()).
 
 Unfortunately not, now it crashes while accessing the memory pointed to by
 GPR16, in
 
 NIP: copy_page_range+0x608/0x628
 LR:  dup_mm+0x2e4/0x428
 Trace: debug_table+0xcc70/0x1afe0 (unreliable)
 dup_mm+0x2e4/0x428
 copy_process+0x86c/0xf9c
 do_fork+0x188/0x39c
 sys_clone+0x58/0x70
 ppc_clone+0x8/0xc
 
 However, after reverting 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, I still see
 similar problems as above (crash in copy_page_range()).
 Which makes me think that
   1. Your new patch fixes the problem introduced by 25d6e2d7,
   2. There's still another issue besides the one introduced by 25d6e2d7.

Does the following patch fix the errors you're seeing? (it applies the
same fix as the previous patch but this time to copy_tofrom_user, which
I updated in a4e22f02f5b6518c1484faea1f88d81802b9feac)

Thanks!

Mark

---
 arch/powerpc/lib/copyuser_64.S |   38 +++++++++++++++++++++++++++++++-------
 1 file changed, 31 insertions(+), 7 deletions(-)

Index: upstream/arch/powerpc/lib/copyuser_64.S
===================================================================
--- upstream.orig/arch/powerpc/lib/copyuser_64.S
+++ upstream/arch/powerpc/lib/copyuser_64.S
@@ -62,18 +62,19 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 72:	std	r8,8(r3)
 	beq+	3f
 	addi	r3,r3,16
-23:	ld	r9,8(r4)
 .Ldo_tail:
 	bf	cr7*4+1,1f
-	rotldi	r9,r9,32
+23:	lwz	r9,8(r4)
+	addi	r4,r4,4
 73:	stw	r9,0(r3)
 	addi	r3,r3,4
 1:	bf	cr7*4+2,2f
-	rotldi	r9,r9,16
+44:	lhz	r9,8(r4)
+	addi	r4,r4,2
 74:	sth	r9,0(r3)
 	addi	r3,r3,2
 2:	bf	cr7*4+3,3f
-	rotldi	r9,r9,8
+45:	lbz	r9,8(r4)
 75:	stb	r9,0(r3)
 3:	li	r3,0
 	blr
@@ -141,11 +142,24 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 6:	cmpwi	cr1,r5,8
 	addi	r3,r3,32
 	sld	r9,r9,r10
-	ble	cr1,.Ldo_tail
+	ble	cr1,7f
 34:	ld	r0,8(r4)
 	srd	r7,r0,r11
 	or	r9,r7,r9
-	b	.Ldo_tail
+7:
+	bf	cr7*4+1,1f
+	rotldi	r9,r9,32
+94:	stw	r9,0(r3)
+	addi	r3,r3,4
+1:	bf	cr7*4+2,2f
+	rotldi	r9,r9,16
+95:	sth	r9,0(r3)
+	addi	r3,r3,2
+2:	bf	cr7*4+3,3f
+	rotldi	r9,r9,8
+96:	stb	r9,0(r3)
+3:	li	r3,0
+	blr
 
 .Ldst_unaligned:
 	PPC_MTOCRF	0x01,r6		/* put #bytes to 8B bdry into cr7 */
@@ -218,7 +232,6 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 121:
 132:
 	addi	r3,r3,8
-123:
 134:
 135:
 138:
@@ -226,6 +239,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 140:
 141:
 142:
+123:
+144:
+145:
 
 /*
  * here we have had a fault on a load and r3 points to the first
@@ -309,6 +325,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 187:
 188:
 189:
+194:
+195:
+196:
 1:
 	ld	r6,-24(r1)
 	ld	r5,-8(r1)
@@ -329,7 +348,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 	.llong	72b,172b
 	.llong	23b,123b
 	.llong	73b,173b
+	.llong	44b,144b
 	.llong	74b,174b
+	.llong	45b,145b
 	.llong	75b,175b
 	.llong	24b,124b
 	.llong	25b,125b
@@ -347,6 +368,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 	.llong	79b,179b
 	.llong	80b,180b
 	.llong	34b,134b
+	.llong	94b,194b
+	.llong	95b,195b
+	.llong	96b,196b
 	.llong	35b,135b
 	.llong	81b,181b
 	.llong	36b,136b


Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-25 Thread Mark Nelson
On Wed, 25 Feb 2009 10:08:22 pm Sachin P. Sant wrote:
 Mark Nelson wrote:
  Hi Sachin and Geert,
 
  Does the patch below fix the problems you're seeing? If it does I'll send
  a properly written up and formatted patch to linuxppc-dev (as well as
  another one to fix the same problem in copy_tofrom_user()).

 This patch fixes the issue at my side. I tried booting the system a few times
 and every single time it came up clean.

Good to hear. Thanks for testing, Sachin!

Mark


Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-25 Thread Geert Uytterhoeven
On Wed, 25 Feb 2009, Mark Nelson wrote:
 On Wed, 25 Feb 2009 08:50:46 pm Geert Uytterhoeven wrote:
  On Wed, 25 Feb 2009, Mark Nelson wrote:
   Hi Sachin and Geert,
   
   Does the patch below fix the problems you're seeing? If it does I'll send
   a properly written up and formatted patch to linuxppc-dev (as well as
   another one to fix the same problem in copy_tofrom_user()).
  
  Unfortunately not, now it crashes while accessing the memory pointed to by
  GPR16, in
  
  NIP: copy_page_range+0x608/0x628
  LR:  dup_mm+0x2e4/0x428
  Trace: debug_table+0xcc70/0x1afe0 (unreliable)
  dup_mm+0x2e4/0x428
  copy_process+0x86c/0xf9c
  do_fork+0x188/0x39c
  sys_clone+0x58/0x70
  ppc_clone+0x8/0xc
  
  However, after reverting 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, I still
  see similar problems as above (crash in copy_page_range()).
  Which makes me think that
    1. Your new patch fixes the problem introduced by 25d6e2d7,
    2. There's still another issue besides the one introduced by 25d6e2d7.
 
 Does the following patch fix the errors you're seeing? (it applies the
 same fix as the previous patch but this time to copy_tofrom_user, which
 I updated in a4e22f02f5b6518c1484faea1f88d81802b9feac)

Thanks, but I still get crashes in copy_page_range().

With kind regards,

Geert Uytterhoeven
Software Architect



Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-25 Thread Mark Nelson
On Thu, 26 Feb 2009 12:31:20 am Geert Uytterhoeven wrote:
 On Wed, 25 Feb 2009, Mark Nelson wrote:
  On Wed, 25 Feb 2009 08:50:46 pm Geert Uytterhoeven wrote:
   On Wed, 25 Feb 2009, Mark Nelson wrote:
    Hi Sachin and Geert,
    
    Does the patch below fix the problems you're seeing? If it does I'll send
    a properly written up and formatted patch to linuxppc-dev (as well as
    another one to fix the same problem in copy_tofrom_user()).
   
   Unfortunately not, now it crashes while accessing the memory pointed to by
   GPR16, in
   
   NIP: copy_page_range+0x608/0x628
   LR:  dup_mm+0x2e4/0x428
   Trace: debug_table+0xcc70/0x1afe0 (unreliable)
   dup_mm+0x2e4/0x428
   copy_process+0x86c/0xf9c
   do_fork+0x188/0x39c
   sys_clone+0x58/0x70
   ppc_clone+0x8/0xc
   
   However, after reverting 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, I still
   see similar problems as above (crash in copy_page_range()).
   Which makes me think that
     1. Your new patch fixes the problem introduced by 25d6e2d7,
     2. There's still another issue besides the one introduced by 25d6e2d7.
  
  Does the following patch fix the errors you're seeing? (it applies the
  same fix as the previous patch but this time to copy_tofrom_user, which
  I updated in a4e22f02f5b6518c1484faea1f88d81802b9feac)
 
 Thanks, but I still get crashes in copy_page_range().
 

Hmmm... I'm out of ideas for the moment, but thanks for testing anyway!

Mark


Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-25 Thread Mark Nelson
On Thu, 26 Feb 2009 09:45:41 am Mark Nelson wrote:
 On Thu, 26 Feb 2009 12:31:20 am Geert Uytterhoeven wrote:
  On Wed, 25 Feb 2009, Mark Nelson wrote:
   On Wed, 25 Feb 2009 08:50:46 pm Geert Uytterhoeven wrote:
On Wed, 25 Feb 2009, Mark Nelson wrote:
     Hi Sachin and Geert,
     
     Does the patch below fix the problems you're seeing? If it does I'll send
     a properly written up and formatted patch to linuxppc-dev (as well as
     another one to fix the same problem in copy_tofrom_user()).
    
    Unfortunately not, now it crashes while accessing the memory pointed to by
    GPR16, in
    
    NIP: copy_page_range+0x608/0x628
    LR:  dup_mm+0x2e4/0x428
    Trace: debug_table+0xcc70/0x1afe0 (unreliable)
    dup_mm+0x2e4/0x428
    copy_process+0x86c/0xf9c
    do_fork+0x188/0x39c
    sys_clone+0x58/0x70
    ppc_clone+0x8/0xc
    
    However, after reverting 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, I still
    see similar problems as above (crash in copy_page_range()).
    Which makes me think that
      1. Your new patch fixes the problem introduced by 25d6e2d7,
      2. There's still another issue besides the one introduced by 25d6e2d7.
   
   Does the following patch fix the errors you're seeing? (it applies the
   same fix as the previous patch but this time to copy_tofrom_user, which
   I updated in a4e22f02f5b6518c1484faea1f88d81802b9feac)
  
  Thanks, but I still get crashes in copy_page_range().
  
 
 Hmmm... I'm out of ideas for the moment, but thanks for testing anyway!
 
 Mark
 

If you revert both 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556 and
a4e22f02f5b6518c1484faea1f88d81802b9feac, does it help? You could also
try to revert 57dda6ef5bd5b9e60410477ad29e654097e2cca1 just in case I
need to keep wearing the brown paper bag for a bit longer :)

Thanks!

Mark


Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-24 Thread Jan Kara
  Hello,

On Tue 24-02-09 12:08:37, Sachin P. Sant wrote:
 Jan Kara wrote:
   Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
 somehow got beyond end of the page referenced by bh->b_data. So it means
 that le16_to_cpu(entry->e_value_offs) + size > page_size. But
 ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
 particular checks whether e_value_offs + e_value_size isn't greater than
 bh->b_size. So I see no way how memcpy can get beyond end of the page.
   Sachin, is the problem reproducible? If yes, can you send us contents
   
 Yes, I am able to recreate this problem easily. As I had mentioned, if the
 earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted
 I get this crash. But if I specify selinux=0 at command line, 2.6.29-rc6 boots
 without any problem.

 of the page just before the faulting address (i.e., for current fault it
 would be 0xc0003f37-0xc0003f37). As far as I can
 remember powerpc monitor could dump it.
   
 Here is the page dump. This time it crashed while accessing address
 0xc0002d67.
  Thanks for the dump.

 Unable to handle kernel paging request for data at address 0xc0002d67
 Faulting instruction address: 0xc0039574
 cpu 0x1: Vector: 300 (Data Access) at [c0004288b0b0]
pc: c0039574: .memcpy+0x74/0x244
lr: c01b497c: .ext3_xattr_get+0x288/0x2f4
sp: c0004288b330
   msr: 80009032

 1:mon d 0xc0002d66
 ... SNIP ...

 c0002d66efd0    ||
 c0002d66efe0    ||
 c0002d66eff0    ||
 c0002d66f000 02ea0004 0100e200d20a  ||
 c0002d66f010    ||
 c0002d66f020 0706e40f 1b00e200d20a  ||
 c0002d66f030 73656c696e757800   |selinux.|
 c0002d66f040    ||
 c0002d66f050    ||
 c0002d66f060    ||

 ... SNIP ...

 c0002d66ff60    ||
 c0002d66ff70    ||
 c0002d66ff80    ||
 c0002d66ff90    ||
 c0002d66ffa0    ||
 c0002d66ffb0    ||
 c0002d66ffc0    ||
 c0002d66ffd0    ||
 c0002d66ffe0 73797374 656d5f753a6f626a  |system_u:obj|
 c0002d66fff0 6563745f723a7573 725f743a7330  |ect_r:usr_t:s0..|
 c0002d67    ||
 1:mon r
 R00 = e40f   R16 = 005d
 R01 = c0004288b330   R17 = 
 R02 = c09f59b8   R18 = fffbfe9e
 R03 = c00044aa34a0   R19 = 10042638
 R04 = c0002d66fff4   R20 = 10041610
 R05 = 0003   R21 = 00ff
 R06 =    R22 = 0006
 R07 = 0001   R23 = c07d27c1
 R08 = 723a7573725f743a   R24 = c0002c0cd758
 R09 = 3a6f626a6563745f   R25 = c00044aa3488
 R10 = c017b43c   R26 = c0002c0cd6f0
 R11 = c0002d66f020   R27 = c0002c0cd860
 R12 = d23c14b0   R28 = c0002c0b0840
 R13 = c0a93680   R29 = 001b
 R14 = 41ed   R30 = c09880b0
 R15 = 1004   R31 = ffde
 pc  = c0039574 .memcpy+0x74/0x244
 lr  = c01b497c .ext3_xattr_get+0x288/0x2f4
 msr = 80009032   cr  = 4400044b
 ctr =    xer = 2001   trap =  300
 dar = c0002d67   dsisr = 4000
 1:mon zr

   BTW, I suppose you use 4KB blocksize on the filesystem, right?
   
 Yes.

 dumpe2fs /dev/sda3 | grep -i "block size"
 dumpe2fs 1.39 (29-May-2006)
 Block size:   4096
  OK. The xattr block causing the oops is completely correct. To me it seems
more like some problem in powerpc memcpy() (I saw some changes went into it
at the end of December) - we call it to copy 27 bytes from address
0xc0002d66ffe4, so the copy ends one byte before the end of the page.
Could some of the powerpc guys have a look whether this could be the case?
I'm not quite fluent in powerpc assembly so it would take me ages ;).
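
The offsets in that paragraph can be checked mechanically. What follows is a
minimal C sketch, not kernel code: the helper names and the PAGE_SIZE_4K
constant are made up for illustration, and it works on offsets within a 4 KiB
page (the low 12 bits of the addresses in the dump) rather than on full
addresses. The 27-byte copy starts at offset 0xfe4, and R04 in the dump ends
in 0xff4, so an 8-byte load at 8(r4) reads from offset 0xffc.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, matching the 4096-byte block size reported above. */
#define PAGE_SIZE_4K 4096u

/* True if the byte range [off, off + len) lies entirely inside one page. */
static bool range_within_page(size_t off, size_t len)
{
    return off < PAGE_SIZE_4K && len <= PAGE_SIZE_4K - off;
}

/* True if a load of `width` bytes at page offset `off` touches the next page. */
static bool load_crosses_page(size_t off, size_t width)
{
    return !range_within_page(off, width);
}
```

With these helpers, range_within_page(0xfe4, 27) holds, so the requested copy
is legal. load_crosses_page(0xffc, 8) is true while load_crosses_page(0xffc, 3)
is false, which is consistent with a fault on the following page when the last
3 bytes are fetched with an 8-byte load.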

Honza
-- 
Jan Kara j...@suse.cz
SUSE Labs, CR

Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-24 Thread Jan Kara
 Andrew Morton wrote:
 hm, I wonder what could have caused that - we haven't altered
 fs/ext3/xattr.c in ages.
 
 What is the most recent kernel version you know of which didn't do
 this?  Bear in mind that this crash might be triggered by the
 current contents of the filesystem, so if possible, please test
 some other kernel versions on that disk.
   
 I am trying to boot a vanilla kernel on this machine for the first
 time. Haven't tried any other kernels. Will give it a try.
 
 It looks like we died in ext3_xattr_block_get():
 
  memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
         size);
 
 Perhaps entry->e_value_offs is no good.  I wonder if the filesystem is
 corrupted and this snuck through the defenses.
 
 I also wonder if there is enough info in that trace for a ppc person to
 be able to determine whether the faulting address is in the source or
 destination of the memcpy() (please)?
   
 Some more information if this could be of any help.
 
 0:mon di 0xc0039574
 c0039574  e9240008  ld      r9,8(r4)
 c0039578  409d0010  ble     cr7,c0039588    # .memcpy+0x88/0x244
 c003957c  79290002  rotldi  r9,r9,32
 c0039580  9123  stw     r9,0(r3)
 c0039584  38630004  addi    r3,r3,4
 c0039588  409e0010  bne     cr7,c0039598    # .memcpy+0x98/0x244
 c003958c  79298000  rotldi  r9,r9,16
 c0039590  b123  sth     r9,0(r3)
 c0039594  38630002  addi    r3,r3,2
 c0039598  409f000c  bns     cr7,c00395a4    # .memcpy+0xa4/0x244
 c003959c  79294000  rotldi  r9,r9,8
 c00395a0  9923  stb     r9,0(r3)
 c00395a4  e8610030  ld      r3,48(r1)
 c00395a8  4e800020  blr
 c00395ac  78a6e8c2  rldicl  r6,r5,61,3
 c00395b0  38a5fff0  addi    r5,r5,-16
 0:mon r
 R00 = e40f   R16 = 100edbc8
 R01 = c0003e59b3e0   R17 = 100b
 R02 = c09c2110   R18 = 0005
 R03 = c00044bc90e0   R19 = fff0d7a8
 R04 = c00039c4   R20 = fff0d708
 R05 = 0003   R21 = 00ff
 R06 =    R22 = 0006
 R07 = 0001   R23 = c079ab49
 R08 = 723a7573725f743a   R24 = c000372fe2a8
 R09 = 3a6f626a6563745f   R25 = c00044bc90c8
 R10 = c0003b250968   R26 = c000372fe240
 R11 = c0039500   R27 = c000372fe3b0
 R12 = d244c590   R28 = c000372c5280
 R13 = c0a53480   R29 = 001b
 R14 = 100d   R30 = d24654d0
 R15 =    R31 = ffde
 pc  = c0039574 .memcpy+0x74/0x244
 lr  = d244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
 msr = 80009032   cr  = 4400844b
 ctr =    xer = 0001   trap =  300
 dar = c00039d0   dsisr = 4000
 0:mon
  Yes, this makes me even more suspicious that memcpy() on powerpc could
be at fault. The instruction (ld r9,8(r4)) loads the last 8 bytes to copy,
but in our case it should load only 3 bytes: the remaining 5 bytes are not
in the range we specified, so the larger load can cause a page fault...
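
A minimal C sketch of the alternative, assuming a helper named copy_tail
(hypothetical, and not the kernel's memcpy): the final 0 to 7 bytes are moved
in 4-, 2- and 1-byte pieces selected by the bits of the remaining length, so
nothing beyond the end of the source range is ever read.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the last `rem` bytes (0..7) of a transfer.
 * Each step reads exactly the number of bytes it stores, so the total
 * number of bytes read from `src` is exactly `rem`.
 */
static void copy_tail(unsigned char *dst, const unsigned char *src, size_t rem)
{
    if (rem & 4) {              /* one 4-byte piece */
        memcpy(dst, src, 4);
        dst += 4;
        src += 4;
    }
    if (rem & 2) {              /* one 2-byte piece */
        memcpy(dst, src, 2);
        dst += 2;
        src += 2;
    }
    if (rem & 1) {              /* final single byte */
        *dst = *src;
    }
}
```

For the case in this thread, copy_tail(dst, src, 3) takes the 2-byte and
1-byte branches only, so a source that ends one byte before a page boundary
is never read past its end.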

Honza
-- 
Jan Kara j...@suse.cz
SuSE CR Labs


Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-24 Thread Geert Uytterhoeven
On Mon, 23 Feb 2009, Paul Mackerras wrote:
 Andrew Morton writes:
  It looks like we died in ext3_xattr_block_get():
  
  memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
         size);
  
  Perhaps entry->e_value_offs is no good.  I wonder if the filesystem is
  corrupted and this snuck through the defenses.
  
  I also wonder if there is enough info in that trace for a ppc person to
  be able to determine whether the faulting address is in the source or
  destination of the memcpy() (please)?
 
 It appears to have faulted on a load, implicating the source.  The
 address being referenced (0xc0003f38) doesn't look
 outlandish.  I wonder if this kernel has CONFIG_DEBUG_PAGEALLOC turned
 on, and what page size is selected?

I'm seeing a similar thing on PS3, but not in ext3. During early userspace
setup (udevd), it crashes accessing a 0xc00* address in:

| NIP setup+0x20/0x130
| LR copy_user_page+0x18/0x6c
| Call trace:
| do_wp_page+0x5b4/0x89c
| do_page_fault+0x3a8/0x58c
| handle_page_fault+0x20/0x5c

I have CONFIG_DEBUG_PAGEALLOC=y. If I disable it, the system boots fine.

If needed, I can probably bisect this tomorrow. It definitely didn't happen in
2.6.29-rc5.

With kind regards,

Geert Uytterhoeven
Software Architect



Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-24 Thread Mark Nelson
On Wed, 25 Feb 2009 02:51:20 am Jan Kara wrote:
   OK. The xattr block causing the oops is completely correct. To me it seems
 more like some problem in powerpc memcpy() (I saw some changes went into it
 at the end of December) - we call it to copy 27 bytes from address
 0xc0002d66ffe4, so the copy ends one byte before the end of the page.
 Could some of the powerpc guys have a look whether this could be the case?
 I'm not quite fluent in powerpc assembly so it would take me ages ;).

You're right - it's a problem with the 64bit 

Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-24 Thread Mark Nelson
On Wed, 25 Feb 2009 05:01:59 am Geert Uytterhoeven wrote:
 On Mon, 23 Feb 2009, Paul Mackerras wrote:
  Andrew Morton writes:
   It looks like we died in ext3_xattr_block_get():
   
   memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
          size);
   
   Perhaps entry->e_value_offs is no good.  I wonder if the filesystem is
   corrupted and this snuck through the defenses.
   
   I also wonder if there is enough info in that trace for a ppc person to
   be able to determine whether the faulting address is in the source or
   destination of the memcpy() (please)?
  
  It appears to have faulted on a load, implicating the source.  The
  address being referenced (0xc0003f38) doesn't look
  outlandish.  I wonder if this kernel has CONFIG_DEBUG_PAGEALLOC turned
  on, and what page size is selected?
 
 I'm seeing a similar thing on PS3, but not in ext3. During early userspace
 setup (udevd), it crashes accessing a 0xc00* address in:
 
 | NIP setup+0x20/0x130
 | LR copy_user_page+0x18/0x6c
 | Call trace:
 | do_wp_page+0x5b4/0x89c
 | do_page_fault+0x3a8/0x58c
 | handle_page_fault+0x20/0x5c
 
 I have CONFIG_DEBUG_PAGEALLOC=y. If I disable it, the system boots fine.
 
 If needed, I can probably bisect this tomorrow. It definitely didn't happen in
 2.6.29-rc5.

No need to bisect - it was 25d6e2d7c58ddc4a3b614fc5381591c0cfe66556, my
commit that optimised 64bit memcpy() for Power6 and Cell.

The bug was in -rc1, but if your copies were 8-byte aligned with respect
to the source the problem wouldn't have been seen... Could this have
been why you didn't see it in -rc5?
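
As a rough illustration of why source alignment matters here, the C sketch below models an assumed simplification of the old tail handling: after the whole doublewords are copied, one more 8-byte load is issued even when fewer than 8 bytes remain. The function names are hypothetical, and the 64K page size is an assumption taken from the CONFIG_PPC_64K_PAGES=y setting reported elsewhere in this thread. It is only a model of the arithmetic, not the kernel's memcpy().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size: 64K, matching CONFIG_PPC_64K_PAGES=y in this thread. */
#define SKETCH_PAGE_SIZE 0x10000u

/* True if a load of `width` bytes at page offset `off` touches the next page. */
static bool load_crosses_page(uint32_t off, uint32_t width)
{
    assert(off < SKETCH_PAGE_SIZE && width > 0);
    return off + width > SKETCH_PAGE_SIZE;
}

/*
 * Hypothetical model of the pre-fix tail: once the whole doublewords have
 * been copied, an 8-byte load is still issued for the 1..7 leftover bytes.
 * `src_off` is the source's offset within its page, `len` the copy length.
 */
static bool old_tail_overreads(uint32_t src_off, uint32_t len)
{
    uint32_t tail = len % 8;                    /* bytes left after whole doublewords */
    uint32_t tail_off = src_off + (len - tail); /* where the last load starts */

    return tail != 0 && load_crosses_page(tail_off, 8);
}
```

With the 27-byte copy starting at page offset 0xffe4 described in this thread, the model reports a load that crosses into the next page. Starting the same copy at the 8-byte aligned offset 0xffe0 does not, because an aligned 8-byte load can never straddle a page boundary.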

I'll work on a fix now.

Thanks!

Mark


Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-24 Thread Mark Nelson
On Tue, 24 Feb 2009 05:38:37 pm Sachin P. Sant wrote:
 Jan Kara wrote:
Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
  somehow got beyond end of the page referenced by bh->b_data. So it means
  that le16_to_cpu(entry->e_value_offs) + size > page_size. But
  ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
  particular checks whether e_value_offs + e_value_size isn't greater than
  bh->b_size. So I see no way how memcpy can get beyond end of the page.
Sachin, is the problem reproducible? If yes, can you send us contents

 Yes, I am able to recreate this problem easily. As I mentioned, if the
 earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted,
 I get this crash. But if I specify selinux=0 on the command line, 2.6.29-rc6
 boots without any problem.

Hi Sachin and Geert,

Does the patch below fix the problems you're seeing? If it does I'll send
a properly written up and formatted patch to linuxppc-dev (as well as
another one to fix the same problem in copy_tofrom_user()).

Thanks and sorry again!

Mark

---
 arch/powerpc/lib/memcpy_64.S |   26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

Index: upstream/arch/powerpc/lib/memcpy_64.S
===
--- upstream.orig/arch/powerpc/lib/memcpy_64.S
+++ upstream/arch/powerpc/lib/memcpy_64.S
@@ -53,18 +53,19 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
 3:      std     r8,8(r3)
         beq     3f
         addi    r3,r3,16
-        ld      r9,8(r4)
 .Ldo_tail:
         bf      cr7*4+1,1f
-        rotldi  r9,r9,32
+        lwz     r9,8(r4)
+        addi    r4,r4,4
         stw     r9,0(r3)
         addi    r3,r3,4
 1:      bf      cr7*4+2,2f
-        rotldi  r9,r9,16
+        lhz     r9,8(r4)
+        addi    r4,r4,2
         sth     r9,0(r3)
         addi    r3,r3,2
 2:      bf      cr7*4+3,3f
-        rotldi  r9,r9,8
+        lbz     r9,8(r4)
         stb     r9,0(r3)
 3:      ld      r3,48(r1)       /* return dest pointer */
         blr
@@ -133,11 +134,24 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_
         cmpwi   cr1,r5,8
         addi    r3,r3,32
         sld     r9,r9,r10
-        ble     cr1,.Ldo_tail
+        ble     cr1,6f
         ld      r0,8(r4)
         srd     r7,r0,r11
         or      r9,r7,r9
-        b       .Ldo_tail
+6:
+        bf      cr7*4+1,1f
+        rotldi  r9,r9,32
+        stw     r9,0(r3)
+        addi    r3,r3,4
+1:      bf      cr7*4+2,2f
+        rotldi  r9,r9,16
+        sth     r9,0(r3)
+        addi    r3,r3,2
+2:      bf      cr7*4+3,3f
+        rotldi  r9,r9,8
+        stb     r9,0(r3)
+3:      ld      r3,48(r1)       /* return dest pointer */
+        blr
 
 .Ldst_unaligned:
         PPC_MTOCRF      0x01,r6         # put #bytes to 8B bdry into cr7
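
For readers less fluent in powerpc assembly, the idea behind the first hunk can be pictured in C roughly as follows. This is only a sketch: copy_tail is a hypothetical name and the code is a model of the approach, not the kernel routine. The leftover 0..7 bytes are copied with 4-, 2- and 1-byte accesses chosen by the bits of the remaining length, in place of a single 8-byte load followed by rotates, so nothing beyond the last source byte is read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the final `rem` (0..7) bytes in 4-, 2- and 1-byte pieces selected
 * by the bits of `rem`. No byte at or beyond src + rem is ever read.
 */
static void copy_tail(unsigned char *dst, const unsigned char *src, size_t rem)
{
    assert(rem < 8);

    if (rem & 4) {              /* word piece, like lwz/stw */
        memcpy(dst, src, 4);
        dst += 4;
        src += 4;
    }
    if (rem & 2) {              /* halfword piece, like lhz/sth */
        memcpy(dst, src, 2);
        dst += 2;
        src += 2;
    }
    if (rem & 1)                /* final byte, like lbz/stb */
        *dst = *src;
}
```

The trade-off is up to three narrow loads in the tail instead of one wide load, in exchange for never touching memory past the end of the source buffer.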


Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-23 Thread Andrew Morton
On Mon, 23 Feb 2009 15:16:05 +0530 Sachin P. Sant sach...@in.ibm.com wrote:

 2.6.29-rc6 bootup on a powerpc box failed with
 
 Unable to handle kernel paging request for data at address 0xc0003f38
 Faulting instruction address: 0xc0039574
 cpu 0x1: Vector: 300 (Data Access) at [c0003baf3020]
 pc: c0039574: .memcpy+0x74/0x244
 lr: d244916c: .ext3_xattr_get+0x288/0x2f4 [ext3]
 sp: c0003baf32a0
msr: 80009032
dar: c0003f38
  dsisr: 4000
   current = 0xc0003e54b010
   paca= 0xc0a53680
 pid   = 1840, comm = readahead
 enter ? for help
 [link register   ] d244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
 [c0003baf32a0] d2449104 .ext3_xattr_get+0x220/0x2f4 [ext3] (unreliable)
 [c0003baf3390] d244a6e8 .ext3_xattr_security_get+0x40/0x5c [ext3]
 [c0003baf3400] c0148154 .generic_getxattr+0x74/0x9c
 [c0003baf34a0] c0333400 .inode_doinit_with_dentry+0x1c4/0x678
 [c0003baf3560] c032c6b0 .security_d_instantiate+0x50/0x68
 [c0003baf35e0] c013c818 .d_instantiate+0x78/0x9c
 [c0003baf3680] c013ced0 .d_splice_alias+0xf0/0x120
 [c0003baf3720] d243e05c .ext3_lookup+0xec/0x134 [ext3]
 [c0003baf37c0] c0131e74 .do_lookup+0x110/0x260
 [c0003baf3880] c0134ed0 .__link_path_walk+0xa98/0x1010
 [c0003baf3970] c01354a0 .path_walk+0x58/0xc4
 [c0003baf3a20] c0135720 .do_path_lookup+0x138/0x1e4
 [c0003baf3ad0] c013645c .path_lookup_open+0x6c/0xc8
 [c0003baf3b70] c0136780 .do_filp_open+0xcc/0x874
 [c0003baf3d10] c01251e0 .do_sys_open+0x80/0x140
 [c0003baf3dc0] c016aaec .compat_sys_open+0x24/0x38
 [c0003baf3e30] c000855c syscall_exit+0x0/0x40
 --- Exception: c01 (System Call) at 0ff0ef18
 SP (ffc6f4b0) is in userspace
 1:mon
 
 Following EXT3 related options were enabled in the config.
 
 CONFIG_EXT3_FS=m
 CONFIG_EXT3_FS_XATTR=y
 CONFIG_EXT3_FS_POSIX_ACL=y
 CONFIG_EXT3_FS_SECURITY=y
 

hm, I wonder what could have caused that - we haven't altered
fs/ext3/xattr.c in ages.

What is the most recent kernel version you know of which didn't do
this?  Bear in mind that this crash might be triggered by the
current contents of the filesystem, so if possible, please test
some other kernel versions on that disk.

It looks like we died in ext3_xattr_block_get():

memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
       size);

Perhaps entry->e_value_offs is no good.  I wonder if the filesystem is
corrupted and this snuck through the defenses.

I also wonder if there is enough info in that trace for a ppc person to
be able to determine whether the faulting address is in the source or
destination of the memcpy() (please)?



Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-23 Thread Paul Mackerras
Andrew Morton writes:

 It looks like we died in ext3_xattr_block_get():
 
   memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
          size);
 
 Perhaps entry->e_value_offs is no good.  I wonder if the filesystem is
 corrupted and this snuck through the defenses.
 
 I also wonder if there is enough info in that trace for a ppc person to
 be able to determine whether the faulting address is in the source or
 destination of the memcpy() (please)?

It appears to have faulted on a load, implicating the source.  The
address being referenced (0xc0003f38) doesn't look
outlandish.  I wonder if this kernel has CONFIG_DEBUG_PAGEALLOC turned
on, and what page size is selected?
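
One way a load/store conclusion like this can be read off the trace is from the DSISR value. The sketch below assumes the archived "dsisr: 4000" stands for the full 32-bit value 0x40000000 (the archive appears to have dropped runs of zeros, so this is an assumption), and uses the bit values from the kernel's powerpc register definitions. The SKETCH_ names and helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* DSISR bit values as defined in the kernel's powerpc register header. */
#define SKETCH_DSISR_NOHPTE   0x40000000u  /* no translation found */
#define SKETCH_DSISR_ISSTORE  0x02000000u  /* access was a store   */

/* True if the faulting access was a store; false means it was a load. */
static bool fault_was_store(uint64_t dsisr)
{
    assert(dsisr <= UINT32_MAX);   /* DSISR is a 32-bit register */
    return (dsisr & SKETCH_DSISR_ISSTORE) != 0;
}

/* True if the fault was caused by a missing translation (unmapped page). */
static bool fault_was_missing_translation(uint64_t dsisr)
{
    assert(dsisr <= UINT32_MAX);
    return (dsisr & SKETCH_DSISR_NOHPTE) != 0;
}
```

Under that assumption the store bit is clear and the no-translation bit is set, which would be consistent with a load from a page that is not mapped.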

Paul.


Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-23 Thread Sachin P. Sant

Andrew Morton wrote:

hm, I wonder what could have caused that - we haven't altered
fs/ext3/xattr.c in ages.

What is the most recent kernel version you know of which didn't do
this?  Bear in mind that this crash might be triggered by the
current contents of the filesystem, so if possible, please test
some other kernel versions on that disk.
  

I am trying to boot a vanilla kernel on this machine for the first
time. Haven't tried any other kernels. Will give it a try.


It looks like we died in ext3_xattr_block_get():

memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
       size);

Perhaps entry->e_value_offs is no good.  I wonder if the filesystem is
corrupted and this snuck through the defenses.

I also wonder if there is enough info in that trace for a ppc person to
be able to determine whether the faulting address is in the source or
destination of the memcpy() (please)?
  

Some more information if this could be of any help.

0:mon di 0xc0039574
c0039574  e9240008  ld      r9,8(r4)
c0039578  409d0010  ble     cr7,c0039588  # .memcpy+0x88/0x244
c003957c  79290002  rotldi  r9,r9,32
c0039580  9123  stw     r9,0(r3)
c0039584  38630004  addi    r3,r3,4
c0039588  409e0010  bne     cr7,c0039598  # .memcpy+0x98/0x244
c003958c  79298000  rotldi  r9,r9,16
c0039590  b123  sth     r9,0(r3)
c0039594  38630002  addi    r3,r3,2
c0039598  409f000c  bns     cr7,c00395a4  # .memcpy+0xa4/0x244
c003959c  79294000  rotldi  r9,r9,8
c00395a0  9923  stb     r9,0(r3)
c00395a4  e8610030  ld      r3,48(r1)
c00395a8  4e800020  blr
c00395ac  78a6e8c2  rldicl  r6,r5,61,3
c00395b0  38a5fff0  addi    r5,r5,-16
0:mon r
R00 = e40f   R16 = 100edbc8
R01 = c0003e59b3e0   R17 = 100b
R02 = c09c2110   R18 = 0005
R03 = c00044bc90e0   R19 = fff0d7a8
R04 = c00039c4   R20 = fff0d708
R05 = 0003   R21 = 00ff
R06 =    R22 = 0006
R07 = 0001   R23 = c079ab49
R08 = 723a7573725f743a   R24 = c000372fe2a8
R09 = 3a6f626a6563745f   R25 = c00044bc90c8
R10 = c0003b250968   R26 = c000372fe240
R11 = c0039500   R27 = c000372fe3b0
R12 = d244c590   R28 = c000372c5280
R13 = c0a53480   R29 = 001b
R14 = 100d   R30 = d24654d0
R15 =    R31 = ffde
pc  = c0039574 .memcpy+0x74/0x244
lr  = d244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
msr = 80009032   cr  = 4400844b
ctr =    xer = 0001   trap =  300
dar = c00039d0   dsisr = 4000
0:mon

So the other thing I noticed was that this machine was running
a kernel with selinux enabled. I turned off selinux and there
were no issues during bootup. It was a clean boot.

Thanks
-Sachin

--

-
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
-



Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-23 Thread Sachin P. Sant

Paul Mackerras wrote:

It appears to have faulted on a load, implicating the source.  The
address being referenced (0xc0003f38) doesn't look
outlandish.  I wonder if this kernel has CONFIG_DEBUG_PAGEALLOC turned
on, and what page size is selected?

Yes CONFIG_DEBUG_PAGEALLOC is enabled and the page size is 64K.

CONFIG_DEBUG_PAGEALLOC=y
CONFIG_PPC_64K_PAGES=y

Thanks
-Sachin


--

-
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
-



Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-23 Thread Jan Kara
 Andrew Morton writes:
 
  It looks like we died in ext3_xattr_block_get():
  
      memcpy(buffer, bh->b_data + le16_to_cpu(entry->e_value_offs),
             size);
  
  Perhaps entry->e_value_offs is no good.  I wonder if the filesystem is
  corrupted and this snuck through the defenses.
  
  I also wonder if there is enough info in that trace for a ppc person to
  be able to determine whether the faulting address is in the source or
  destination of the memcpy() (please)?
 
 It appears to have faulted on a load, implicating the source.  The
 address being referenced (0xc0003f38) doesn't look
 outlandish.  I wonder if this kernel has CONFIG_DEBUG_PAGEALLOC turned
 on, and what page size is selected?
  Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
somehow got beyond end of the page referenced by bh->b_data. So it means
that le16_to_cpu(entry->e_value_offs) + size > page_size. But
ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
particular checks whether e_value_offs + e_value_size isn't greater than
bh->b_size. So I see no way how memcpy can get beyond end of the page.
  Sachin, is the problem reproducible? If yes, can you send us contents
of the page just before the faulting address (i.e., for current fault it
would be 0xc0003f37-0xc0003f37). As far as I can
remember powerpc monitor could dump it.
  BTW, I suppose you use 4KB blocksize on the filesystem, right?
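
The kind of bounds check described above can be sketched in C as below. This is a simplified stand-in with a hypothetical name (xattr_value_in_block), not the actual ext3_xattr_check_entry() code. It only illustrates the condition that the value offset plus the value size must not exceed the block size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified, hypothetical version of the check: an xattr value of
 * `value_size` bytes starting `value_offs` bytes into the block must
 * end at or before the end of a block of `block_size` bytes.
 */
static bool xattr_value_in_block(uint16_t value_offs, uint32_t value_size,
                                 size_t block_size)
{
    assert(block_size > 0);

    if (value_size > block_size)
        return false;
    return (size_t)value_offs + value_size <= block_size;
}
```

For illustration, a 27-byte value at offset 0x0fe4 in a 4096-byte block ends at byte 4095 and passes this check, so a copy of exactly that many bytes stays inside the block. A fault past the end of the page would then point at the copy routine reading more than it was asked to, rather than at the xattr entry.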

Honza
-- 
Jan Kara j...@suse.cz
SuSE CR Labs


Re: Crash (ext3 ) during 2.6.29-rc6 boot

2009-02-23 Thread Sachin P. Sant

Jan Kara wrote:

  Hmm, OK. But then I'm not sure how that can happen. Obviously, memcpy
somehow got beyond end of the page referenced by bh->b_data. So it means
that le16_to_cpu(entry->e_value_offs) + size > page_size. But
ext3_xattr_find_entry() calls ext3_xattr_check_entry() which in
particular checks whether e_value_offs + e_value_size isn't greater than
bh->b_size. So I see no way how memcpy can get beyond end of the page.
  Sachin, is the problem reproducible? If yes, can you send us contents
  

Yes, I am able to recreate this problem easily. As I mentioned, if the
earlier kernel is booted with selinux enabled and then 2.6.29-rc6 is booted,
I get this crash. But if I specify selinux=0 on the command line, 2.6.29-rc6
boots without any problem.


of the page just before the faulting address (i.e., for current fault it
would be 0xc0003f37-0xc0003f37). As far as I can
remember powerpc monitor could dump it.
  

Here is the page dump. This time it crashed while accessing address
0xc0002d67.

Unable to handle kernel paging request for data at address 0xc0002d67
Faulting instruction address: 0xc0039574
cpu 0x1: Vector: 300 (Data Access) at [c0004288b0b0]
   pc: c0039574: .memcpy+0x74/0x244
   lr: c01b497c: .ext3_xattr_get+0x288/0x2f4
   sp: c0004288b330
  msr: 80009032

1:mon d 0xc0002d66
... SNIP ...

c0002d66efd0    ||
c0002d66efe0    ||
c0002d66eff0    ||
c0002d66f000 02ea0004 0100e200d20a  ||
c0002d66f010    ||
c0002d66f020 0706e40f 1b00e200d20a  ||
c0002d66f030 73656c696e757800   |selinux.|
c0002d66f040    ||
c0002d66f050    ||
c0002d66f060    ||

... SNIP ...

c0002d66ff60    ||
c0002d66ff70    ||
c0002d66ff80    ||
c0002d66ff90    ||
c0002d66ffa0    ||
c0002d66ffb0    ||
c0002d66ffc0    ||
c0002d66ffd0    ||
c0002d66ffe0 73797374 656d5f753a6f626a  |system_u:obj|
c0002d66fff0 6563745f723a7573 725f743a7330  |ect_r:usr_t:s0..|
c0002d67    ||
1:mon r
R00 = e40f   R16 = 005d
R01 = c0004288b330   R17 = 
R02 = c09f59b8   R18 = fffbfe9e
R03 = c00044aa34a0   R19 = 10042638
R04 = c0002d66fff4   R20 = 10041610
R05 = 0003   R21 = 00ff
R06 =    R22 = 0006
R07 = 0001   R23 = c07d27c1
R08 = 723a7573725f743a   R24 = c0002c0cd758
R09 = 3a6f626a6563745f   R25 = c00044aa3488
R10 = c017b43c   R26 = c0002c0cd6f0
R11 = c0002d66f020   R27 = c0002c0cd860
R12 = d23c14b0   R28 = c0002c0b0840
R13 = c0a93680   R29 = 001b
R14 = 41ed   R30 = c09880b0
R15 = 1004   R31 = ffde
pc  = c0039574 .memcpy+0x74/0x244
lr  = c01b497c .ext3_xattr_get+0x288/0x2f4
msr = 80009032   cr  = 4400044b
ctr =    xer = 2001   trap =  300
dar = c0002d67   dsisr = 4000
1:mon zr


  BTW, I suppose you use 4KB blocksize on the filesystem, right?
  

Yes.

dumpe2fs /dev/sda3 | grep -i "block size"
dumpe2fs 1.39 (29-May-2006)

Block size:   4096

Thanks
-Sachin

--

-
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
-
