Re[4]: mdadm 2.6.4 : How i can check out current status of reshaping ?

2008-02-08 Thread Andreas-Sokov
Hello, Neil.

You wrote, on 5 February 2008 at 13:10:00:
 On Tuesday February 5, [EMAIL PROTECTED] wrote:
 Feb  5 11:56:12 raid01 kernel: BUG: unable to handle kernel paging request at virtual address 001cd901

 This looks like some sort of memory corruption.



 Possible you have bad memory, or a bad CPU, or you are overclocking
 the CPU, or it is getting hot, or something.


 But you clearly have a hardware error.

 NeilBrown

At this moment I have checked my server. As you wrote earlier, there is some
hidden problem (or problems) somewhere. We tried to find out what it is and
could not. We tried other memory modules - the result was the same (kernel
panic, one way or another).

So, then we moved the RAID HDDs into another computer and the reshape passed
fine! And now the reshape (5-7 drives) continues normally there.

Thank you very much !

-- 
Best regards,
Andreas-Sokov

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: mdadm 2.6.4 : How i can check out current status of reshaping ?

2008-02-07 Thread Bill Davidsen

Andreas-Sokov wrote:

Hello, Neil.

[...]

Possible you have bad memory, or a bad CPU, or you are overclocking
the CPU, or it is getting hot, or something.



It seems to me that all my problems started after I updated mdadm.
This server worked normally (though not as soft-raid) for more than 2-3 years.
For the last 6 months it has worked as soft-raid. All was normal; I even
successfully added a 4th hdd to the raid5 (when it started there were 3 hdds),
and that reshape passed fine.

Yesterday I ran memtest86 on this server and 10 passes completed WITHOUT any
errors. The temperature of the server is about 25 degrees Celsius.
No overclocking; everything is set to defaults.

What did you find when you loaded the module with gdb as Neil suggested? 
If the code in the module doesn't match the code in memory you have a 
hardware error. memtest86 is a useful tool, but it is not a definitive 
test because it doesn't use all CPUs and do i/o at the same time to load 
the memory bus.



Really, I do not know what to do, because we need to grow our storage and we
cannot. Unfortunately, at this moment mdadm does not help us with this,
though we very much want it to.


I would pull out half my memory and retest. If it still fails I would 
swap to the other half of memory. If that didn't show a change I would 
check that the code in the module is what Neil showed in his last 
message (I assume you already have), and then reseat all of the cables, etc.


I agree with Neil:

 But you clearly have a hardware error.

 NeilBrown

-- 
Bill Davidsen [EMAIL PROTECTED]
 "Woe unto the statesman who makes war without a reason that will still
 be valid when the war is over..." Otto von Bismarck





Re[4]: mdadm 2.6.4 : How i can check out current status of reshaping ?

2008-02-06 Thread Andreas-Sokov
Hello, Neil.

[...]
 Possible you have bad memory, or a bad CPU, or you are overclocking
 the CPU, or it is getting hot, or something.

It seems to me that all my problems started after I updated mdadm.
This server worked normally (though not as soft-raid) for more than 2-3 years.
For the last 6 months it has worked as soft-raid. All was normal; I even
successfully added a 4th hdd to the raid5 (when it started there were 3 hdds),
and that reshape passed fine.

Yesterday I ran memtest86 on this server and 10 passes completed WITHOUT any
errors. The temperature of the server is about 25 degrees Celsius.
No overclocking; everything is set to defaults.

Really, I do not know what to do, because we need to grow our storage and we
cannot. Unfortunately, at this moment mdadm does not help us with this,
though we very much want it to.

 But you clearly have a hardware error.

 NeilBrown



-- 
Best regards,
Andreas-Sokov



Re: mdadm 2.6.4 : How i can check out current status of reshaping ?

2008-02-06 Thread Janek Kozicki
Andreas-Sokov said: (by the date of Wed, 6 Feb 2008 22:15:05 +0300)

 Hello, Neil.
 
 [...]
  Possible you have bad memory, or a bad CPU, or you are overclocking
  the CPU, or it is getting hot, or something.
 
 It seems to me that all my problems started after I updated mdadm.

What exactly was the update?

- you installed a new version of mdadm?
- you installed new kernel?
- something else?

- what was the version before, and what version is now?

- can you downgrade to previous version?


best regards
-- 
Janek Kozicki |


Re[2]: mdadm 2.6.4 : How i can check out current status of reshaping ?

2008-02-05 Thread Andreas-Sokov
Hello, Neil.

You wrote, on 5 February 2008 at 01:48:33:
 On Monday February 4, [EMAIL PROTECTED] wrote:
 
 [EMAIL PROTECTED]:/# cat /proc/mdstat
 Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
 md1 : active raid5 sdc[0] sdb[5](S) sdf[3] sde[2] sdd[1]
   1465159488 blocks super 0.91 level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
 
 unused devices: <none>
 
 ##
 But how can I see the status of the reshaping?
 Is it really reshaping? Or has it hung? Or is mdadm not doing anything at all?
 How long until the reshaping finishes?
 ##
 

 The reshape hasn't restarted.

 Did you do that mdadm -w /dev/md1 like I suggested?  If so, what
 happened?

 Possibly you tried mounting the filesystem before trying the mdadm
 -w.  There seems to be a bug such that doing this would cause the
 reshape not to restart, and mdadm -w would not help any more.

 I suggest you:

   echo 0 > /sys/module/md_mod/parameters/start_ro

 stop the array 
   mdadm -S /dev/md1
 (after unmounting if necessary).

 Then assemble the array again.
 Then
   mdadm -w /dev/md1

 just to be sure.

 If this doesn't work, please report exactly what you did, exactly what
 message you got and exactly where message appeared in the kernel log.

 NeilBrown

I read your letter again.
The first time, I did not do:

echo 0 > /sys/module/md_mod/parameters/start_ro

Now I have done this, and then:
mdadm -S /dev/md1
mdadm /dev/md1 -A /dev/sd[bcdef]
mdadm -w /dev/md1

After about 2 minutes the kernel showed something (below),
but the reshape is still in progress:

[EMAIL PROTECTED]:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid5 sdc[0] sdb[5](S) sdf[3] sde[2] sdd[1]
      1465159488 blocks super 0.91 level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
      [==>..................]  reshape = 10.1% (49591552/488386496) finish=12127.2min speed=602K/sec

unused devices: <none>
[EMAIL PROTECTED]:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid5 sdc[0] sdb[5](S) sdf[3] sde[2] sdd[1]
      1465159488 blocks super 0.91 level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
      [==>..................]  reshape = 10.1% (49591552/488386496) finish=12259.0min speed=596K/sec

unused devices: <none>
[EMAIL PROTECTED]:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid5 sdc[0] sdb[5](S) sdf[3] sde[2] sdd[1]
      1465159488 blocks super 0.91 level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
      [==>..................]  reshape = 10.1% (49591552/488386496) finish=12311.7min speed=593K/sec

unused devices: <none>
[EMAIL PROTECTED]:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid5 sdc[0] sdb[5](S) sdf[3] sde[2] sdd[1]
      1465159488 blocks super 0.91 level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
      [==>..................]  reshape = 10.1% (49591552/488386496) finish=12338.1min speed=592K/sec

unused devices: <none>




Feb  5 11:54:21 raid01 kernel: raid5: reshape will continue
Feb  5 11:54:21 raid01 kernel: raid5: device sdc operational as raid disk 0
Feb  5 11:54:21 raid01 kernel: raid5: device sdf operational as raid disk 3
Feb  5 11:54:21 raid01 kernel: raid5: device sde operational as raid disk 2
Feb  5 11:54:21 raid01 kernel: raid5: device sdd operational as raid disk 1
Feb  5 11:54:21 raid01 kernel: raid5: allocated 5245kB for md1
Feb  5 11:54:21 raid01 kernel: raid5: raid level 5 set md1 active with 4 out of 5 devices, algorithm 2
Feb  5 11:54:21 raid01 kernel: RAID5 conf printout:
Feb  5 11:54:21 raid01 kernel:  --- rd:5 wd:4
Feb  5 11:54:21 raid01 kernel:  disk 0, o:1, dev:sdc
Feb  5 11:54:21 raid01 kernel:  disk 1, o:1, dev:sdd
Feb  5 11:54:21 raid01 kernel:  disk 2, o:1, dev:sde
Feb  5 11:54:21 raid01 kernel:  disk 3, o:1, dev:sdf
Feb  5 11:54:21 raid01 kernel: ...ok start reshape thread
Feb  5 11:54:21 raid01 mdadm: RebuildStarted event detected on md device /dev/md1
Feb  5 11:54:21 raid01 kernel: md: reshape of RAID array md1
Feb  5 11:54:21 raid01 kernel: md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Feb  5 11:54:21 raid01 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
Feb  5 11:54:21 raid01 kernel: md: using 128k window, over a total of 488386496 blocks.
Feb  5 11:56:12 raid01 kernel: BUG: unable to handle kernel paging request at virtual address 001cd901
Feb  5 11:56:12 raid01 kernel:  printing eip:

Re: Re[2]: mdadm 2.6.4 : How i can check out current status of reshaping ?

2008-02-05 Thread Neil Brown
On Tuesday February 5, [EMAIL PROTECTED] wrote:
 Feb  5 11:56:12 raid01 kernel: BUG: unable to handle kernel paging request at virtual address 001cd901

This looks like some sort of memory corruption.

 Feb  5 11:56:12 raid01 kernel: EIP is at md_do_sync+0x629/0xa32

This tells us what code is executing.

 Feb  5 11:56:12 raid01 kernel: Code: 54 24 48 0f 87 a4 01 00 00 72 0a 3b 44 
 24 44 0f 87 98 01 00 00 3b 7c 24 40 75 0a 3b 74 24 3c 0f 84 88 01 00 00 0b 85 
 30 01 00 00 88 08 0f 85 90 01 00 00 8b 85 30 01 00 00 a8 04 0f 85 82 01 00

This tells us what the actual bytes of code were.
If I feed this line (from "Code:" onwards) into ksymoops I get:

   0:   54                      push   %esp
   1:   24 48                   and    $0x48,%al
   3:   0f 87 a4 01 00 00       ja     1ad <_EIP+0x1ad>
   9:   72 0a                   jb     15 <_EIP+0x15>
   b:   3b 44 24 44             cmp    0x44(%esp),%eax
   f:   0f 87 98 01 00 00       ja     1ad <_EIP+0x1ad>
  15:   3b 7c 24 40             cmp    0x40(%esp),%edi
  19:   75 0a                   jne    25 <_EIP+0x25>
  1b:   3b 74 24 3c             cmp    0x3c(%esp),%esi
  1f:   0f 84 88 01 00 00       je     1ad <_EIP+0x1ad>
  25:   0b 85 30 01 00 00       or     0x130(%ebp),%eax
Code;   Before first symbol
  2b:   88 08                   mov    %cl,(%eax)
  2d:   0f 85 90 01 00 00       jne    1c3 <_EIP+0x1c3>
  33:   8b 85 30 01 00 00       mov    0x130(%ebp),%eax
  39:   a8 04                   test   $0x4,%al
  3b:   0f                      .byte 0xf
  3c:   85                      .byte 0x85
  3d:   82                      (bad)
  3e:   01 00                   add    %eax,(%eax)


I removed the Code;... lines as they are just noise, except for the
one that points to the current instruction in the middle.
Note that it is dereferencing %eax, after just 'or'ing some value into
it, which is rather unusual.

Now get the md-mod.ko for the kernel you are running.
run
   gdb md-mod.ko

and give the command

   disassemble md_do_sync

and look for code at offset 0x629, which is 1577 in decimal.
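The hex-to-decimal step can be done directly in the shell; this one-liner is an editor's illustration, with 0x629 taken from the EIP line "md_do_sync+0x629/0xa32" in the oops:

```shell
# Convert the hex offset from the oops EIP line into the decimal
# offset shown in gdb's "disassemble" listing.
echo $((0x629))
```
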

I found a similar kernel to what you are running, and the matching code
is 

0x55c0 <md_do_sync+1485>:   cmp    0x30(%esp),%eax
0x55c4 <md_do_sync+1489>:   ja     0x5749 <md_do_sync+1878>
0x55ca <md_do_sync+1495>:   cmp    0x2c(%esp),%edi
0x55ce <md_do_sync+1499>:   jne    0x55da <md_do_sync+1511>
0x55d0 <md_do_sync+1501>:   cmp    0x28(%esp),%esi
0x55d4 <md_do_sync+1505>:   je     0x5749 <md_do_sync+1878>
0x55da <md_do_sync+1511>:   mov    0x130(%ebp),%eax
0x55e0 <md_do_sync+1517>:   test   $0x8,%al
0x55e2 <md_do_sync+1519>:   jne    0x575f <md_do_sync+1900>
0x55e8 <md_do_sync+1525>:   mov    0x130(%ebp),%eax
0x55ee <md_do_sync+1531>:   test   $0x4,%al
0x55f0 <md_do_sync+1533>:   jne    0x575f <md_do_sync+1900>
0x55f6 <md_do_sync+1539>:   mov    0x38(%esp),%ecx
0x55fa <md_do_sync+1543>:   mov    0x0,%eax
-

Note the sequence cmp, ja, cmp, jne, cmp, je,
where the cmp arguments are consecutive 4-byte values on the stack
(%esp).
In the code from your oops, the offsets are 0x44 0x40 0x3c.
In the kernel I found they are 0x30 0x2c 0x28.  The difference is some
subtle difference in the kernel, possibly a different compiler or
something.

Anyway, your code crashed at 


  25:   0b 85 30 01 00 00       or     0x130(%ebp),%eax
Code;   Before first symbol
  2b:   88 08                   mov    %cl,(%eax)

The matching code in the kernel I found is 

0x55da <md_do_sync+1511>:   mov    0x130(%ebp),%eax
0x55e0 <md_do_sync+1517>:   test   $0x8,%al

Note that you have an 'or', the kernel I found has 'mov'.

If we look at the actual bytes of code for those two instructions,
the code that crashed shows the bytes above:

0b 85 30 01 00 00
88 08

if I get the same bytes with gdb:

(gdb) x/8b 0x55da
0x55da <md_do_sync+1511>:   0x8b  0x85  0x30  0x01  0x00  0x00  0xa8  0x08
(gdb) 

So what should be 8b has become 0b, and what should be a8 has
become 88.

If you look for the same data in your md-mod.ko, you might find
slightly different details but it is clear to me that the code in
memory is bad.
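The two differing bytes can be checked with a quick XOR; this comparison is an editor's illustration using the values quoted above, not part of the original mail. Each corrupted byte differs from the expected one by exactly one flipped bit (0x80 and 0x20), which fits failing RAM far better than any software bug:

```shell
# XOR the expected byte (from gdb) against the byte seen in the oops.
# A result with a single bit set means a single-bit flip.
printf 'byte 0: 8b xor 0b = %02x\n' $((0x8b ^ 0x0b))
printf 'byte 6: a8 xor 88 = %02x\n' $((0xa8 ^ 0x88))
```
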

Possible you have bad memory, or a bad CPU, or you are overclocking
the CPU, or it is getting hot, or something.


But you clearly have a hardware error.

NeilBrown


Re: mdadm 2.6.4 : How i can check out current status of reshaping ?

2008-02-04 Thread Neil Brown
On Monday February 4, [EMAIL PROTECTED] wrote:
 
 [EMAIL PROTECTED]:/# cat /proc/mdstat
 Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
 md1 : active raid5 sdc[0] sdb[5](S) sdf[3] sde[2] sdd[1]
   1465159488 blocks super 0.91 level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
 
 unused devices: <none>
 
 ##
 But how can I see the status of the reshaping?
 Is it really reshaping? Or has it hung? Or is mdadm not doing anything at all?
 How long until the reshaping finishes?
 ##
 

The reshape hasn't restarted.

Did you do that mdadm -w /dev/md1 like I suggested?  If so, what
happened?

Possibly you tried mounting the filesystem before trying the mdadm
-w.  There seems to be a bug such that doing this would cause the
reshape not to restart, and mdadm -w would not help any more.

I suggest you:

  echo 0 > /sys/module/md_mod/parameters/start_ro

stop the array 
  mdadm -S /dev/md1
(after unmounting if necessary).

Then assemble the array again.
Then
  mdadm -w /dev/md1

just to be sure.
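The steps above can be collected into a small script. This sketch is the editor's, not from the thread, and is wrapped in a DRY_RUN guard so that by default it only prints the commands instead of touching a real array:

```shell
# Sketch of the restart procedure, with a DRY_RUN guard (default on).
# /dev/md1 matches the thread; adjust member devices and mount point
# for your own system before disabling DRY_RUN.
DRY_RUN=${DRY_RUN:-1}
MD=/dev/md1

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run sh -c "echo 0 > /sys/module/md_mod/parameters/start_ro"
run umount "$MD"       # only if the filesystem is mounted
run mdadm -S "$MD"     # stop the array
run mdadm -A "$MD"     # assemble it again (list member devices if needed)
run mdadm -w "$MD"     # mark it read-write so the reshape restarts
```

Run it once as-is to review the command sequence, then rerun with DRY_RUN=0 to execute.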

If this doesn't work, please report exactly what you did, exactly what
message you got and exactly where message appeared in the kernel log.

NeilBrown


mdadm 2.6.4 : How i can check out current status of reshaping ?

2008-02-03 Thread Andreas-Sokov
Hi linux-raid.

On Debian:

[EMAIL PROTECTED]:/# mdadm -D /dev/md1
/dev/md1:
        Version : 00.91.03
  Creation Time : Tue Nov 13 18:42:36 2007
     Raid Level : raid5
     Array Size : 1465159488 (1397.29 GiB 1500.32 GB)
  Used Dev Size : 488386496 (465.76 GiB 500.11 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Mon Feb  4 06:51:47 2008
          State : clean, degraded
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

  Delta Devices : 1, (4->5)
                      ^
           UUID : 4fbdc8df:07b952cf:7cc6faa0:04676ba5
         Events : 0.683598

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       48        1      active sync   /dev/sdd
       2       8       64        2      active sync   /dev/sde
       3       8       80        3      active sync   /dev/sdf
       4       0        0        4      removed

       5       8       16        -      spare   /dev/sdb



[EMAIL PROTECTED]:/# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid5 sdc[0] sdb[5](S) sdf[3] sde[2] sdd[1]
      1465159488 blocks super 0.91 level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]

unused devices: <none>

##
But how can I see the status of the reshaping?
Is it really reshaping? Or has it hung? Or is mdadm not doing anything at all?
How long until the reshaping finishes?
##
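For what it's worth: when a reshape is actually running, /proc/mdstat gains a progress line, and it can be watched interactively with `watch cat /proc/mdstat` or parsed from a script. The sketch below is the editor's illustration; the sample line is one that appears elsewhere in this thread, and on a live system you would instead use `grep reshape /proc/mdstat`:

```shell
# Extract the reshape percentage and the ETA (in minutes) from a
# /proc/mdstat progress line.
line='      [==>..................]  reshape = 10.1% (49591552/488386496) finish=12127.2min speed=602K/sec'
pct=$(printf '%s\n' "$line" | sed -n 's/.*reshape = \([0-9.]*\)%.*/\1/p')
eta=$(printf '%s\n' "$line" | sed -n 's/.*finish=\([0-9.]*\)min.*/\1/p')
echo "reshape: ${pct}% done, about ${eta} minutes left"
```
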




-- 
Best regards,
Andreas-Sokov
