Re[4]: mdadm 2.6.4 : How can I check the current status of reshaping?
Hello, Neil.

YOU WROTE (5 February 2008, 13:10:00):

> On Tuesday February 5, [EMAIL PROTECTED] wrote:
> > Feb 5 11:56:12 raid01 kernel: BUG: unable to handle kernel paging request at virtual address 001cd901
>
> This looks like some sort of memory corruption. Possibly you have bad memory, or a bad CPU, or you are overclocking the CPU, or it is getting hot, or something. But you clearly have a hardware error.
>
> NeilBrown

At this point I have checked my server. As you wrote earlier, there is a hidden problem (or problems) somewhere. We tried to find out what it is and could not. We tried other memory modules; the result was the same (a kernel panic, one way or another). SO, we then moved the RAID HDDs into another computer, and the reshape completed fine! The reshape now continues normally on the five drives. Thank you very much!

--
Best regards,
Andreas-Sokov

- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: mdadm 2.6.4 : How can I check the current status of reshaping?
Andreas-Sokov wrote:
> Hello, Neil.
> > Possibly you have bad memory, or a bad CPU, or you are overclocking the CPU, or it is getting hot, or something.
> It seems to me that all my problems started after I updated mdadm. This server worked normally (though not as soft-RAID) for more than 2-3 years. For the last 6 months it has worked as soft-RAID. All was normal; I even successfully added a 4th HDD to the RAID5 (it started with 3 HDDs), and that reshape completed fine. Yesterday I ran memtest86 on this server and 10 passes completed WITHOUT any errors. The temperature of the server is about 25 degrees Celsius. No overclocking; everything is set to defaults.

What did you find when you loaded the module with gdb as Neil suggested? If the code in the module doesn't match the code in memory you have a hardware error.

memtest86 is a useful tool, but it is not a definitive test, because it doesn't use all CPUs and do I/O at the same time to load the memory bus.

> Really, I do not know what to do, because we need to grow our storage and we cannot. Unfortunately, at this moment mdadm does not help us with this, however much we want it to.

I would pull out half my memory and retest. If it still fails I would swap to the other half of the memory. If that didn't show a change I would check that the code in the module is what Neil showed in his last message (I assume you already have), and then reseat all of the cables, etc.

I agree with Neil:
> But you clearly have a hardware error.
> NeilBrown

--
Bill Davidsen [EMAIL PROTECTED]
"Woe unto the statesman who makes war without a reason that will still be valid when the war is over..." Otto von Bismarck
Re[4]: mdadm 2.6.4 : How can I check the current status of reshaping?
Hello, Neil.

> Possibly you have bad memory, or a bad CPU, or you are overclocking the CPU, or it is getting hot, or something.

It seems to me that all my problems started after I updated mdadm. This server worked normally (though not as soft-RAID) for more than 2-3 years. For the last 6 months it has worked as soft-RAID. All was normal; I even successfully added a 4th HDD to the RAID5 (it started with 3 HDDs), and that reshape completed fine. Yesterday I ran memtest86 on this server and 10 passes completed WITHOUT any errors. The temperature of the server is about 25 degrees Celsius. No overclocking; everything is set to defaults.

Really, I do not know what to do, because we need to grow our storage and we cannot. Unfortunately, at this moment mdadm does not help us with this, however much we want it to.

> But you clearly have a hardware error.
> NeilBrown

--
Best regards,
Andreas-Sokov
Re: mdadm 2.6.4 : How can I check the current status of reshaping?
Andreas-Sokov said: (by the date of Wed, 6 Feb 2008 22:15:05 +0300)
> Hello, Neil.
> > Possibly you have bad memory, or a bad CPU, or you are overclocking the CPU, or it is getting hot, or something.
> It seems to me that all my problems started after I updated mdadm.

What is the update?
- you installed a new version of mdadm?
- you installed a new kernel?
- something else?
- what was the version before, and what version is now?
- can you downgrade to the previous version?

best regards
--
Janek Kozicki |
Re[2]: mdadm 2.6.4 : How can I check the current status of reshaping?
Hello, Neil.

YOU WROTE (5 February 2008, 01:48:33):

> On Monday February 4, [EMAIL PROTECTED] wrote:
> > [EMAIL PROTECTED]:/# cat /proc/mdstat
> > Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
> > md1 : active raid5 sdc[0] sdb[5](S) sdf[3] sde[2] sdd[1]
> >       1465159488 blocks super 0.91 level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
> > unused devices: <none>
> >
> > ## But how can I see the status of the reshape? Has it really reshaped, or has it perhaps hung, or is mdadm doing nothing at all? How long should I wait for the reshape to finish? ##
>
> The reshape hasn't restarted. Did you do that mdadm -w /dev/md1 like I suggested? If so, what happened?
> Possibly you tried mounting the filesystem before trying the mdadm -w. There seems to be a bug such that doing this would cause the reshape not to restart, and mdadm -w would not help any more.
> I suggest you:
>   echo 0 > /sys/module/md_mod/parameters/start_ro
> stop the array
>   mdadm -S /dev/md1
> (after unmounting if necessary).
> Then assemble the array again. Then
>   mdadm -w /dev/md1
> just to be sure.
> If this doesn't work, please report exactly what you did, exactly what message you got and exactly where the message appeared in the kernel log.
> NeilBrown

I read your letter again. The first time, I did not do
  echo 0 > /sys/module/md_mod/parameters/start_ro
Now I have done this, then:
  mdadm -S /dev/md1
  mdadm /dev/md1 -A /dev/sd[bcdef]
  mdadm -w /dev/md1
and I have: after 2 minutes the kernel showed something, but the reshape was still in progress.

[EMAIL PROTECTED]:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid5 sdc[0] sdb[5](S) sdf[3] sde[2] sdd[1]
      1465159488 blocks super 0.91 level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
      [==>..................]  reshape = 10.1% (49591552/488386496) finish=12127.2min speed=602K/sec
unused devices: <none>

[EMAIL PROTECTED]:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid5 sdc[0] sdb[5](S) sdf[3] sde[2] sdd[1]
      1465159488 blocks super 0.91 level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
      [==>..................]  reshape = 10.1% (49591552/488386496) finish=12259.0min speed=596K/sec
unused devices: <none>

[EMAIL PROTECTED]:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid5 sdc[0] sdb[5](S) sdf[3] sde[2] sdd[1]
      1465159488 blocks super 0.91 level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
      [==>..................]  reshape = 10.1% (49591552/488386496) finish=12311.7min speed=593K/sec
unused devices: <none>

[EMAIL PROTECTED]:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid5 sdc[0] sdb[5](S) sdf[3] sde[2] sdd[1]
      1465159488 blocks super 0.91 level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
      [==>..................]  reshape = 10.1% (49591552/488386496) finish=12338.1min speed=592K/sec
unused devices: <none>

Feb 5 11:54:21 raid01 kernel: raid5: reshape will continue
Feb 5 11:54:21 raid01 kernel: raid5: device sdc operational as raid disk 0
Feb 5 11:54:21 raid01 kernel: raid5: device sdf operational as raid disk 3
Feb 5 11:54:21 raid01 kernel: raid5: device sde operational as raid disk 2
Feb 5 11:54:21 raid01 kernel: raid5: device sdd operational as raid disk 1
Feb 5 11:54:21 raid01 kernel: raid5: allocated 5245kB for md1
Feb 5 11:54:21 raid01 kernel: raid5: raid level 5 set md1 active with 4 out of 5 devices, algorithm 2
Feb 5 11:54:21 raid01 kernel: RAID5 conf printout:
Feb 5 11:54:21 raid01 kernel:  --- rd:5 wd:4
Feb 5 11:54:21 raid01 kernel:  disk 0, o:1, dev:sdc
Feb 5 11:54:21 raid01 kernel:  disk 1, o:1, dev:sdd
Feb 5 11:54:21 raid01 kernel:  disk 2, o:1, dev:sde
Feb 5 11:54:21 raid01 kernel:  disk 3, o:1, dev:sdf
Feb 5 11:54:21 raid01 kernel: ...ok start reshape thread
Feb 5 11:54:21 raid01 mdadm: RebuildStarted event detected on md device /dev/md1
Feb 5 11:54:21 raid01 kernel: md: reshape of RAID array md1
Feb 5 11:54:21 raid01 kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Feb 5 11:54:21 raid01 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reshape.
Feb 5 11:54:21 raid01 kernel: md: using 128k window, over a total of 488386496 blocks.
Feb 5 11:56:12 raid01 kernel: BUG: unable to handle kernel paging request at virtual address 001cd901
Feb 5 11:56:12 raid01 kernel: printing eip:
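As a sanity check on those numbers: the finish estimate mdstat prints is simply (remaining blocks) / (current speed). A quick sketch using the first snapshot above (49591552 of 488386496 1K-blocks done, at 602 KB/sec):

```shell
# Recompute mdstat's finish estimate from the reshape position and speed.
awk 'BEGIN {
  done  = 49591552      # blocks reshaped so far (1K blocks)
  total = 488386496     # total blocks to reshape
  speed = 602           # current speed in KB/sec
  printf "finish ~ %.1f min\n", (total - done) / speed / 60
}'
# -> finish ~ 12148.3 min
```

That is within about 0.2% of the 12127.2 min mdstat reported (mdstat uses a time-averaged recent speed, hence the small difference) - and at ~600 KB/sec the reshape would indeed take over a week.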
Re: Re[2]: mdadm 2.6.4 : How can I check the current status of reshaping?
On Tuesday February 5, [EMAIL PROTECTED] wrote:

> Feb 5 11:56:12 raid01 kernel: BUG: unable to handle kernel paging request at virtual address 001cd901

This looks like some sort of memory corruption.

> Feb 5 11:56:12 raid01 kernel: EIP is at md_do_sync+0x629/0xa32

This tells us what code is executing.

> Feb 5 11:56:12 raid01 kernel: Code: 54 24 48 0f 87 a4 01 00 00 72 0a 3b 44 24 44 0f 87 98 01 00 00 3b 7c 24 40 75 0a 3b 74 24 3c 0f 84 88 01 00 00 0b 85 30 01 00 00 <88> 08 0f 85 90 01 00 00 8b 85 30 01 00 00 a8 04 0f 85 82 01 00

This tells us what the actual bytes of code were. If I feed this line (from "Code:" onwards) into ksymoops I get:

   0:   54                      push   %esp
   1:   24 48                   and    $0x48,%al
   3:   0f 87 a4 01 00 00       ja     1ad <_EIP+0x1ad>
   9:   72 0a                   jb     15 <_EIP+0x15>
   b:   3b 44 24 44             cmp    0x44(%esp),%eax
   f:   0f 87 98 01 00 00       ja     1ad <_EIP+0x1ad>
  15:   3b 7c 24 40             cmp    0x40(%esp),%edi
  19:   75 0a                   jne    25 <_EIP+0x25>
  1b:   3b 74 24 3c             cmp    0x3c(%esp),%esi
  1f:   0f 84 88 01 00 00       je     1ad <_EIP+0x1ad>
  25:   0b 85 30 01 00 00       or     0x130(%ebp),%eax
Code;  Before first symbol
  2b:   88 08                   mov    %cl,(%eax)
  2d:   0f 85 90 01 00 00       jne    1c3 <_EIP+0x1c3>
  33:   8b 85 30 01 00 00       mov    0x130(%ebp),%eax
  39:   a8 04                   test   $0x4,%al
  3b:   0f                      .byte 0xf
  3c:   85                      .byte 0x85
  3d:   82                      (bad)
  3e:   01 00                   add    %eax,(%eax)

I removed the "Code;..." lines as they are just noise, except for the one that points to the current instruction in the middle. Note that it is dereferencing %eax after just 'or'ing some value into it, which is rather unusual.

Now get the md-mod.ko for the kernel you are running, run
  gdb md-mod.ko
and give the command
  disassemble md_do_sync
and look for code at offset 0x629, which is 1577 in decimal.
I found a similar kernel to the one you are running, and the matching code is:

0x55c0 <md_do_sync+1485>: cmp    0x30(%esp),%eax
0x55c4 <md_do_sync+1489>: ja     0x5749 <md_do_sync+1878>
0x55ca <md_do_sync+1495>: cmp    0x2c(%esp),%edi
0x55ce <md_do_sync+1499>: jne    0x55da <md_do_sync+1511>
0x55d0 <md_do_sync+1501>: cmp    0x28(%esp),%esi
0x55d4 <md_do_sync+1505>: je     0x5749 <md_do_sync+1878>
0x55da <md_do_sync+1511>: mov    0x130(%ebp),%eax
0x55e0 <md_do_sync+1517>: test   $0x8,%al
0x55e2 <md_do_sync+1519>: jne    0x575f <md_do_sync+1900>
0x55e8 <md_do_sync+1525>: mov    0x130(%ebp),%eax
0x55ee <md_do_sync+1531>: test   $0x4,%al
0x55f0 <md_do_sync+1533>: jne    0x575f <md_do_sync+1900>
0x55f6 <md_do_sync+1539>: mov    0x38(%esp),%ecx
0x55fa <md_do_sync+1543>: mov    0x0,%eax

Note the sequence cmp, ja, cmp, jne, cmp, je, where the cmp arguments are consecutive 4-byte values on the stack (%esp). In the code from your oops the offsets are 0x44 0x40 0x3c; in the kernel I found they are 0x30 0x2c 0x28. The difference is some subtle difference in the kernel, possibly a different compiler or something.

Anyway, your code crashed at:

  25:   0b 85 30 01 00 00       or     0x130(%ebp),%eax
Code;  Before first symbol
  2b:   88 08                   mov    %cl,(%eax)

The matching code in the kernel I found is:

0x55da <md_do_sync+1511>: mov    0x130(%ebp),%eax
0x55e0 <md_do_sync+1517>: test   $0x8,%al

Note that you have an 'or' where the kernel I found has a 'mov'. If we look at the actual bytes of code for those two instructions, the code that crashed shows the bytes above:

  0b 85 30 01 00 00 88 08

If I get the same bytes with gdb:

(gdb) x/8b 0x55da
0x55da <md_do_sync+1511>: 0x8b 0x85 0x30 0x01 0x00 0x00 0xa8 0x08
(gdb)

So what should be 8b has become 0b, and what should be a8 has become 08. If you look for the same data in your md-mod.ko, you might find slightly different details, but it is clear to me that the code in memory is bad.

Possibly you have bad memory, or a bad CPU, or you are overclocking the CPU, or it is getting hot, or something. But you clearly have a hardware error.
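The two corrupted bytes share a pattern worth checking. A quick sketch (bash arithmetic; the byte values are taken from Neil's analysis above) shows which bits were flipped in each byte:

```shell
# Compare the expected module bytes with what was found in memory.
# 8b -> 0b and a8 -> 08: XOR of good and bad reveals the flipped bits.
for pair in 8b:0b a8:08; do
  good=$((16#${pair%%:*}))
  bad=$((16#${pair##*:}))
  printf '%s: flipped bits = 0x%02x\n' "$pair" $((good ^ bad))
done
# -> 8b:0b: flipped bits = 0x80
# -> a8:08: flipped bits = 0xa0
```

Both bytes lost bit 7 (the second also lost bit 5) - the kind of repeated single-bit dropout one might expect from a marginal memory cell or data line, consistent with Neil's hardware diagnosis.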
NeilBrown
Re: mdadm 2.6.4 : How can I check the current status of reshaping?
On Monday February 4, [EMAIL PROTECTED] wrote:

> [EMAIL PROTECTED]:/# cat /proc/mdstat
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
> md1 : active raid5 sdc[0] sdb[5](S) sdf[3] sde[2] sdd[1]
>       1465159488 blocks super 0.91 level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
> unused devices: <none>
>
> ## But how can I see the status of the reshape? Has it really reshaped, or has it perhaps hung, or is mdadm doing nothing at all? How long should I wait for the reshape to finish? ##

The reshape hasn't restarted. Did you do that
  mdadm -w /dev/md1
like I suggested? If so, what happened?

Possibly you tried mounting the filesystem before trying the mdadm -w. There seems to be a bug such that doing this would cause the reshape not to restart, and mdadm -w would not help any more.

I suggest you:
  echo 0 > /sys/module/md_mod/parameters/start_ro
stop the array
  mdadm -S /dev/md1
(after unmounting if necessary).
Then assemble the array again. Then
  mdadm -w /dev/md1
just to be sure.

If this doesn't work, please report exactly what you did, exactly what message you got, and exactly where the message appeared in the kernel log.

NeilBrown
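Collected into one sequence, the recovery steps above look like this (a sketch only: the md device and member names are the ones from this thread, and the commands are disruptive, so adapt them to your system before running):

```shell
#!/bin/sh
# Sketch of the suggested restart sequence for a stalled reshape.
# Assumes the array is /dev/md1 built from /dev/sdb../dev/sdf, as in this thread.

echo 0 > /sys/module/md_mod/parameters/start_ro   # don't auto-start arrays read-only

umount /dev/md1 2>/dev/null    # unmount first if the filesystem is mounted
mdadm -S /dev/md1              # stop the array

mdadm -A /dev/md1 /dev/sd[bcdef]   # assemble it again from its member disks
mdadm -w /dev/md1                  # force read-write so the reshape can resume

cat /proc/mdstat               # the reshape line should now show progress
```

The start_ro step matters because an array started read-only will not resume its reshape until something switches it to read-write, which is exactly the state the reporter was stuck in.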
mdadm 2.6.4 : How can I check the current status of reshaping?
Hi linux-raid.

On DEBIAN:

[EMAIL PROTECTED]:/# mdadm -D /dev/md1
/dev/md1:
        Version : 00.91.03
  Creation Time : Tue Nov 13 18:42:36 2007
     Raid Level : raid5
     Array Size : 1465159488 (1397.29 GiB 1500.32 GB)
  Used Dev Size : 488386496 (465.76 GiB 500.11 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Mon Feb  4 06:51:47 2008
          State : clean, degraded
 Active Devices : 4
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

  Delta Devices : 1, (4->5)

           UUID : 4fbdc8df:07b952cf:7cc6faa0:04676ba5
         Events : 0.683598

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       48        1      active sync   /dev/sdd
       2       8       64        2      active sync   /dev/sde
       3       8       80        3      active sync   /dev/sdf
       4       0        0        4      removed
       5       8       16        -      spare   /dev/sdb

[EMAIL PROTECTED]:/# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid5 sdc[0] sdb[5](S) sdf[3] sde[2] sdd[1]
      1465159488 blocks super 0.91 level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
unused devices: <none>

## But how can I see the status of the reshape? Has it really reshaped, or has it perhaps hung, or is mdadm doing nothing at all? How long should I wait for the reshape to finish? ##

--
Best regards,
Andreas-Sokov
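The sizes in that -D output are consistent with RAID5 arithmetic: usable capacity is (number of devices - 1) x per-device size, and since the grow from 4 to 5 devices (Delta Devices : 1) has not completed, the Array Size still reflects the 4-device geometry. A quick check:

```shell
# RAID5 usable size = (devices - 1) * used-dev-size (one device's worth is parity).
# Per-device size from the -D output: 488386496 KB.
echo "old (4 disks): $(( (4 - 1) * 488386496 )) KB"   # -> 1465159488, matching Array Size
echo "new (5 disks): $(( (5 - 1) * 488386496 )) KB"   # -> 1953545984, the size after the grow
```

So the array itself confirms the reshape has not finished: once it completes, Array Size should jump from 1465159488 to 1953545984 KB.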