Re: PROBLEM: RAID5 reshape data corruption
- Message from [EMAIL PROTECTED] -
Date: Sun, 06 Jan 2008 00:31:46 +0100
From: Nagilum [EMAIL PROTECTED]

At the moment I'm thinking about writing a small perl program that will
generate a shell script or makefile containing dd commands to copy the
chunks from the drive to /dev/md0. I don't care if that is dog slow as long
as I get most of my data back. (I'd probably go forward instead of backward
to take advantage of the readahead, after I've determined the exact start
chunk.)

For that I need to know one more thing. Used Dev Size is 488308672k for md0
as well as for the disk, with a 16k chunk size. 488308672/16 = 30519292.00,
so the first dd would look like:

  dd if=/dev/sdg of=/dev/md0 bs=16k count=1 skip=30519291 seek=X

The big question now is how to calculate X. Since I have a working testcase
I can do a lot of testing before touching the real thing. The formula for X
will probably contain a 5 for the 5(+1) devices the raid spans now, a 4 for
the 4(+1) devices the raid spanned before the reshape, a 3 for the device
number of the disk that failed, and of course the skip/current chunk number.
Can you help me come up with it? Thanks again for looking into the whole
issue.
- End message from [EMAIL PROTECTED] -

Ok, the spare time over the weekend allowed me to make some headway. I'm not
sure if the attachment will make it through to the ML, so I uploaded the
perl script to: http://www.nagilum.de/md/rdrep.pl

First tests already show promising results, although I seem to miss the
start of the corrupted area. Anyway, unlike with the testcase, on the real
array I have to start after the area that is unreadable; I already
determined that last Friday.

I would appreciate it if someone could have a look over the script. I'll
probably change it a little and make every other dd run via exec instead of
system to get some parallelism. (I guess the overhead of running dd takes
about as much time as the transfer itself.)

Thanks again,
Alex
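P.S.: To make the "exec instead of system" idea concrete for whoever reviews
the script: this is roughly the shape I have in mind. It is just a sketch of
the fork/exec approach, not the uploaded rdrep.pl, and the skip/seek numbers
below are placeholders:

  #!/usr/bin/perl
  # Sketch: overlap pairs of dd runs by fork/exec'ing them instead of
  # running each one synchronously via system().
  use strict;
  use warnings;

  sub spawn_dd {
      my ($skip, $seek) = @_;
      my $pid = fork();
      die "fork failed: $!" unless defined $pid;
      if ($pid == 0) {
          # child: becomes the dd process; the parent does not wait here
          exec 'dd', 'if=/dev/sdg', 'of=/dev/md0', 'bs=16k', 'count=1',
               "skip=$skip", "seek=$seek"
              or die "exec dd failed: $!";
      }
      return $pid;
  }

  # [skip, seek] pairs as produced by the layout mapping; placeholder values
  my @chunk_pairs = ([30519290, 122077163], [30519289, 122077158]);

  my @pending;
  for my $pair (@chunk_pairs) {
      push @pending, spawn_dd(@$pair);
      if (@pending >= 2) {                 # two dd's in flight, reap them
          waitpid($_, 0) for splice @pending;
      }
  }
  waitpid($_, 0) for @pending;             # reap any leftover dd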
Re: PROBLEM: RAID5 reshape data corruption
- Message from [EMAIL PROTECTED] -
Date: Fri, 4 Jan 2008 09:37:24 +1100
From: Neil Brown [EMAIL PROTECTED]
Reply-To: Neil Brown [EMAIL PROTECTED]
Subject: Re: PROBLEM: RAID5 reshape data corruption
To: Nagilum [EMAIL PROTECTED]
Cc: linux-raid@vger.kernel.org, Dan Williams [EMAIL PROTECTED], H. Peter Anvin [EMAIL PROTECTED]

> I'm not just interested in a simple behaviour fix; I'm also interested in
> what actually happens and, if possible, a repair program for that kind of
> data corruption.

What happens is that when reshape happens while a device is missing, the
data on that device should be computed from the other data devices and
parity. However, because of the above bug, the data is copied into the new
layout before the compute is complete. This means that the data that was on
that device is really lost beyond recovery. I'm really sorry about that, but
there is nothing that can be done to recover the lost data.
- End message from [EMAIL PROTECTED] -

Thanks a lot Neil! I can confirm your findings; the data in the chunks is
the data from the broken device.

Now to my particular case: I still have the old disk and I haven't touched
the array since. I just ran dd_rescue -r (reverse) on the old disk and, as I
expected, most of it (99%) is still readable. So what I want to do is read
the chunks from that disk - starting at the end and going down to the 4%
point where the reshape was interrupted by the disk read error - and replace
the chunks on md0. That should restore most of the data.

Now, in order to do so, I need to know how to calculate the different
positions of the chunks. So for the old disk I have:

nas:~# mdadm -E /dev/sdg
/dev/sdg:
          Magic : a92b4efc
        Version : 00.91.00
           UUID : 25da80a6:d56eb9d6:0d7656f3:2f233380
  Creation Time : Sat Sep 15 21:11:41 2007
     Raid Level : raid5
  Used Dev Size : 488308672 (465.69 GiB 500.03 GB)
     Array Size : 2441543360 (2328.44 GiB 2500.14 GB)
   Raid Devices : 6
  Total Devices : 7
Preferred Minor : 0

  Reshape pos'n : 118360960 (112.88 GiB 121.20 GB)
  Delta Devices : 1 (5->6)

    Update Time : Fri Nov 23 20:05:50 2007
          State : active
 Active Devices : 6
Working Devices : 7
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 9a8358c4 - correct
         Events : 0.677965

         Layout : left-symmetric
     Chunk Size : 16K

      Number   Major   Minor   RaidDevice State
this     3       8       96        3      active sync   /dev/sdg

   0     0       8        0        0      active sync   /dev/sda
   1     1       8       16        1      active sync   /dev/sdb
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       96        3      active sync   /dev/sdg
   4     4       8       64        4      active sync   /dev/sde
   5     5       8       80        5      active sync   /dev/sdf
   6     6       8       48        6      spare   /dev/sdd

The current array is:

nas:~# mdadm -Q --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Sat Sep 15 21:11:41 2007
     Raid Level : raid5
     Array Size : 2441543360 (2328.44 GiB 2500.14 GB)
  Used Dev Size : 488308672 (465.69 GiB 500.03 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sat Jan  5 17:53:54 2008
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 16K

           UUID : 25da80a6:d56eb9d6:0d7656f3:2f233380
         Events : 0.986918

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       16        1      active sync   /dev/sdb
       2       8       32        2      active sync   /dev/sdc
       3       8       48        3      active sync   /dev/sdd
       4       8       64        4      active sync   /dev/sde
       5       8       80        5      active sync   /dev/sdf

At the moment I'm thinking about writing a small perl program that will
generate a shell script or makefile containing dd commands to copy the
chunks from the drive to /dev/md0. I don't care if that is dog slow as long
as I get most of my data back.
(I'd probably go forward instead of backward to take advantage of the
readahead, after I've determined the exact start chunk.)

For that I need to know one more thing. Used Dev Size is 488308672k for md0
as well as for the disk, with a 16k chunk size. 488308672/16 = 30519292.00,
so the first dd would look like:

  dd if=/dev/sdg of=/dev/md0 bs=16k count=1 skip=30519291 seek=X

The big question now is how to calculate X. Since I have a working testcase
I can do a lot of testing before touching the real thing. The formula for X
will probably contain a 5 for the 5(+1) devices the raid spans now, a 4 for
the 4(+1) devices the raid spanned before the reshape, a 3 for the device
number of the disk that failed, and of course the skip/current chunk number.
Can you help me come up with it?
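To make the question more concrete, here is my current guess, assuming the
old 4(+1) geometry is the plain left-symmetric layout as I understand it
from raid5.c (parity disk = data_disks - stripe % raid_disks, data slot =
(disk - parity - 1) mod raid_disks) and that writing through /dev/md0 lets
md take care of the new 5(+1) layout. I still need to verify this against
the testcase before touching the real array:

  #!/usr/bin/perl
  # Guess at the skip -> seek mapping for chunks of the failed disk
  # (RaidDevice 3, /dev/sdg) in the old 5-device left-symmetric layout.
  use strict;
  use warnings;

  my $ndev  = 5;          # total devices in the old layout (4 data + parity)
  my $ndata = $ndev - 1;  # data chunks per stripe
  my $d     = 3;          # RaidDevice number of the failed disk (/dev/sdg)

  # $skip = chunk offset on the old disk; returns X (the dd seek value, in
  # chunks, on /dev/md0) or undef if that chunk holds parity, not data.
  sub seek_for_skip {
      my ($skip) = @_;
      my $pd = ($ndev - 1) - ($skip % $ndev);   # parity disk of this stripe
      return undef if $d == $pd;                # parity chunk on sdg - skip
      my $i = ($d - $pd - 1) % $ndev;           # data slot within the stripe
      return $skip * $ndata + $i;
  }

  # smoke test: print dd commands for the last few chunks of the old disk
  for my $skip (reverse 30519287 .. 30519291) {
      my $x = seek_for_skip($skip);
      next unless defined $x;
      print "dd if=/dev/sdg of=/dev/md0 bs=16k count=1 skip=$skip seek=$x\n";
  }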
Re: PROBLEM: RAID5 reshape data corruption
On Monday December 31, [EMAIL PROTECTED] wrote:
> Ok, since my previous thread didn't seem to attract much attention, let me
> try again.

Thank you for your report and your patience.

> An interrupted RAID5 reshape will cause the md device in question to
> contain one corrupt chunk per stripe if resumed in the wrong manner. A
> testcase can be found at http://www.nagilum.de/md/ . The first testcase
> can be initialized with start.sh; the real test can then be run with
> test.sh. The first testcase also uses dm-crypt and xfs to show the
> corruption.

It looks like this can be fixed with the patch:

Signed-off-by: Neil Brown [EMAIL PROTECTED]

### Diffstat output
 ./drivers/md/raid5.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2008-01-04 09:20:54.000000000 +1100
+++ ./drivers/md/raid5.c	2008-01-04 09:21:05.000000000 +1100
@@ -2865,7 +2865,7 @@ static void handle_stripe5(struct stripe
 			md_done_sync(conf->mddev, STRIPE_SECTORS, 1);
 	}
 
-	if (s.expanding && s.locked == 0)
+	if (s.expanding && s.locked == 0 && s.req_compute == 0)
 		handle_stripe_expansion(conf, sh, NULL);
 
 	if (sh->ops.count)

With this patch in place, the v2 test only reports errors after the end of
the original array, as you would expect (the new space is initialised to 0).

> I'm not just interested in a simple behaviour fix; I'm also interested in
> what actually happens and, if possible, a repair program for that kind of
> data corruption.

What happens is that when reshape happens while a device is missing, the
data on that device should be computed from the other data devices and
parity. However, because of the above bug, the data is copied into the new
layout before the compute is complete. This means that the data that was on
that device is really lost beyond recovery. I'm really sorry about that, but
there is nothing that can be done to recover the lost data.

NeilBrown
PROBLEM: RAID5 reshape data corruption
Ok, since my previous thread didn't seem to attract much attention, let me
try again.

An interrupted RAID5 reshape will cause the md device in question to contain
one corrupt chunk per stripe if resumed in the wrong manner. A testcase can
be found at http://www.nagilum.de/md/ . The first testcase can be
initialized with start.sh; the real test can then be run with test.sh. The
first testcase also uses dm-crypt and xfs to show the corruption. The second
testcase uses nothing but mdadm and testpat - a small program to write and
verify a simple test pattern designed to find block data corruption. Use
v2_start.sh and v2_test.sh to run it; at the end it will point out all the
wrong bytes on the md device.

I'm not just interested in a simple behaviour fix; I'm also interested in
what actually happens and, if possible, a repair program for that kind of
data corruption.

The bug is architecture agnostic. I first came across it using 2.6.23.8 on
amd64, but I verified it on 2.6.23.[8-12] and 2.6.24-rc[5,6] on ppc, always
using mdadm 2.6.4.

The situation in which the bug first showed up was as follows:
1. A RAID5 reshape from 5 to 6 devices was started.
2. After about 4% one disk failed; the machine appeared unresponsive and was
   rebooted.
3. A spare disk was added to the array.
4. The bad drive was re-added to the array in a different bay and the
   reshape resumed.
5. The drive failed again but the reshape continued.
6. The reshape finished and after that the resync.

The data after about the 4% point on the md device is broken as described
above.

Kind regards,
Alex.

#  http://www.nagilum.org/  |  icq://69646724  |  cakebox.homeunix.net - all the machine one needs..
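P.S.: For anyone who doesn't want to download testpat: the idea is simply to
stamp every block with its own position and later check the stamps. A
minimal sketch of that write-and-verify approach (not the actual testpat)
would be:

  #!/usr/bin/perl
  # Minimal sketch of the write/verify test-pattern idea: every 16k block is
  # filled with its own block number, so any block that later reads back
  # with a different number is corrupt.
  use strict;
  use warnings;

  my ($dev, $mode) = @ARGV;              # e.g.  ./pattern.pl /dev/md1 write
  die "usage: $0 <device> write|verify\n" unless $dev && $mode;

  my $bs = 16 * 1024;                    # matches the 16k chunk size

  if ($mode eq 'write') {
      open my $fh, '>', $dev or die "open $dev: $!";
      my $blk = 0;
      while (1) {
          my $buf = pack('Q', $blk) x ($bs / 8);   # block number, repeated
          my $n = syswrite($fh, $buf);
          last unless defined $n && $n == $bs;     # end of device (or error)
          $blk++;
      }
      close $fh;
  } else {
      open my $fh, '<', $dev or die "open $dev: $!";
      my $blk = 0;
      while (1) {
          my $n = sysread($fh, my $buf, $bs);
          last unless $n && $n == $bs;             # end of device
          my $want = pack('Q', $blk) x ($bs / 8);
          print "corrupt block $blk (byte offset ", $blk * $bs, ")\n"
              if $buf ne $want;
          $blk++;
      }
      close $fh;
  }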