Re: raid5 failure
On Fri, 21 Jul 2000, Seth Vidal wrote:

Hi, we've been using the sw raid 5 support in linux for about 2-3 months now. We've had good luck with it, until this week. In this one week we've lost two drives on a 3-drive array, completely eliminating the array. We have good backups, made every night, so the data is safe. The problem is this: what could have caused these dual drive failures? One went out on Saturday, the next on the following Friday. Complete death. One drive won't detect anywhere anymore and it's been RMA'd; the other detects, and I'm currently running mke2fs -c on it.

Hey Seth,

Sorry to hear about your drive failures. To me, this is something that most people ignore about RAID5: lose more than one drive and everything is toast. Good reason to have a drive set up as a hot spare, not to mention an extra drive lying on the shelf. And hold your breath while the array is rebuilding.

Bill Carlson
Systems Programmer   [EMAIL PROTECTED]   | Opinions are mine,
Virtual Hospital     http://www.vh.org/  | not my employer's.
University of Iowa Hospitals and Clinics |
Re: raid5 failure
Hey Seth, sorry to hear about your drive failures. To me, this is something that most people ignore about RAID5: lose more than one drive and everything is toast. Good reason to have a drive set up as a hot spare, not to mention an extra drive lying on the shelf. And hold your breath while the array is rebuilding.

It actually will probably be OK in the long run. We had GOOD backups. It took us less than 6 hours to bring the whole thing back up (including rebuilding the machine, restoring from tape and eating dinner), so I feel good about our ability to recover from a disaster, and I'm not afraid of the WORST anymore with raid5. The logs screamed holy hell, so I knew what was wrong RIGHT away. So all in all I'm glad we're through it, though a hot spare is in the plans for the next iteration of this array :)

-sv
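A minimal raidtab sketch of the hot-spare setup discussed above, assuming raidtools 0.90 and hypothetical device names (with /dev/sdd1 as the spare); the spare-disk entry is what lets the md driver start rebuilding onto the fourth drive on its own as soon as one of the three active disks fails:

raiddev /dev/md0
    raid-level              5
    nr-raid-disks           3
    nr-spare-disks          1
    persistent-superblock   1
    chunk-size              32
    parity-algorithm        left-symmetric
    device                  /dev/sda1
    raid-disk               0
    device                  /dev/sdb1
    raid-disk               1
    device                  /dev/sdc1
    raid-disk               2
    device                  /dev/sdd1
    spare-disk              0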
Re: raid5 failure
Could this be a power supply failure? For example, I've seen 144 V on the motherboard. None of the drives survived, as you can expect. It was after a storm with lightning :-)

Szilva
--
http://www.wbic.cam.ac.uk/~sj233
AW: raid5 troubles
Hi Danilo,

[root@mrqserv2 linux]# mkraid /dev/md0
handling MD device /dev/md0
analyzing super-block
disk 0: /dev/sdb1, 4233096kB, raid superblock at 4233024kB
disk 1: /dev/sdc1, 4233096kB, raid superblock at 4233024kB
disk 2: /dev/sda6, failed
mkraid: aborted, see the syslog and /proc/mdstat for potential clues.
[root@mrqserv2 linux]#

what is wrong here ?

Most probably your version of raidtools-0.90 doesn't recognize the failed-disk directive. I use the version from Ingo's page (marked dangerous) http://people.redhat.com/mingo/raid-patches/... and it works fine.

Nope, "disk 2: /dev/sda6, failed" shows that mkraid definitely did recognize the failed-disk directive; it's been in the official tools for quite some time, and the 19990824 tools definitely support it. No need to go for the "dangerous" tools. Btw, has anyone checked what's different in that tools package compared to the 19990824 release?

Bye, Martin

PS: the problem was a missing kernel patch.

"you have moved your mouse, please reboot to make this change take effect"
--
Martin Bene              vox: +43-316-813824
simon media              fax: +43-316-813824-6
Andreas-Hofer-Platz 9    e-mail: [EMAIL PROTECTED]
8010 Graz, Austria
-- finger [EMAIL PROTECTED] for PGP public key
Re: raid5 troubles
On Fri, Jul 21, 2000 at 11:17:18AM +0200, Martin Bene wrote: "dangerous" tools. Btw, has anyone checked what's different in that tools package compared to the 19990824 release?

Yes: it raises the max number of devices per superblock!!!

--
Luca Berra -- [EMAIL PROTECTED]
Communication Media Services S.r.l.
raid5 failure
Hi, we've been using the sw raid 5 support in linux for about 2-3 months now. We've had good luck with it, until this week. In this one week we've lost two drives on a 3-drive array, completely eliminating the array. We have good backups, made every night, so the data is safe.

The problem is this: what could have caused these dual drive failures? One went out on Saturday, the next on the following Friday. Complete death. One drive won't detect anywhere anymore and it's been RMA'd; the other detects, and I'm currently running mke2fs -c on it. Could this be a power supply failure? What is it that would cause this sort of fault?

Additionally, it appears that the IDE drive that is the system's OS disk is also failing. It gets LBA and seek failures repeatedly. I'm 99.999% certain this has NOTHING to do with software, but I'd love to know at what to point the finger.

-sv
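Besides the mke2fs -c run mentioned above, a non-destructive surface scan of the suspect drives is one way to separate hardware trouble from everything else; this is only a sketch, the device names are hypothetical, and badblocks defaults to a read-only test so it does not touch the data:

badblocks -sv /dev/hda     # read-only scan of the suspect IDE system disk, with progress output
badblocks -sv /dev/sdb     # same for the surviving SCSI array member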
slink (old) raid5 recovery
I have a quite large (~490G) raid5 array for slink (originally 2.2.13) and managed to shut it down incorrectly. There was no hardware failure, but ckraid did not fix the array. It seemed to get stuck at about 10-20% completion (I've tried to run it about 5 times; the completion percentage was different each time). To revive it I've upgraded the kernel (2.2.16-AC0 mingo patch + AC pre17p12 patches) and re-compiled raidtools-0.90. mkraid --upgrade seems to have fixed everything in one second, although mount complains that "e2fsck recommended". OK, so I start e2fsck -v /dev/md1 (before mounting), and it passes the first two steps. At step 3 (checking directory connectivity, if I am right) it stops. According to strace the actual steps are two mmap() calls where it is waiting, eating 99.9% of CPU (4-CPU PII Dell PowerEdge 6300 with 2G RAM); kswapd does nothing at all. Should I upgrade to the new-style e2fs, or was it a silly idea to make a huge raid like this with ext2fs?

Szilva
--
http://www.wbic.cam.ac.uk/~sj233
raid5 troubles
I want to use my 3 4GB UW hard disks in a raid5 combination. I did the steps described in the howto at http://www.linuxdoc.org/HOWTO/Boot+Root+Raid+LILO-4.html

my /etc/raidtab:

raiddev /dev/md0
    raid-level              5
    nr-raid-disks           3
    chunk-size              32
    # Spare disks for hot reconstruction
    nr-spare-disks          0
    parity-algorithm        left-symmetric
    persistent-superblock   1
    device                  /dev/sdb1
    raid-disk               0
    device                  /dev/sdc1
    raid-disk               1
    device                  /dev/sda6
    failed-disk             2

The /dev/sda6 is my root filesystem, set to failed like in the howto.

[root@mrqserv2 linux]# fdisk -ul /dev/sdb
Disk /dev/sdb: 255 heads, 63 sectors, 527 cylinders
Units = sectors of 1 * 512 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/sdb1            63   8466254   4233096   fd  Linux raid autodetect

[root@mrqserv2 linux]# fdisk -ul /dev/sdc
Disk /dev/sdc: 255 heads, 63 sectors, 555 cylinders
Units = sectors of 1 * 512 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/sdc1            63   8466254   4233096   fd  Linux raid autodetect
/dev/sdc2       8466255   8916074    224910   83  Linux

[root@mrqserv2 linux]# fdisk -ul /dev/sda
Disk /dev/sda: 255 heads, 63 sectors, 553 cylinders
Units = sectors of 1 * 512 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/sda1   *        63     48194     24066   82  Linux swap
/dev/sda2         48195   8883944   4417875    5  Extended
/dev/sda5         48258    321299    136521   82  Linux swap
/dev/sda6        321363   8883944   4281291   83  Linux

But when I do mkraid, I get an error :-(((

[root@mrqserv2 linux]# mkraid /dev/md0
handling MD device /dev/md0
analyzing super-block
disk 0: /dev/sdb1, 4233096kB, raid superblock at 4233024kB
disk 1: /dev/sdc1, 4233096kB, raid superblock at 4233024kB
disk 2: /dev/sda6, failed
mkraid: aborted, see the syslog and /proc/mdstat for potential clues.
[root@mrqserv2 linux]#

What is wrong here?

mfg mrq1
--
linux 2.2.16 on a dual pentium iii 500 512mb
Total of 2 processors activated (1992.29 BogoMIPS).
8:20pm up 2:07, 1 user, load average: 2.28, 2.19, 2.11
Re: raid5 troubles
On Thu, 20 Jul 2000, Hermann 'mrq1' Gausterer wrote:

but when i do mkraid, i get an error :-(((

[root@mrqserv2 linux]# mkraid /dev/md0
handling MD device /dev/md0
analyzing super-block
disk 0: /dev/sdb1, 4233096kB, raid superblock at 4233024kB
disk 1: /dev/sdc1, 4233096kB, raid superblock at 4233024kB
disk 2: /dev/sda6, failed
mkraid: aborted, see the syslog and /proc/mdstat for potential clues.
[root@mrqserv2 linux]#

what is wrong here ?

Most probably your version of raidtools-0.90 doesn't recognize the failed-disk directive. I use the version from Ingo's page (marked dangerous) http://people.redhat.com/mingo/raid-patches/... and it works fine.

D.
Re: raid5 troubles
hi, to everybody on the list, thank you again for your help, it works ! :-))

[mrq1@mrqserv2 mrq1]$ cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5]
read_ahead 1024 sectors
md0 : active raid5 sda1[3] sdc1[1] sdb1[0] 8466048 blocks level 5, 32k chunk, algorithm 2 [3/2] [UU_] recovery=51% finish=16.6min
unused devices: <none>
[mrq1@mrqserv2 mrq1]$

mfg mrq1
--
linux 2.2.16 on a dual pentium iii 500 512mb
Total of 2 processors activated (1992.29 BogoMIPS).
3:10am up 16 min, 1 user, load average: 2.52, 2.28, 1.35
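To follow a rebuild like the one above (the recovery=51% finish=16.6min line) until it completes, it is enough to re-read /proc/mdstat periodically; a minimal shell loop, with the 60-second interval being an arbitrary choice:

while true; do
    cat /proc/mdstat     # the recovery percentage and ETA update as the resync proceeds
    sleep 60
done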
Re: Trouble in RAID5 - other stuff
Dear Alvin,

On Sun, 16 Jul 2000 22:44:54 -0700 (PDT), [EMAIL PROTECTED] (Alvin Oga) said:

hi "raiders"... i recently changed my raid5 box that was running on debian-2.2 into a new atx case, new linux-2.2.16... etc. etc... - its in a 1U raid5 box... world's first?? - seems like mkraid does various different things??? some mkraid versions work and other versions don't... takamura-san, try a different mkraid...?? the mkraid that came with your linux distro - mkraid from raidtools-0.90 - mkraid from raidtools-0.41..

Thank you for your suggestion, but where can I find 0.41? ftp://ftp.fi.kernel.org/pub/linux/daemons/raid/alpha/ has only 0.90.

my raid5 problem could be a flaky sdc drive since it keeps disappearing randomly and forces a degraded mode... (have no spare drives either) am doing tests for...
dd if=/Raid5/1_GB_file.txt of=/Raid5/test/1.x
dd if=/Raid5/1_GB_file.txt of=/Raid5/test/2.x
diff /Raid5/test/1.x /Raid5/test/2.x
- if it crashes... at least i can just rebuild the 1U raid5 box

Well, when the raid gets to work I'll try this.

Takamura
Seishi Takamura, Dr.Eng.
NTT Cyber Space Laboratories
Y517A 1-1 Hikarino-Oka, Yokosuka, Kanagawa, 239-0847 Japan
Tel: +81 468 59 2371, Fax: +81 468 59 2829
E-mail: [EMAIL PROTECTED]
Trouble in RAID5
Dear Raid users,

I've been using a RAID5 system for nearly six months without problems, but recently the machine halted during the reboot process (displayed message attached below). I tried old valid kernels and some succeeded in booting, but the md device (/dev/md0) was still invisible. According to the dmesg (attached at the end of this mail), the kernel appears to be trying to recognize the md device. I tried several old kernels which had all worked fine, but the situation didn't change. There's no /proc/mdstat, and "mkraid --upgrade /dev/md0" failed saying

handling MD device /dev/md0
analyzing super-block
disk 0: /dev/sda1, 48846847kB, raid superblock at 48846720kB
array needs no upgrade
mkraid: aborted, see the syslog and /proc/mdstat for potential clues.

Is it possible to recover the content of the RAID? (there were about 200G of data..) Or do I have to "really force" mkraid? Any suggestion or information is welcome. Other information (System, raidtab, etc) is also attached below.

Best regards,
Takamura
Seishi Takamura, Dr.Eng.
NTT Cyber Space Laboratories
Y517A 1-1 Hikarino-Oka, Yokosuka, Kanagawa, 239-0847 Japan
Tel: +81 468 59 2371, Fax: +81 468 59 2829
E-mail: [EMAIL PROTECTED]

Message (from the display, possibly incorrectly typed):

attempt to access beyond end of device
03:07: rw=0, want=2, limit=0
dev 03:07 blksize=1024 blocknr=1 sector=2 size=1024 count=1
EXT2-fs: unable to read superblock
attempt to access beyond end of device
03:07: rw=0, want=1, limit=0
dev 03:07 blksize=1024 blocknr=0 sector=0 size=1024 count=1
FAT bread failed
attempt to access beyond end of device
03:07: rw=0, want=33, limit=0
dev 03:07 blksize=1024 blocknr=32 sector=64 size=1024 count=1
isofs_read_super: bread failed, dev=03:07, iso_blknum=16, block=32
Kernel panic: VFS: unable to mount root fs on 03:07

System:
RedHat 6.1, kernel-2.2.14 + raid-2.2.14-B1 + mypatch1 (attached below)
raidtools 19990824-0.90 + mypatch2 (attached below)
CPU Pentium III 600MHz (single)
3 SCSI cards (Adaptec AHA2940U2W)
24 SCSI HDD drives (Seagate ST150176LW Barracuda 50.1GB)
Each SCSI card has eight HDDs connected
RAID is mounted on /raid (not /)

/etc/raidtab:

raiddev /dev/md0
    raid-level              5
    nr-raid-disks           24
    nr-spare-disks          0
    chunk-size              32
    persistent-superblock   1
    parity-algorithm        left-symmetric
    device                  /dev/sda1
    raid-disk               0
    ...
    device                  /dev/sdx1
    raid-disk               23

mypatch1 (increment disk# limit from 12 to 24, fix integer overflow):

--- linux/include/linux/raid/md_p.h~    Thu Mar 23 11:23:03 2000
+++ linux/include/linux/raid/md_p.h     Thu Mar 23 13:47:20 2000
@@ -65,7 +65,7 @@
 #define MD_SB_GENERIC_STATE_WORDS      32
 #define MD_SB_GENERIC_WORDS            (MD_SB_GENERIC_CONSTANT_WORDS + MD_SB_GENERIC_STATE_WORDS)
 #define MD_SB_PERSONALITY_WORDS        64
-#define MD_SB_DISKS_WORDS              384
+#define MD_SB_DISKS_WORDS              800
 #define MD_SB_DESCRIPTOR_WORDS         32
 #define MD_SB_RESERVED_WORDS           (1024 - MD_SB_GENERIC_WORDS - MD_SB_PERSONALITY_WORDS - MD_SB_DISKS_WORDS - MD_SB_DESCRIPTOR_WORDS)
 #define MD_SB_EQUAL_WORDS              (MD_SB_GENERIC_WORDS + MD_SB_PERSONALITY_WORDS + MD_SB_DISKS_WORDS)

--- linux/drivers/block/raid5.c~        Thu Mar 23 11:23:03 2000
+++ linux/drivers/block/raid5.c         Thu Mar 23 13:43:54 2000
@@ -665,7 +665,7 @@
  * Output: index of the data and parity disk, and the sector # in them.
  */
 static inline unsigned long
-raid5_compute_sector (int r_sector, unsigned int raid_disks, unsigned int data_disks,
+raid5_compute_sector (unsigned long r_sector, unsigned int raid_disks, unsigned int data_disks,
                        unsigned int * dd_idx, unsigned int * pd_idx, raid5_conf_t *conf)
 {

mypatch2 (increment disk# limit from 12 to 24):

--- md-int.h~   Fri Jan 14 15:19:22 2000
+++ md-int.h    Mon Jan 17 12:29:34 2000
@@ -137,7 +137,7 @@
 #define MD_SB_GENERIC_STATE_WORDS      32
 #define MD_SB_GENERIC_WORDS            (MD_SB_GENERIC_CONSTANT_WORDS + MD_SB_GENERIC_STATE_WORDS)
 #define MD_SB_PERSONALITY_WORDS        64
-#define MD_SB_DISKS_WORDS              800 /* taka, was 384*/
+#define MD_SB_DISKS_WORDS              800 /* taka, was 384 (see /usr/src/linux/include/linux/raid/md_p.h) */
 #define MD_SB_DESCRIPTOR_WORDS         32
 #define MD_SB_RESERVED_WORDS           (1024 - MD_SB_GENERIC_WORDS - MD_SB_PERSONALITY_WORDS - MD_SB_DISKS_WORDS - MD_SB_DESCRIPTOR_WORDS)
 #define MD_SB_EQUAL_WORDS              (MD_SB_GENERIC_WORDS + MD_SB_PERSONALITY_WORDS + MD_SB_DISKS_WORDS)

part of dmesg output:

Partition check:
 sda: sda1
 sdb: sdb1
 sdc: sdc1
 sdd: sdd1
 sde: sde1
 sdf: sdf1
 sdg: sdg1
 sdh: sdh1
 sdi: sdi1
 sdj: sdj1
 sdk: sdk1
 sdl: sdl1
 sdm: sdm1
 sdn: sdn1
 sdo: sdo1
 sdp: sdp1
 sdq: sdq1
 sdr: sdr1
 sds: sds1
 sdt: sdt1
 sdu: sdu1
 sdv: sdv1
 sdw: sdw1
 sd
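The value 800 chosen in mypatch1 can be sanity-checked with quick arithmetic, assuming the number of disk slots in the superblock is MD_SB_DISKS_WORDS divided by MD_SB_DESCRIPTOR_WORDS (32), which appears to be how md_p.h derives MD_SB_DISKS:

echo $(( 384 / 32 ))   # 12 disk slots with the stock value
echo $(( 800 / 32 ))   # 25 disk slots after the patch, enough room for the 24-drive array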
Re: Trouble in RAID5 - other stuff
hi "raiders"... i recently changed my raid5 box that was running on debian-2.2 into a new atx case new linux-2.2.16...etc.e.tc... - - its in a 1U raid5 box... worlds first ?? - seems like mkraid does various different things ??? some mkraid works and other versions dont... takamura-san, try a different mkraid... ?? mkraid that came with your linux distro - mkraid from raidtools-0.90 - mkraid from raidtools-0.41.. --- my raid5 problem could be a flaky sdc drive since it keeps disappearing randomly and forces a degraded mode... ( have no spare drives either ) am doing a tests for... dd if=/Raid5/1_GB_file.txt of=/Raid5/test/1.x dd if=/Raid5/1_GB_file.txt of=/Raid5/test/2.x diff /Raid5/test/1.x /Raid5/test/2.x - if it crashes...at least i can just rebuild the 1U raid5 box have fun linux'ing and raiding alvin I've been using RAID5 system for nearly six months without problem, but recently the machine halted while the rebooting process (displayed message attached below). I tried old valid kernels and some succeeded to boot, but the md device(/dev/md0) was still invisible. According to the dmesg (attached the last of the mail), the kernel looks trying to recognize md device. I tried several old kernels which all had worked fine, but the situation didn't change. There's no /proc/mdstat, and "mkraid --upgrade /dev/md0" failed saying handling MD device /dev/md0 analyzing super-block disk 0: /dev/sda1, 48846847kB, raid superblock at 48846720kB array needs no upgrade mkraid: aborted, see the syslog and /proc/mdstat for potential clues. Is it possible to recover the content of the RAID? (there were about 200G of data..) Or do I have to "really force" mkraid? Any suggestion, information is welcome. Other information (System, raidtab, etc) is also attached below. Best regards, Takamura Seishi Takamura, Dr.Eng. NTT Cyber Space Laboratories Y517A 1-1 Hikarino-Oka, Yokosuka, Kanagawa, 239-0847 Japan Tel: +81 468 59 2371, Fax: +81 468 59 2829 E-mail: [EMAIL PROTECTED]
Problems on reintegrating one disk into raid5-array
Hi..

Due to a system crash, one partition of the raid array has an invalid event counter... so my array runs in degraded mode... but how can I integrate it back into the array??? Where does raid store the superblock info?? I tried to remove the partition with fdisk, recreated it, formatted it, dd'ed the first 5% with zeros.. but raid still tells me that the event counters mismatch. What can I do now???

thanks...
c.u.
..patrick
Re: Problems on reintegrating one disk into raid5-array
Hi,

As far as I know, "raidhotadd" is what you need.

Tamas.

On Wed, 12 Jul 2000, Patrick Scharrenberg wrote: Hi.. Due to a system crash, one partition of the raid array has an invalid event counter... so my array runs in degraded mode... but how can I integrate it back into the array??? Where does raid store the superblock info?? I tried to remove the partition with fdisk, recreated it, formatted it, dd'ed the first 5% with zeros.. but raid still tells me that the event counters mismatch. What can I do now??? thanks... c.u. ..patrick
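A minimal sketch of the raidhotadd step Tamas is referring to, assuming the array is /dev/md0 and the re-created partition is /dev/hdc1 (both names are just examples); the md driver writes a fresh superblock onto the added partition and starts reconstruction, which can be watched in /proc/mdstat:

raidhotadd /dev/md0 /dev/hdc1    # add the partition back into the running, degraded array
cat /proc/mdstat                 # should now show a resync in progress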
AW: RAID5
Hi Peter,

I write in English because most of the members of this mailing list don't understand German, I think. I'm also a beginner with Linux and am using SuSE 6.4 with no new kernel. Like you, I have a lot of trouble with the configuration. Maybe you read my messages which said that "mkraid" doesn't work. It says that the device I want to mirror is mounted, but that's OK, it's the "/" partition and has to be mounted. Do you get an error message from mkraid? Maybe there is a hint in the rdstat(?) file, which should contain any error messages when the RAID configuration doesn't work. What do you mean when you write "nothing works"?

Horst Zymelka

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On behalf of [EMAIL PROTECTED]
Sent: Wednesday, 12 July 2000 17:53
To: [EMAIL PROTECTED]
Subject: RAID5

Hello dear Linux and RAID friends, after reading the man pages several times and a few futile attempts to build a RAID5 on the basis of three hard disks, your note encouraged me to approach you after all.

A bit about myself: I have been working with IT for several years. First it was CAD, then networking, and finally the network exams and the MCSE. Since I personally place great value on a heterogeneous network environment, the idea of exploring Linux as a mature operating system was an obvious one. I took my first steps with a test version of the SuSE distribution of Linux 6.0; Linux immediately won my sympathy. Now I have a complete SuSE distribution of Linux 6.4 and am trying to educate myself further and expand my knowledge of Linux.

For use in a network, RAID is indispensable, so nothing for it but to build RAID with Linux. My configuration is the following: 3 SCSI hard disks:
- 1st disk: 3 primary partitions, 1 extended partition, where the last logical partition (sda7), 485 MB in size, is to be the first part of the RAID set (the mount point / is on sda3)
- 2nd disk: 3 primary partitions, where the last partition (sdb3), 485 MB in size, is to be the second part of the RAID set,
- 3rd disk: 3 primary partitions, where the last partition (sdc3), 485 MB in size, is to be the third part of the RAID set.

To have the possibility of building RAID at all, I compiled a new kernel and included the controller driver. Under /etc I set up the "raidtab". I formatted the RAID partitions once with FAT, once with ext2. Nothing worked. The sequence I have in mind is:
- create the RAID with "mkraid",
- reboot the machine,
- format the RAID (with ext2),
- start the RAID with "raidstart".
The "mkraid" command always refuses to create the RAID. Forcing it with "-f" doesn't get me any further either.

Here is my big request to you: could you give me a hint by e-mail on how to do it correctly? Maybe it is only a small thing ... In any case, a "thank you" already!

Kind regards, Peter Palica
Email: [EMAIL PROTECTED]
Re: AW: RAID5
horst-

you cannot make a raid array from a mounted disk. mkraid will potentially destroy the file system that is on your disk. if you wish to include your current / in a raid set, then you need to look at the 'failed-disk' directive in your raidtab. more info can be found in jakob's wonderful howto: http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO.html and the latest raid patches and tools are available at www.redhat.com/~mingo/ make sure you are running those...

allan

Johnny [EMAIL PROTECTED] said: Hi Peter, I write in English because most of the members of this mailing list don't understand German, I think. I'm also a beginner with Linux and am using SuSE 6.4 with no new kernel. Like you, I have a lot of trouble with the configuration. Maybe you read my messages which said that "mkraid" doesn't work. It says that the device I want to mirror is mounted, but that's OK, it's the "/" partition and has to be mounted. Do you get an error message from mkraid? Maybe there is a hint in the rdstat(?) file, which should contain any error messages when the RAID configuration doesn't work. What do you mean when you write "nothing works"? Horst Zymelka
Re: Problems on reintegrating one disk into raid5-array
I've had this problem. It was due to that disk being damaged. I suggest you do a scan on that disk with the SCSI utility of your controller. If it reports bad sectors and such, swap the disk, create the Linux raid partition on it and hot swap it in, as described in the HOWTO.

--
ai
http://sefiroth.org

On Wed, 12 Jul 2000, Patrick Scharrenberg wrote: Hi.. Due to a system crash, one partition of the raid array has an invalid event counter... so my array runs in degraded mode... but how can I integrate it back into the array??? Where does raid store the superblock info?? I tried to remove the partition with fdisk, recreated it, formatted it, dd'ed the first 5% with zeros.. but raid still tells me that the event counters mismatch. What can I do now??? thanks... c.u. ..patrick
Re: RAID5
On Wed, 12 Jul 2000, [EMAIL PROTECTED] wrote:

> Hello dear Linux and RAID friends, after reading the man pages several times and a few futile attempts to build a RAID5 on the basis of three hard disks, your note encouraged me to approach you after all.

Be sure to read the HOWTO as well. http://ostenfeld.dtu.dk/~jakob/Software-RAID.HOWTO/

> A bit about myself: I have been working with IT for several years. First it was CAD, then networking, and finally the network exams and the MCSE. Since I personally place great value on a heterogeneous network environment, the idea of exploring Linux as a mature operating system was an obvious one. I took my first steps with a test version of the SuSE distribution of Linux 6.0; Linux immediately won my sympathy. Now I have a complete SuSE distribution of Linux 6.4 and am trying to educate myself further and expand my knowledge of Linux. For use in a network, RAID is indispensable, so nothing for it but to build RAID with Linux. My configuration is the following: 3 SCSI hard disks:
> - 1st disk: 3 primary partitions, 1 extended partition, where the last logical partition (sda7), 485 MB in size, is to be the first part of the RAID set (the mount point / is on sda3)

Ok. Linux in general doesn't care much whether your partitions are primary or extended, though.

> - 2nd disk: 3 primary partitions, where the last partition (sdb3), 485 MB in size, is to be the second part of the RAID set,
> - 3rd disk: 3 primary partitions, where the last partition (sdc3), 485 MB in size, is to be the third part of the RAID set.
> To have the possibility of building RAID at all, I compiled a new kernel and included the controller driver. Under /etc I set up the "raidtab". I formatted the RAID partitions once with FAT, once with ext2. Nothing worked.

What filesystem you put on your RAID device doesn't matter. If you have a problem with RAID, no filesystem is going to solve it.

> The sequence I have in mind is:
> - create the RAID with "mkraid",

Does this succeed ?

> - reboot the machine,

No need.

> - format the RAID (with ext2),
> - start the RAID with "raidstart".

Oh no, you must start the RAID _before_ you can format it. If the RAID isn't running (eg. it is not present to the system, it does not *exist*) then how could you put data on it? I would recommend that you use the autostart feature and persistent superblocks. That way you won't have to care about starting the array on boot. It just happens. If you read the HOWTO, I'm sure these things are going to be more clear to you. If not, let me know :)

> The "mkraid" command always refuses to create the RAID. Forcing it with "-f" doesn't get me any further either.

We need to see your raidtab _and_ the error message from mkraid in order to help.

> Here is my big request to you: could you give me a hint by e-mail on how to do it correctly? Maybe it is only a small thing ... In any case, a "thank you" already!

I hope things clear up after you read the HOWTO. If not, please write again to the list or to me in person. But please, in English :) Your English is better than my German, I promise you :)

--
: [EMAIL PROTECTED] :  And I see the elder races,
: Jakob Østergaard   :  putrid forms of man:
: OZ9ABN             :  See him rise and claim the earth,
:                    :  his downfall is at hand.   {Konkhra}
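A compact version of the order Jakob describes, assuming a raidtab for /dev/md0 is already in place, the kernel carries the 0.90 raid patches, and /mnt/raid is just an example mount point:

mkraid /dev/md0             # create and start the array
cat /proc/mdstat            # the array has to be up and running before it can be formatted
mke2fs /dev/md0             # put the filesystem on the running array
mount /dev/md0 /mnt/raid    # no reboot is needed anywhere in this sequence

With persistent superblocks and the member partitions set to type fd (Linux raid autodetect), the array is then started automatically on later boots, which is the autostart behaviour Jakob recommends.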
sw raid5 upgrade
hi ya "raiders" i just upgraded my old sw raid5 on debian-2.2 w/ linux-2.2.10 to linux-2.2.16 w/ the patches from mingo's patch dirs... works good...nice and clean...no problems... good work guys and my (abbreviated) collection of raid stuff... http://www.linux-consulting.com/Raid/Docs c ya alvin
Re: big raid5
At 17:25 -0700 05.07.2000, Ben wrote:

So I can't get your point.

Well, unfortunately we're using IDE drives, each connected to an IDE/SCSI adapter

Okay, this wasn't clear. Sorry.

Simply test by copying something onto it, sync, work otherwise so the kernel buffers get flushed and read this file. Compare the two. If they are equal, there's no problem. If not...

"work otherwise"? I can't parse this sentence, sorry.

Do something to get rid of the buffers so the file will actually be read from disk: for example do a

dd if=/dev/sda of=/dev/null bs=1024 count=size-of-your-RAM

:wq! PoC
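A compact version of the check PoC describes, assuming the array is mounted on /raid5 and /dev/sda is one of its member disks (both names, and /some/large/file, are just examples); the large dd read is only there to push the copy out of the buffer cache so the comparison really comes back from the disks:

cp /some/large/file /raid5/test.copy
sync                                              # make sure the copy has been written out
dd if=/dev/sda of=/dev/null bs=1024k count=512    # read enough raw data to evict cached blocks
cmp /some/large/file /raid5/test.copy && echo "files match"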
Re: big raid5
Hi,

Well, unfortunately we're using IDE drives, each connected to an IDE/SCSI adapter, which has an ide interface on one side and a scsi-2 interface on the other. As we're on something of a budget, this is what we have to work with if we're going for storage volume.

If you use the same type of IDE-to-SCSI adaptors as me, they are either UW-SCSI or U-SCSI, use an ARC760 chip, and have AEC-7720UW (e.g.) printed on their PCB. The UW version has a throughput of 22..25 MB/s using one IBM DTLA 45GB disk and doesn't cost that much more than the U version. Maybe you should use a newer kernel. I personally still use 2.2.13 (but had slight problems with the versions before) and RAID 0.90 19990824.

No, it wasn't even mounted.

Maybe you could try setting /proc/.../md/min-speed to a higher value!?

Thomas
big raid5
We just made ourselves a raid5 software raid out of 7 60GB drives, using the 2.2.11 kernel, appropriate patches, and the raid 0.90 tools. The drives are all connected on the same SCSI-2 bus (we care about quantity and reliability, not speed), which is obviously not a performance demon but should work just fine. The problem we have is that it performed like a slug when resyncing - about 4 full days! I understand this is 420GB, but still that seems a bit excessive. Also, the percentage till completion reported in /proc/mdstat never rose above 12% - it would always drop back to 0%. But the ETA steadily fell, and now it claims it's done... is it really synced? Has anybody else had these problems?
Re: big raid5
On Wed, 5 Jul 2000, Ben wrote:

The drives are all connected on the same SCSI-2 bus (we care about quantity and reliability, not speed), which is obviously not a performance demon but should work just fine.

If you care about reliability, you should probably end up using some sort of hardware array instead. For many people Linux raid works reliably and very well. So does LVD-SCSI in its U2W incarnation, which is also way faster than simple FAST-SCSI-WIDE (which in fact is the most you can get from vanilla SCSI-2). So I can't get your point.

The problem we have is that it performed like a slug when resyncing - about 4 full days! I understand this is 420GB, but still

420GB are 430080MB. 4 full days are 345600 seconds. You got an average transfer speed of 1.24MBps, which is quite normal on plain FAST-SCSI (without the wide option) with a bunch of disks sharing the bus and perhaps medium activity on the array or other devices on the bus (resync then stops, because resync is done in the background).

Also, the percentage till completion reported in /proc/mdstat never rose above 12% - it would always drop back to 0%.

Was there activity on the array?

the ETA steadily fell, and now it claims it's done... is it really synced?

If there's no activity on the disks, it is done. Simply test by copying something onto it, sync, work otherwise so the kernel buffers get flushed and read this file. Compare the two. If they are equal, there's no problem. If not...

:wq! PoC
Re: big raid5
If you care about reliability, you should probably end up using some sort of hardware array instead. For many people Linux raid works reliably and very well. So does LVD-SCSI in its U2W incarnation, which is also way faster than simple FAST-SCSI-WIDE (which in fact is the most you can get from vanilla SCSI-2). So I can't get your point.

Well, unfortunately we're using IDE drives, each connected to an IDE/SCSI adapter, which has an ide interface on one side and a scsi-2 interface on the other. As we're on something of a budget, this is what we have to work with if we're going for storage volume.

Also, the percentage till completion reported in /proc/mdstat never rose above 12% - it would always drop back to 0%.

Was there activity on the array?

No, it wasn't even mounted.

the ETA steadily fell, and now it claims it's done... is it really synced?

If there's no activity on the disks, it is done. Simply test by copying something onto it, sync, work otherwise so the kernel buffers get flushed and read this file. Compare the two. If they are equal, there's no problem. If not...

"work otherwise"? I can't parse this sentence, sorry.
Easy way to convert RAID5 to RAID0?
I find that my RAID5 array is just too slow for my DB application. I have a large number of DB files on this array. I would like to convert to RAID0, and I can back up my files, but I was wondering if there is a way to convert without reformatting? Dave
Re: Easy way to convert RAID5 to RAID0?
[[EMAIL PROTECTED]] I find that my RAID5 array is just too slow for my DB application. I have a large number of DB files on this array. I would like to convert to RAID0, and I can back up my files, but I was wondering if there is a way to convert without reformatting?

Not currently, although it may be worth reconsidering a conversion from 5 -> 0 if you can alleviate your performance problems with other methods (chunk size, -R stride=, reiserfs, more memory, etc). Just a thought, although for anything OLTP-ish you're going to be so insert- and update-heavy that I'm sure raid5's going to be less than ideal for some performance requirements... Keep in mind that you won't be able to survive a disk failure like you can now, though (I know you already know this, just want to rehash :)

James
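A sketch of the -R stride= tuning James mentions, for a hypothetical array with 32kB chunks and a 4kB ext2 block size; stride should be the number of filesystem blocks per chunk, so 32kB / 4kB = 8 here (the device name is just an example):

mke2fs -b 4096 -R stride=8 /dev/md0    # lay out ext2 metadata with the RAID chunk size in mind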
Re: Easy way to convert RAID5 to RAID0?
Hi James, thanks for the info.

I was wondering if there is a way to convert without reformatting?

James> Not currently, although it may be worth reconsidering a
James> conversion from 5 -> 0 if you can alleviate your performance
James> problems with other methods (chunk size, -R stride=, reiserfs,
James> more memory, etc)

OK, I wasn't aware of the chunk size and -R stride= tunings. Where can I read about these? I was also under the impression that reiserfs was not working/stable over software RAID5. Has that changed?

James> Just a thought, although for anything OLTP-ish you're going to
James> be so insert- and update-heavy that I'm sure raid5's going to
James> be less than ideal for some performance requirements... Keep
James> in mind that you won't be able to survive through a disk
James> failure like you can now, though (I know you already know this,
James> just want to rehash :)

Yes, I know that. Unfortunately, I'm working on an extremely insert-heavy application (over 100 million records per day). I would really like ReiserFS (due to the large file size as well as for the journaling). I don't see how RAID5 can meet my needs.

Dave
Re: Easy way to convert RAID5 to RAID0?
[[EMAIL PROTECTED]] Yes, I know that. Unfortunately, I'm working on an extremely insert-heavy application (over 100 million records per day). I would really like ReiserFS (due to the large file size as well as for the journaling). I don't see how RAID5 can meet my needs. FWIW, ReiserFS won't get you much unless there are large numbers of files involved. I run s/w raid0 over h/w raid5 with ext2 specifically because it's faster for my situation with relatively low file counts (about 100 files per directory). James
autostart with raid5 over raid0?
Hi all, I've been using raid5 with auto-detection for over a year without problems. Everything including the root fs is on raid5, the machine boots from floppy. I now want to rearrange the disks in raid0 arrays, and make a raid5 of these. Will auto-detection/autostart work in this case? It should in theory...
RE: autostart with raid5 over raid0?
-Original Message-
From: Carlos Carvalho [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 21, 2000 2:19 PM
To: [EMAIL PROTECTED]
Subject: autostart with raid5 over raid0?

Hi all, I've been using raid5 with auto-detection for over a year without problems. Everything including the root fs is on raid5; the machine boots from floppy. I now want to rearrange the disks in raid0 arrays, and make a raid5 of these. Will auto-detection/autostart work in this case? It should in theory...

Nope. The RAID code doesn't support layering of RAID right now. There was a special case for RAID 1 over 0 (or the other way around?), but it turns out that it didn't quite work properly. So not only will autodetect not work correctly, it won't work at all. :-( I don't know what the plans are for this in 2.4, but it would definitely be cool.

Greg
How to shutdown properly for Software Raid5 on RH6.2
Hi, How to shutdown a computer properly so that the raid5 will sync properly during shutdown? Leng Wee
Re: bonnie++ for RAID5 performance statistics
James Manning [EMAIL PROTECTED] wrote: [Gregory Leblanc] [root@bod tiobench-0.3.1]# ./tiobench.pl --dir /raid5 No size specified, using 200 MB Size is MB, BlkSz is Bytes, Read, Write, and Seeks are MB/sec Try making the size at least double that of ram. Actually, I do exactly that, clamping at 200MB and 2000MB currently. I usually use 10x (e.g. 2 GB or a bit less for 256 MB memory). Makes the cache involved at max 10%.
RE: bonnie++ for RAID5 performance statistics
-Original Message- From: Darren Evans [mailto:[EMAIL PROTECTED]] Sent: Wednesday, June 07, 2000 3:02 AM To: [EMAIL PROTECTED] Subject: bonnie++ for RAID5 performance statistics I guess this kind of thing would be great to be detailed in the FAQ. Did you try reading the archives for this list, or the benchmarking HOWTO? Anyone care to swap statistics so I know how valid these are. This is with an Adaptec AIC-7895 Ultra SCSI host adapter. Is this good, reasonable or bad timing? Impossible to tell, since we only know the adapter. How many disks, what sort of configuration, what processor/ram? Without those, you can't even guess at how the performance compares. You should also check out tiobench if you're doing multi-disk things, since it does a pretty darn good job of threading, which takes better advantage of RAID. tiobench.sourceforge.net, I think. One other thing, I find it easier to read things if your mail program doesn't wrap lines like that. If you can't modify it, attachments are good for me. Later! Greg
RE: bonnie++ for RAID5 performance statistics
-Original Message- From: Darren Evans [mailto:[EMAIL PROTECTED]] Sent: Thursday, June 08, 2000 2:16 AM To: Gregory Leblanc Cc: [EMAIL PROTECTED] Subject: RE: bonnie++ for RAID5 performance statistics Hi Greg, Yeah I know sorry about the mail line wrap thing I only noticed after I had sent the email. 4 SCSI disks 40mb/s synchronous SCSI config, 2 Intel P500's and 256mb RAM, Redhat 6.2, raid0145-19990824-2.2.11, raidtools-19990824-0.90.tar.gz and kernel 2.2.13 SMP. [root@bod tiobench-0.3.1]# ./tiobench.pl --dir /raid5 No size specified, using 200 MB Size is MB, BlkSz is Bytes, Read, Write, and Seeks are MB/sec Try making the size at least double that of ram. This helps to eliminate the effects of caching to ram (I used to use 3x ram size, but my RAID sets aren't big enough for that anymore). The other thing to look at is the number of runs. It takes a fair bit of time to figure out what a reasonable number is to ensure consistent results. I've found that between 4 and 6 gets me stable numbers. [snip] Options ... Run #1: ./tiotest -t 2 -f 100 -r 2000 -b 4096 -d /raid5 -T Is that enough to go on? Thanks for the lead on tiobench. Not sure what you're asking, can you elaborate? Greg
RE: bonnie++ for RAID5 performance statistics
Hi Greg,

Yeah I know, sorry about the mail line wrap thing, I only noticed after I had sent the email.

4 SCSI disks 40mb/s synchronous SCSI config, 2 Intel P500's and 256mb RAM, Redhat 6.2, raid0145-19990824-2.2.11, raidtools-19990824-0.90.tar.gz and kernel 2.2.13 SMP.

[root@bod tiobench-0.3.1]# ./tiobench.pl --dir /raid5
No size specified, using 200 MB
Size is MB, BlkSz is Bytes, Read, Write, and Seeks are MB/sec

         File  Block  Num  Seq Read     Rand Read    Seq Write    Rand Write
Dir      Size  Size   Thr  Rate (CPU%)  Rate (CPU%)  Rate (CPU%)  Rate (CPU%)
-------  ----  -----  ---  -----------  -----------  -----------  -----------
/raid5   200   4096   1    27.96 19.4%  0.794 1.01%  12.93 12.6%  0.877 1.62%
/raid5   200   4096   2    21.37 15.8%  0.991 1.11%  11.46 17.1%  0.801 1.71%
/raid5   200   4096   4    18.23 13.7%  1.153 1.20%  11.00 19.4%  0.777 2.00%
/raid5   200   4096   8    15.70 15.0%  1.306 1.46%  10.62 20.4%  0.768 2.27%

Options ... Run #1: ./tiotest -t 2 -f 100 -r 2000 -b 4096 -d /raid5 -T

Is that enough to go on? Thanks for the lead on tiobench.

Darren

-Original Message-
From: Gregory Leblanc [mailto:[EMAIL PROTECTED]]
Sent: Thursday, June 08, 2000 3:29 AM
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: RE: bonnie++ for RAID5 performance statistics

-Original Message-
From: Darren Evans [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 07, 2000 3:02 AM
To: [EMAIL PROTECTED]
Subject: bonnie++ for RAID5 performance statistics

I guess this kind of thing would be great to be detailed in the FAQ.

Did you try reading the archives for this list, or the benchmarking HOWTO?

Anyone care to swap statistics so I know how valid these are. This is with an Adaptec AIC-7895 Ultra SCSI host adapter. Is this good, reasonable or bad timing?

Impossible to tell, since we only know the adapter. How many disks, what sort of configuration, what processor/ram? Without those, you can't even guess at how the performance compares. You should also check out tiobench if you're doing multi-disk things, since it does a pretty darn good job of threading, which takes better advantage of RAID. tiobench.sourceforge.net, I think. One other thing, I find it easier to read things if your mail program doesn't wrap lines like that. If you can't modify it, attachments are good for me. Later!

Greg
Re: bonnie++ for RAID5 performance statistics
[Gregory Leblanc] [root@bod tiobench-0.3.1]# ./tiobench.pl --dir /raid5 No size specified, using 200 MB Size is MB, BlkSz is Bytes, Read, Write, and Seeks are MB/sec Try making the size at least double that of ram. Actually, I do exactly that, clamping at 200MB and 2000MB currently. Next ver will up it to 4xRAM but probably leave the clamps as is. (note: only clamps when size not specified... it always trusts the user) James
RE: bonnie++ for RAID5 performance statistics
-Original Message-
From: James Manning [mailto:[EMAIL PROTECTED]]
Sent: Friday, June 09, 2000 12:46 PM
To: Gregory Leblanc
Cc: [EMAIL PROTECTED]
Subject: Re: bonnie++ for RAID5 performance statistics

[Gregory Leblanc] [root@bod tiobench-0.3.1]# ./tiobench.pl --dir /raid5 No size specified, using 200 MB Size is MB, BlkSz is Bytes, Read, Write, and Seeks are MB/sec Try making the size at least double that of ram.

Actually, I do exactly that, clamping at 200MB and 2000MB currently. Next ver will up it to 4xRAM but probably leave the clamps as is. (note: only clamps when size not specified... it always trusts the user)

Sounds good, James, but Darren said that his machine had 256MB of ram. I wouldn't have mentioned it, except that it wasn't using enough, I think. On a side note, I think that 3x would be a better number than 4, but maybe it's just me. I've got multiple machines with 256MB of ram, but only 1GB or 2GB RAID sets. 4x ram would overflow the smaller RAID sets. Is anybody else using RAID 1 to get more life out of a bunch of older 1GB disks?

Greg
Re: bonnie++ for RAID5 performance statistics
[Gregory Leblanc] Sounds good, James, but Darren said that his machine had 256MB of ram. I wouldn't have mentioned it, except that it wasn't using enough, I think.

It tries to stat /proc/kcore currently. No procfs and it'll fail to get a good number... I've thought about other approaches, too, but since this is just a fall-back mechanism when the person doesn't specify a size (like they should), I don't give it much worry. Patches always welcome, though, of course :)

[Gregory Leblanc] On a side note, I think that 3x would be a better number than 4, but maybe it's just me. I've got multiple machines with 256MB of ram, but only 1GB or 2GB RAID sets. 4x ram would overflow the smaller RAID sets.

I've thought about parsing df output of the $dir and clamping on that, but I haven't gotten around to it yet. Keep in mind, this is still all fall-back... you should be passing the right value in the first place :)

James
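A rough shell equivalent of the fallback James describes, under the assumptions that the size of /proc/kcore approximates installed RAM and that df's available column for the test directory is a sensible upper bound (this is a sketch of the idea, not tiobench's actual code, and /raid5 is just an example path):

RAM_MB=$(( $(ls -l /proc/kcore | awk '{print $5}') / 1024 / 1024 ))   # approximate RAM from /proc/kcore size
FREE_MB=$(df -k /raid5 | awk 'NR==2 {print int($4/1024)}')            # free space on the test filesystem, in MB
SIZE_MB=$(( RAM_MB * 4 ))                                             # 4x RAM, the planned default
[ $SIZE_MB -gt $FREE_MB ] && SIZE_MB=$FREE_MB                         # clamp to what the filesystem can hold
echo "test file size: ${SIZE_MB} MB"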
bonnie++ for RAID5 performance statistics
I guess this kind of thing would be great to be detailed in the FAQ. Anyone care to swap statistics so I know how valid these are. This is with an Adaptec AIC-7895 Ultra SCSI host adapter. Is this good, reasonable or bad timing?

[darren@bod bonnie++-1.00a]$ bonnie++ -d /raid5 -m bod -s 90mb
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.00a       --Sequential Output-- --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        MB   K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
bod            90    7667  85 16816  16  6809  18  9332  90 28841  25   nan -2147483648
                    --Sequential Create-- -------Random Create-------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 30   157  96   354  99  5788  87   177  98   612  99   188  29
bod,90,7667,85,16816,16,6809,18,9332,90,28841,25,nan,-2147483648,30,157,96,354,99,5788,87,177,98,612,99,188,29

--
Darren Evans              Tel: +44(0)20 7700 9960
Systems, Profero Ltd      Fax: +44(0)20 7700 9961
Problem with RAID5 - corrupt files
Hello! I have some problems with my RAID5 system. The setup worked fine and everything is running. But if I copy files to my RAID drive, the files are corrupt - that means, if I have copied a zipped file to my disks and want to unzip it, I get some CRC errors. I compared the file on the RAID drive to the original one - quite some differences! I don't know what I did wrong. This happens with all files I copy to my RAID - and if I use the same zipped file, it happens at a different position within the file each time (hope this was understandable ;-) ). Does anybody have an idea, or has anybody fixed the same problem?

My setup:
* Mandrake Linux 7.0
* Adaptec 39160 U160-SCSI-Controller (parity in BIOS enabled, running U160 speed, SCSI chain terminated)
* 3 Seagate U160 Barracuda hard disks

Thanks a lot. Greetings, Chris
raid5 didn't reconstruct
Hi there:

I installed Red Hat 6.2, and the raidtools are included in it. I made a raid5; my /etc/raidtab:

raiddev /dev/md0
    raid-level              5
    nr-raid-disks           3
    nr-spare-disks          0
    persistent-superblock   1
    chunk-size              4
    parity-algorithm        left-symmetric
    device                  /dev/sda7
    raid-disk               0
    device                  /dev/sdb1
    raid-disk               1
    device                  /dev/sdc1
    raid-disk               2

It works fine. Then I wanted to try the reconstruction of raid5, so I ran fdisk on /dev/sdc and killed the partition sdc1, then rebooted to see what would happen. But I got this /proc/mdstat:

Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 sdb1[1] sda7[0] 1720064 blocks level 5, 4k chunk, algorithm 2 [3/2] [UU_]
unused devices: <none>

I think that means sdc1 is not in the raid array now. So I tried fdisk on /dev/sdc again and made a partition sdc1 of the same size it was, and set it to type fd. But after I rebooted, /proc/mdstat was the same as above. So could you teach me the exact way to reconstruct the raid5 if one disk has failed? Thank you!

---
Kevin
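The usual answer in the raidhotadd threads above applies here: autodetection alone will not pull a kicked-out member back into a running array, so after the partition has been recreated it has to be added back by hand. A sketch using Kevin's device names:

raidhotadd /dev/md0 /dev/sdc1    # add the recreated partition back; reconstruction starts immediately
cat /proc/mdstat                 # shows [3/2] [UU_] plus a recovery line until the resync finishes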
Help with RAID5 damage please
Hello, I have all my backups on a server with 8 EIDE disks in a RAID5 array. This server was cold rebooted and now the RAID5 has an inconsistent superblock. Is there any possibility to get my data back from the RAID? Thanks, Pavel

This is what happens when I try to start the raid (raidstart):

May 18 16:38:27 backup kernel: (read) hda2's sb offset: 36363008 [events: 0008]
May 18 16:38:27 backup kernel: (read) hdb2's sb offset: 36363008 [events: 0008]
May 18 16:38:27 backup kernel: (read) hdc2's sb offset: 36363008 [events: 0008]
May 18 16:38:27 backup kernel: (read) hdd2's sb offset: 36363008 [events: 0008]
May 18 16:38:27 backup kernel: (read) hde2's sb offset: 36363008 [events: 0008]
May 18 16:38:27 backup kernel: (read) hdf2's sb offset: 36363008 [events: 0008]
May 18 16:38:27 backup kernel: (read) hdg2's sb offset: 36363008 [events: 0008]
May 18 16:38:27 backup kernel: (read) hdh2's sb offset: 36363008 [events: 000a]
May 18 16:38:27 backup kernel: autorun ...
May 18 16:38:27 backup kernel: considering hdh2 ...
May 18 16:38:27 backup kernel: adding hdh2 ...
May 18 16:38:27 backup kernel: adding hdg2 ...
May 18 16:38:27 backup kernel: adding hdf2 ...
May 18 16:38:27 backup kernel: adding hde2 ...
May 18 16:38:27 backup kernel: adding hdd2 ...
May 18 16:38:27 backup kernel: adding hdc2 ...
May 18 16:38:27 backup kernel: adding hdb2 ...
May 18 16:38:27 backup kernel: adding hda2 ...
May 18 16:38:27 backup kernel: created md0
May 18 16:38:27 backup kernel: bind<hda2,1>
May 18 16:38:27 backup kernel: bind<hdb2,2>
May 18 16:38:27 backup kernel: bind<hdc2,3>
May 18 16:38:27 backup kernel: bind<hdd2,4>
May 18 16:38:27 backup kernel: bind<hde2,5>
May 18 16:38:27 backup kernel: bind<hdf2,6>
May 18 16:38:27 backup kernel: bind<hdg2,7>
May 18 16:38:27 backup kernel: bind<hdh2,8>
May 18 16:38:27 backup kernel: running: <hdh2><hdg2><hdf2><hde2><hdd2><hdc2><hdb2><hda2>
May 18 16:38:27 backup kernel: now!
May 18 16:38:27 backup kernel: hdh2's event counter: 000a
May 18 16:38:27 backup kernel: hdg2's event counter: 0008
May 18 16:38:27 backup kernel: hdf2's event counter: 0008
May 18 16:38:27 backup kernel: hde2's event counter: 0008
May 18 16:38:27 backup kernel: hdd2's event counter: 0008
May 18 16:38:27 backup kernel: hdc2's event counter: 0008
May 18 16:38:27 backup kernel: hdb2's event counter: 0008
May 18 16:38:27 backup kernel: hda2's event counter: 0008
May 18 16:38:27 backup kernel: md: superblock update time inconsistency -- using the most recent one
May 18 16:38:27 backup kernel: freshest: hdh2
May 18 16:38:27 backup kernel: md: kicking non-fresh hdg2 from array!
May 18 16:38:27 backup kernel: unbind<hdg2,7>
May 18 16:38:27 backup kernel: export_rdev(hdg2)
May 18 16:38:27 backup kernel: md: kicking non-fresh hdf2 from array!
May 18 16:38:27 backup kernel: unbind<hdf2,6>
May 18 16:38:27 backup kernel: export_rdev(hdf2)
May 18 16:38:27 backup kernel: md: kicking non-fresh hde2 from array!
May 18 16:38:27 backup kernel: unbind<hde2,5>
May 18 16:38:27 backup kernel: export_rdev(hde2)
May 18 16:38:27 backup kernel: md: kicking non-fresh hdd2 from array!
May 18 16:38:27 backup kernel: unbind<hdd2,4>
May 18 16:38:27 backup kernel: export_rdev(hdd2)
May 18 16:38:27 backup kernel: md: kicking non-fresh hdc2 from array!
May 18 16:38:27 backup kernel: unbind<hdc2,3>
May 18 16:38:27 backup kernel: export_rdev(hdc2)
May 18 16:38:27 backup kernel: md: kicking non-fresh hdb2 from array!
May 18 16:38:27 backup kernel: unbind<hdb2,2>
May 18 16:38:27 backup kernel: export_rdev(hdb2)
May 18 16:38:27 backup kernel: md: kicking non-fresh hda2 from array!
May 18 16:38:27 backup kernel: unbind<hda2,1>
May 18 16:38:27 backup kernel: export_rdev(hda2)
May 18 16:38:27 backup kernel: md0: former device hda2 is unavailable, removing from array!
May 18 16:38:27 backup kernel: md0: former device hdb2 is unavailable, removing from array!
May 18 16:38:27 backup kernel: md0: former device hdc2 is unavailable, removing from array!
May 18 16:38:27 backup kernel: md0: former device hdd2 is unavailable, removing from array!
May 18 16:38:27 backup kernel: md0: former device hde2 is unavailable, removing from array!
May 18 16:38:27 backup kernel: md0: former device hdf2 is unavailable, removing from array!
May 18 16:38:27 backup kernel: md0: former device hdg2 is unavailable, removing from array!
May 18 16:38:27 backup kernel: md: md0: raid array is not clean -- starting background reconstruction
May 18 16:38:27 backup kernel: md0: max total readahead window set to 896k
May 18 16:38:27 backup kernel: md0: 7 data-disks, max readahead per data-disk: 128k
May 18 16:38:27 backup kernel: raid5: device hdh2 operational as raid disk 7
May 18 16:38:27 backup kernel: raid5: not enough operational devices for md0 (7/8 failed)
May 18 16:38:27 backup kernel: RAID5 conf printout:
May 18 16:38:27 backup kernel: --- rd:8 wd:1 fd:7
May 18 16:38:27 backup kernel: disk 0, s:0, o:0, n:0 rd:0 us:1 dev:[dev 00:00]
May 18 16:38:27 backup kernel
Re: Help with RAID5 damage please
Your logs indicate that the Raid code decided to look at hdh2 as gospel and dismiss all of the rest. The easiest solution is to temporarily disconnect or disable hdh2, then restart the system. It will accept the data on all of the other drives as OK now and start up the array in "degraded" mode due to the missing hdh2 drive. Shut the system down once more, reattach hdh2 and start it up one more time. This time, all of the drives should be there, but with hdh2 listed as out of step. Now you should be able to do a "raidhotadd /dev/md0 /dev/hdh2" to start reconstruction with hdh2 included. Good luck! Rich B - Original Message - From: "Pavel Kucera" [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Thursday, May 18, 2000 9:46 AM Subject: Help with RAID5 damage please Hello, I have all my backup on server with 8 EIDE disk in RAID5 array. This server was cold rebooted and now RAID5 has unconsistent superblock. Is there any posibility to get my data back from RAID ? Thanks, Pavel This is what happens when I try to start raid (raidstart): May 18 16:38:27 backup kernel: (read) hda2's sb offset: 36363008 [events: 0008] May 18 16:38:27 backup kernel: (read) hdb2's sb offset: 36363008 [events: 0008] May 18 16:38:27 backup kernel: (read) hdc2's sb offset: 36363008 [events: 0008] May 18 16:38:27 backup kernel: (read) hdd2's sb offset: 36363008 [events: 0008] May 18 16:38:27 backup kernel: (read) hde2's sb offset: 36363008 [events: 0008] May 18 16:38:27 backup kernel: (read) hdf2's sb offset: 36363008 [events: 0008] May 18 16:38:27 backup kernel: (read) hdg2's sb offset: 36363008 [events: 0008] May 18 16:38:27 backup kernel: (read) hdh2's sb offset: 36363008 [events: 000a] May 18 16:38:27 backup kernel: autorun ... May 18 16:38:27 backup kernel: considering hdh2 ... May 18 16:38:27 backup kernel: adding hdh2 ... May 18 16:38:27 backup kernel: adding hdg2 ... May 18 16:38:27 backup kernel: adding hdf2 ... May 18 16:38:27 backup kernel: adding hde2 ... May 18 16:38:27 backup kernel: adding hdd2 ... May 18 16:38:27 backup kernel: adding hdc2 ... May 18 16:38:27 backup kernel: adding hdb2 ... May 18 16:38:27 backup kernel: adding hda2 ... May 18 16:38:27 backup kernel: created md0 May 18 16:38:27 backup kernel: bindhda2,1 May 18 16:38:27 backup kernel: bindhdb2,2 May 18 16:38:27 backup kernel: bindhdc2,3 May 18 16:38:27 backup kernel: bindhdd2,4 May 18 16:38:27 backup kernel: bindhde2,5 May 18 16:38:27 backup kernel: bindhdf2,6 May 18 16:38:27 backup kernel: bindhdg2,7 May 18 16:38:27 backup kernel: bindhdh2,8 May 18 16:38:27 backup kernel: running: hdh2hdg2hdf2hde2hdd2hdc2hdb2hda2 May 18 16:38:27 backup kernel: now! May 18 16:38:27 backup kernel: hdh2's event counter: 000a May 18 16:38:27 backup kernel: hdg2's event counter: 0008 May 18 16:38:27 backup kernel: hdf2's event counter: 0008 May 18 16:38:27 backup kernel: hde2's event counter: 0008 May 18 16:38:27 backup kernel: hdd2's event counter: 0008 May 18 16:38:27 backup kernel: hdc2's event counter: 0008 May 18 16:38:27 backup kernel: hdb2's event counter: 0008 May 18 16:38:27 backup kernel: hda2's event counter: 0008 May 18 16:38:27 backup kernel: md: superblock update time inconsistency -- using the most recent one May 18 16:38:27 backup kernel: freshest: hdh2 May 18 16:38:27 backup kernel: md: kicking non-fresh hdg2 from array! May 18 16:38:27 backup kernel: unbindhdg2,7 May 18 16:38:27 backup kernel: export_rdev(hdg2) May 18 16:38:27 backup kernel: md: kicking non-fresh hdf2 from array! 
May 18 16:38:27 backup kernel: unbindhdf2,6 May 18 16:38:27 backup kernel: export_rdev(hdf2) May 18 16:38:27 backup kernel: md: kicking non-fresh hde2 from array! May 18 16:38:27 backup kernel: unbindhde2,5 May 18 16:38:27 backup kernel: export_rdev(hde2) May 18 16:38:27 backup kernel: md: kicking non-fresh hdd2 from array! May 18 16:38:27 backup kernel: unbindhdd2,4 May 18 16:38:27 backup kernel: export_rdev(hdd2) May 18 16:38:27 backup kernel: md: kicking non-fresh hdc2 from array! May 18 16:38:27 backup kernel: unbindhdc2,3 May 18 16:38:27 backup kernel: export_rdev(hdc2) May 18 16:38:27 backup kernel: md: kicking non-fresh hdb2 from array! May 18 16:38:27 backup kernel: unbindhdb2,2 May 18 16:38:27 backup kernel: export_rdev(hdb2) May 18 16:38:27 backup kernel: md: kicking non-fresh hda2 from array! May 18 16:38:27 backup kernel: unbindhda2,1 May 18 16:38:27 backup kernel: export_rdev(hda2) May 18 16:38:27 backup kernel: md0: former device hda2 is unavailable, removing from array! May 18 16:38:27 backup kernel: md0: former device hdb2 is unavailable, removing from array! May 18 16:38:27 backup kernel: md0: former device hdc2 is unavailable, removing from array! May 18 16:38:27 backup kernel: md0: former device hdd2 is unavailable, removing from array! May 18 16:38
Re: Help with RAID5 damage please
Hi there, On Thu, 18 May 2000, Richard Bollinger wrote: May 18 16:38:27 backup kernel: hdh2's event counter: 000a May 18 16:38:27 backup kernel: hdg2's event counter: 0008 May 18 16:38:27 backup kernel: hdf2's event counter: 0008 May 18 16:38:27 backup kernel: hde2's event counter: 0008 May 18 16:38:27 backup kernel: hdd2's event counter: 0008 May 18 16:38:27 backup kernel: hdc2's event counter: 0008 May 18 16:38:27 backup kernel: hdb2's event counter: 0008 May 18 16:38:27 backup kernel: hda2's event counter: 0008 May 18 16:38:27 backup kernel: md: superblock update time inconsistency May 18 16:38:27 backup kernel: unbindhdb2,2 Your logs indicate that the Raid code decided to look at hdh2 as gospel and dismiss all of the rest. The easiest solution is to temporarily disconnect or disable hdh2, then restart the system. It will accept the data on all of the other drives as OK now and start up the array in "degraded" mode due to the missing hdh2 drive. Shut the system down once more, reattach hdh2 and start it up one more time. This time, all of the drives should be there, but with hdh2 listed as out of step. Now you should be able to do a "raidhotadd /dev/md0 /dev/hdh2" to start reconstruction with hdh2 included. I've thought of a problem with this. IIRC, the event counter is incremented once for each successful mount. If you follow this procedure, then the raid driver will increment hd[a-g]'s event counters to 9, and when you boot back up again you'll be in the same situation. The best suggestion I can give is to reboot three times so that the event counters cycle through '9', 'a' and then 'b'. When you reattach hdh2, and reboot the event counters for hd[a-g] will be greater than hdh's, and you can do the raidhotadd then! Hope this helps! Corin /+-\ | Corin Hartland-Swann | Mobile: +44 (0) 79 5854 0027| | Commerce Internet Ltd |Tel: +44 (0) 20 7491 2000| | 22 Cavendish Buildings |Fax: +44 (0) 20 7491 2010| | Gilbert Street | | | Mayfair|Web: http://www.commerce.uk.net/ | | London W1Y 1FE | E-Mail: [EMAIL PROTECTED]| \+-/
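For anyone trying to follow the event-counter logic above, the counters are easy to pull out of the kernel log before deciding what to disconnect or raidhotadd. A minimal sketch, assuming kernel messages land in /var/log/messages the way they do in Pavel's trace:

  # per-disk event counters printed by the md driver at autorun time
  grep "event counter" /var/log/messages | tail -8
  # what actually got assembled, and whether it is running degraded
  cat /proc/mdstat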
RE: How to test raid5 performance best ?
-Original Message- From: octave klaba [mailto:[EMAIL PROTECTED]] Sent: Monday, May 15, 2000 7:25 AM To: Thomas Scholten Cc: Linux Raid Mailingliste Subject: Re: How to test raid5 performance best ? 1. Which tools should i use to test raid-performace ? tiotest. I lost the official url you can download it from http://ftp.ovh.net/tiotest-0.25.tar.gz Try http://tiobench.sourceforge.net. That's a pretty old version, there have been a number of improvements. 2. is it possible to add disks to a raid5 after its been started ? good question ;) I thought there was something to do this, but I'm not sure. I'd think that LVM would be able to make this workable more than just filesystems on disks, but I'm not sure. Grego
How to test raid5 performance best ?
Hello All, some days ago I joined the Software-Raid-Club :) I'm now running a SCSI RAID5 with 3 x 2 GB partitions. I chose a chunk-size of 32 kb. The FAQ tells me to experiment to find the best-performing chunk-size, but I definitely have no good clue how to test performance :-/ so please help me with the following questions: 1. Which tools should I use to test raid performance? 2. Is it possible to add disks to a raid5 after it's been started? Thanks for your help, and greetings from good ol' Germany Thomas
Re: How to test raid5 performance best ?
Hi, 1. Which tools should i use to test raid-performace ? tiotest. I lost the official url you can download it from http://ftp.ovh.net/tiotest-0.25.tar.gz 2. is it possible to add disks to a raid5 after its been started ? good question ;) -- Amicalement, oCtAvE Connexion terminée par expiration du délai d'attente
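Besides tiotest/tiobench and bonnie, a very crude sequential-throughput sanity check can be done with plain dd. A rough sketch only; /mnt/raid stands in for wherever the array is mounted, and the test file should be well beyond RAM size so the page cache doesn't flatter the numbers:

  time dd if=/dev/zero of=/mnt/raid/ddtest bs=1024k count=1024   # ~1GB write
  time dd if=/mnt/raid/ddtest of=/dev/null bs=1024k              # read it back
  rm /mnt/raid/ddtest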
[PATCH] 2.2.14-B1 bug in file raid5.c, line 659
Summary: raid5_error needs to handle the first scsi error from a device and take the necessary action, but silently return on subsequent failures.
- 3 h/w raid0's in a s/w raid5
- initial resync isn't finished (not important)
- a scsi error passed up takes out one of the devices
The bug is triggered when raid5_error is called with a device (sde1) that doesn't match against "disk->dev == dev && disk->operational" (mainly because disk->operational was already set to 0 13 seconds previously, when the first scsi error was passed back and sde1 matched). Since multiple scsi errors getting passed back from the same failure seems valid (multiple commands had been sent, and each will fail in turn), we should simply handle the first one and have raid5_error exit quietly on the later ones (re-running the spare code could even cause big problems when multiple spares are available). Patch attached.

Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 sde1[2](F) sdd1[1] sdc1[0] 177718016 blocks level 5, 4k chunk, algorithm 0 [3/2] [UU_]
unused devices: <none>

log attached.

James

--- linux/drivers/block/raid5.c.orig	Thu Apr 20 11:27:37 2000
+++ linux/drivers/block/raid5.c	Thu Apr 20 11:32:16 2000
@@ -611,23 +611,29 @@
 	PRINTK(("raid5_error called\n"));
 	conf->resync_parity = 0;
 	for (i = 0, disk = conf->disks; i < conf->raid_disks; i++, disk++) {
-		if (disk->dev == dev && disk->operational) {
-			disk->operational = 0;
-			mark_disk_faulty(sb->disks+disk->number);
-			mark_disk_nonsync(sb->disks+disk->number);
-			mark_disk_inactive(sb->disks+disk->number);
-			sb->active_disks--;
-			sb->working_disks--;
-			sb->failed_disks++;
-			mddev->sb_dirty = 1;
-			conf->working_disks--;
-			conf->failed_disks++;
-			md_wakeup_thread(conf->thread);
-			printk (KERN_ALERT
-				"raid5: Disk failure on %s, disabling device."
-				" Operation continuing on %d devices\n",
-				partition_name (dev), conf->working_disks);
-			return -EIO;
+		/* Did we find the device with the error? */
+		if (disk->dev == dev) {
+			/* Did we handle its failure already? */
+			if (disk->operational) {
+				disk->operational = 0;
+				mark_disk_faulty(sb->disks+disk->number);
+				mark_disk_nonsync(sb->disks+disk->number);
+				mark_disk_inactive(sb->disks+disk->number);
+				sb->active_disks--;
+				sb->working_disks--;
+				sb->failed_disks++;
+				mddev->sb_dirty = 1;
+				conf->working_disks--;
+				conf->failed_disks++;
+				md_wakeup_thread(conf->thread);
+				printk (KERN_ALERT
+					"raid5: Disk failure on %s, disabling device."
+					" Operation continuing on %d devices\n",
+					partition_name (dev), conf->working_disks);
+				return -EIO;
+			}
+			/* Don't do anything for failures past the first */
+			return 0;
 		}
 	}
 	/*

Apr 19 16:02:41 rts-test2 kernel: SCSI disk error : host 3 channel 0 id 2 lun 0 return code = 800
Apr 19 16:02:41 rts-test2 kernel: [valid=0] Info fld=0x0, Current sd08:41: sense key None
Apr 19 16:02:41 rts-test2 kernel: scsidisk I/O error: dev 08:41, sector 9296408
Apr 19 16:02:41 rts-test2 kernel: interrupting MD-thread pid 2807
Apr 19 16:02:41 rts-test2 kernel: raid5: parity resync was not fully finished, restarting next time.
Apr 19 16:02:41 rts-test2 kernel: raid5: Disk failure on sde1, disabling device. Operation continuing on 2 devices
Apr 19 16:02:41 rts-test2 kernel: md: recovery thread got woken up ...
Apr 19 16:02:41 rts-test2 kernel: md0: no spare disk to reconstruct array! -- continuing in degraded mode
Apr 19 16:02:41 rts-test2 kernel: md: recovery thread finished ...
Apr 19 16:02:41 rts-test2 kernel: md: updating md0 RAID superblock on device Apr 19 16:02:41 rts-test2 kernel: (skipping faulty sde1 ) Apr 19 16:02:41 rts-test2 kernel: sdd1 [events: 0002](write) sdd1's sb offset: 88859008 Apr 19 16:02:41 rts-test2 kernel: sdc1 [events: 0002](write) sdc1's sb offset: 88859008 Apr 19 16:02:41 rts-test2 kernel: . Apr 19 16:02:41 rts-test2 kernel:
Raid5 'no spare-disk available'
Hello I did remove sdq1 from my 6-device autodetecting kernel 2.2.11-raid5-set. Now there's no way, to bring it back. 'No spare-disk' it says. What do I need to do? I tried to rearange the sequenze in /etc/raidtab (device 0 to the bottom) I added a spare-disk in raidtab. Please see the boot-messages at the bottom. Thanks Urs running: sdr1sdq1sdp1sdl1sdk1sdj1 now! sdr1's event counter: 001a sdq1's event counter: 0008 sdp1's event counter: 001a sdl1's event counter: 001a sdk1's event counter: 001a sdj1's event counter: 001a md: superblock update time inconsistency -- using the most recent one freshest: sdr1 md: kicking non-fresh sdq1 from array! unbindsdq1,5 export_rdev(sdq1) md0: max total readahead window set to 640k md0: 5 data-disks, max readahead per data-disk: 128k raid5: device sdr1 operational as raid disk 1 raid5: device sdp1 operational as raid disk 5 raid5: device sdl1 operational as raid disk 0 raid5: device sdk1 operational as raid disk 2 raid5: device sdj1 operational as raid disk 4 raid5: md0, not all disks are operational -- trying to recover array raid5: allocated 6350kB for md0 raid5: raid level 5 set md0 active with 5 out of 6 devices, algorithm 2 RAID5 conf printout: --- rd:6 wd:5 fd:1 disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sdl1 disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdr1 disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdk1 disk 3, s:0, o:0, n:3 rd:3 us:1 dev:[dev 00:00] disk 4, s:0, o:1, n:4 rd:4 us:1 dev:sdj1 disk 5, s:0, o:1, n:5 rd:5 us:1 dev:sdp1 disk 6, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00] disk 7, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00] disk 11, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00] md: updating md0 RAID superblock on device sdr1 [events: 001b](write) sdr1's sb offset: 2048192 md: recovery thread got woken up ... md0: no spare disk to reconstruct array! -- continuing in degraded mode md: recovery thread finished ... sdp1 [events: 001b](write) sdp1's sb offset: 2048192 sdl1 [events: 001b](write) sdl1's sb offset: 2048192 sdk1 [events: 001b](write) sdk1's sb offset: 2048192 sdj1 [events: 001b](write) sdj1's sb offset: 2048192 . ... autorun DONE. -- This is raidtab -- raiddev /dev/md0 raid-level 5 nr-raid-disks 6 persistent-superblock 1 chunk-size 32 parity-algorithmleft-symmetric # only added later, after building raidset! nr-spare-disks 1 device /dev/sdo1 spare-disk 0 device /dev/sdr1 raid-disk 1 device /dev/sdk1 raid-disk 2 device /dev/sdq1 raid-disk 3 device /dev/sdj1 raid-disk 4 device /dev/sdp1 raid-disk 5 device /dev/sdl1 raid-disk 0
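Editing /etc/raidtab after the array has been built doesn't change the persistent superblocks, so the raidtab changes above won't bring sdq1 back by themselves. Assuming sdq1 itself is still healthy, the usual route is to hot-add the kicked partition into the running, degraded array and let it reconstruct; a sketch:

  raidhotadd /dev/md0 /dev/sdq1
  cat /proc/mdstat        # reconstruction progress should show up here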
Re: Can't recover raid5 1 disk failure - Could not import [dev21:01]!
On Wed, 12 Apr 2000, Darren Nickerson wrote: So no problem, I have 3 of the four left, right? The array was marked [_UUU] just before I power cycled (the disk was crashing) and since it had been marked faulty, I was able to raidhotremove the underlined one. But now, it won't boot into degraded mode. As I try to boot redhat to single user, I am told: md: could not lock [dev 21:01], zero size? Marking faulty Could not import [dev 21:01]! Autostart [dev 21:01] failed! this happens because raidstart looks at the first entry in /etc/raidtab to start up an array. If that entry is damaged, it does not cycle through the other entries to start up the array. The solution is to permutate the entries in /etc/raidtab. (make sure to restore the original order) if you switch to boot-time autostart then this should not happen, RAID partitions are first collected then started up, and the code should be able to start up the array, no matter which disk got damaged. Ingo
Re: Can't recover raid5 1 disk failure - Could not import [dev 21:01]!
Thanks for the reply Ingo! Great work you're doing, thanks. On Wed, 12 Apr 2000, "Ingo" == Ingo Molnar wrote: Ingo this happens because raidstart looks at the first entry in /etc/raidtab Ingo to start up an array. If that entry is damaged, it does not cycle Ingo through the other entries to start up the array. The solution is to Ingo permutate the entries in /etc/raidtab. (make sure to restore the Ingo original order) I know this now, but I never would have guessed it before. The error was so cryptic!! Ingo if you switch to boot-time autostart then this should not happen, RAID Ingo partitions are first collected then started up, and the code should be Ingo able to start up the array, no matter which disk got damaged. I'm confused. I thought I WAS boot-time autostarting. RedHat's definitely autodetecting and starting the array very early in the boot process, but I'm clearly not entirely properly setup here because my partition types are not 0xfd, which seems to be important for some reason or another. This is what I do see (with no prompting or intervention from me): md.c: sizeof(mdp_super_t) = 4096 Partition check: hda: hda1 hda2 hda5 hda6 hdb: hdb1 hdc: [PTBL] [8191/32/63] hdc1 hdd: hdd1 hdg: [PTBL] [8191/32/63] hdg1 hdi: [PTBL] [8191/32/63] hdi1 hdk: [PTBL] [8191/32/63] hdk1 autodetecting RAID arrays autorun ... ... autorun DONE. VFS: Mounted root (ext2 filesystem) readonly. Freeing unused kernel memory: 56k freed Adding Swap: 265032k swap-space (priority -1) (read) hdk1's sb offset: 33417088 [events: 0081] (read) hdg1's sb offset: 33417088 [events: 0081] (read) hdi1's sb offset: 33417088 [events: 0081] autorun ... considering hdi1 ... adding hdi1 ... adding hdg1 ... adding hdk1 ... created md1 bindhdk1,1 bindhdg1,2 bindhdi1,3 running: hdi1hdg1hdk1 now! hdi1's event counter: 0081 hdg1's event counter: 0081 hdk1's event counter: 0081 md: md1: raid array is not clean -- starting background reconstruction raid5 personality registered md1: max total readahead window set to 384k md1: 3 data-disks, max readahead per data-disk: 128k raid5: device hdi1 operational as raid disk 2 raid5: device hdg1 operational as raid disk 1 raid5: device hdk1 operational as raid disk 3 raid5: md1, not all disks are operational -- trying to recover array raid5: allocated 4248kB for md1 raid5: raid level 5 set md1 active with 3 out of 4 devices, algorithm 2 RAID5 conf printout: --- rd:4 wd:3 fd:1 So, you're saying that the array would have automatically recovered if I had had all five partitions set 0xfd? -Darren
Re: Can't recover raid5 1 disk failure - Could not import [dev21:01]!
On Wed, 12 Apr 2000, Darren Nickerson wrote: I'm confused. I thought I WAS boot-time autostarting. RedHat's definitely autodetecting and starting the array very early in the boot process, but I'm clearly not entirely properly setup here because my partition types are not 0xfd, which seems to be important for some reason or another. [...] well, it was boot-time 'very early' autostarting, but not RAID-autostarting in the classic sense. I think i'll fix raidstart to simply iterate through all available partitions, until one is started up correctly (or until all entries fail). This still doesnt cover all the cases which are covered by the 0xfd method (such as card failure, device reshuffling, etc.), but should cover your case (which is definitely the most common one). So, you're saying that the array would have automatically recovered if I had had all five partitions set 0xfd? yes, definitely. Not marking a partition 0xfd is the more conservative approach from the installer's point of view in a possibly multi-OS environment, you can always mark it 0xfd later on. Ingo
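For reference, "marking it 0xfd later on" is just a partition type change in fdisk. A sketch of the keystrokes, using /dev/hdg partition 1 as a stand-in; repeat for each member and reboot so the kernel re-reads the partition tables and autostarts the array:

  fdisk /dev/hdg
    t      # change a partition's system id
    1      # partition number
    fd     # Linux raid autodetect
    w      # write table to disk and exit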
Re: Can't recover raid5 1 disk failure - Could not import [dev 21:01]!
On Wed, 12 Apr 2000, "Ingo" == Ingo Molnar wrote: Ingo well, it was boot-time 'very early' autostarting, but not Ingo RAID-autostarting in the classic sense. Understood. Ingo I think i'll fix raidstart to simply iterate through all available Ingo partitions, until one is started up correctly (or until all entries Ingo fail). Good idea, the present failure mode is extremely frightening if you're unlucky enough to lose the first disk :-( Ingo This still doesnt cover all the cases which are covered by the 0xfd Ingo method (such as card failure, device reshuffling, etc.), but should Ingo cover your case (which is definitely the most common one). Agreed. + So, you're saying that the array would have automatically recovered if I + had had all five partitions set 0xfd? Ingo yes, definitely. Not marking a partition 0xfd is the more conservative Ingo approach from the installer's point of view in a possibly multi-OS Ingo environment, you can always mark it 0xfd later on. Thanks for taking the time to clarify this for me. -Darren
Can't recover raid5 1 disk failure - Could not import [dev 21:01]!
Folks, My array decided to show me what was wrong with it (see my posts earlier today). It was a comprehensive head crash which was slow coming on but which eventually took the disk totally out of action. The Promise card does not even see it . . . :-( So no problem, I have 3 of the four left, right? The array was marked [_UUU] just before I power cycled (the disk was crashing) and since it had been marked faulty, I was able to raidhotremove the underlined one. But now, it won't boot into degraded mode. As I try to boot redhat to single user, I am told: Starting up RAID devices: /dev/md1: Invalid Argument /dev/md1 is not a Raid0 or linear array and dmesg says: md: could not lock [dev 21:01], zero size? Marking faulty Could not import [dev 21:01]! Autostart [dev 21:01] failed! Help! -darren
Re: Can't recover raid5 1 disk failure - **RECOVERED!!!**
On Wed, 12 Apr 2000, "Darren" == Darren Nickerson wrote: Darren But now, it won't boot into degraded mode. As I try to boot redhat to Darren single user, I am told: Darren Starting up RAID devices: /dev/md1: Invalid Argument Darren /dev/md1 is not a Raid0 or linear array Darren and dmesg says: Darren md: could not lock [dev 21:01], zero size? Darren Marking faulty Darren Could not import [dev 21:01]! Darren Autostart [dev 21:01] failed! Well, thanks to the heroic effort of one subscriber to this list, I have a working array. He shall remain nameless to keep everyone from running to him with their troubles ;-) The fix was to reorder the entries in raidtab, being careful not to change the actual RAID ordering, just the entries which specified them. The explanation as it was given to me: raidstart looks in the raidtab and finds the first device that the raidtab says is in the array. It then reads the superblock there and tells the kernel to import whatever devices that superblock says there are. Now the kernel at boot time finds the first device that has partition type 0xfd, which by coincidence happened to be the first device in your raidtab too. It does the same, and fails miserably just like raidstart. The superblock on the first device was simply screwed up beyond repair, yet had the superblock magic intact, that's why we never saw any reasonable error report. By re-ordering the devices in the raidtab, the kernel was told to try with a (luckily intact) superblock from one of the other disks. A very big and heartfelt thank-you to my saviour ;-) -darren
repartitioning to raid5
Hi, i want to reconfigure my server fairly dramaticly and im trying to work out how i can do it without great pain. I currently have 3 drive of ~ 20GB, i have another 20GB and 6.4GB i want to include in my array. I have about 40Gb of data currently on the drives, about 10GB on a raid0, the rest normal partitions. I want to get my 4 ~20GB drives into a raid5 setup, with 3 data disks using 18GB of each, and one parity disk, so i can have about 54GB of storage I will use about 2GB from each of the 5 drives for a raid0 And whats left over for the linux system Is it possible to integrate a disk with data already on it into a raid5 array? I was hoping there may be some trick with marking the disk as bad, spare or whatever to enable the data to live through the process. So of a total storage of about 86GB raid5 will use 72 GB, 18GB of my data can be on one of the raid5 disks(?), i can move upto 10Gb of data to anywhere that is not going to be a raid5, i can find a few GB of space via the network, and backup whats left on cds. I guess it would be easier to go and buy a tape backup, but im not a big fan of them, and dont know much about them either. Any ideas, comments advice ? Thanks Glenn McGrath
Re: Raid5 with two failed disks?
Its a nice complicated case of semaphores in threaded (multi process?) systems ... ... one system needs to be aware that the other system isn't ready yet, without causing incompatibilities. With RAID, would it be possible for the MD driver to actually accept the mount request but halt the process until the driver was ready to actually give data? Jakob Østergaard wrote: I think my situation is the same as this "two failed disks" one but I haven't been following the thread carefully and I just want to double check. I have a mirrored RAID-1 setup between 2 disks with no spare disks. Inadvertantly the machine got powered down without a proper shutdown apparently causing the RAID to become unhappy. It would boot to the point where it needed to mount root and then would fail saying that it couldn't access /dev/md1 because the two RAID disks were out of sync. Anyway, given this situation, how can I rebuild my array? Is all it takes is doing another mkraid (given the raidtab is identical to the real setup, etc)? If so, since I'm also booting off of raid, how do I do this for the boot partition? I can boot up using one of the individual disks (e.g. /dev/sda1) instead of the raid disk (/dev/md1), but if I do that will I be able to do a mkraid on an in-use partition? If not, how do I resolve this (boot from floppy?). Finally, is there any way to automate this recovery process. That is, if the machine is improperly powered down again, can I have it automatically rebuild itself the next time it comes up? As others already pointed out, this doesn't make sense. The boot sequence uses the mount command to mount your fs, and mount doesn't know that your md device is in any way different from other block devices. Only if the md device doesn't start, the mount program will be unable to request the kernel to mount the device. We definitely need log output in order to tell what happened and why. -- : [EMAIL PROTECTED] : And I see the elder races, : :.: putrid forms of man: : Jakob Østergaard : See him rise and claim the earth, : :OZ9ABN : his downfall is at hand. : :.:{Konkhra}...: -- _/~-=##=-~\_ -=+0+=- Michael T. Babcock -=+0+=- ~\_-=##=-_/~ http://www.linuxsupportline.com/~pgp/ ICQ: 4835018
Re: Raid5 with two failed disks?
On Mon, 03 Apr 2000, Rainer Mager wrote: Hi all, I think my situation is the same as this "two failed disks" one but I haven't been following the thread carefully and I just want to double check. I have a mirrored RAID-1 setup between 2 disks with no spare disks. Inadvertantly the machine got powered down without a proper shutdown apparently causing the RAID to become unhappy. It would boot to the point where it needed to mount root and then would fail saying that it couldn't access /dev/md1 because the two RAID disks were out of sync. Anyway, given this situation, how can I rebuild my array? Is all it takes is doing another mkraid (given the raidtab is identical to the real setup, etc)? If so, since I'm also booting off of raid, how do I do this for the boot partition? I can boot up using one of the individual disks (e.g. /dev/sda1) instead of the raid disk (/dev/md1), but if I do that will I be able to do a mkraid on an in-use partition? If not, how do I resolve this (boot from floppy?). Finally, is there any way to automate this recovery process. That is, if the machine is improperly powered down again, can I have it automatically rebuild itself the next time it comes up? As others already pointed out, this doesn't make sense. The boot sequence uses the mount command to mount your fs, and mount doesn't know that your md device is in any way different from other block devices. Only if the md device doesn't start, the mount program will be unable to request the kernel to mount the device. We definitely need log output in order to tell what happened and why. -- : [EMAIL PROTECTED] : And I see the elder races, : :.: putrid forms of man: : Jakob Østergaard : See him rise and claim the earth, : :OZ9ABN : his downfall is at hand. : :.:{Konkhra}...:
Adding a spare-disk to a RAID5 array?
I've found some cash, and want to add a spare disk to our raid5 array for added redundancy. Can this be done? It is a matter of 1. raidstop 2. add spare to raidtab 3. raidhotadd spare or is it more a matter of 1. raidstop 2. cry 3. mkraid with the "I really mean it" 4. restore data from backup Thanks! -Darren
Re: Adding a spare-disk to a RAID5 array?
On Tue, 4 Apr 2000, "Gregory" == Gregory Leblanc wrote: + I've found some cash, and want to add a spare disk to our + raid5 array for + added redundancy. + Can this be done? It is a matter of + 1. raidstop + 2. add spare to raidtab + 3. raidhotadd spare Gregory This gives you a hot spare, is that what you want? Yes, I want to add a spare disk. I added the following to my /etc/raidtab: device /dev/hdd1 spare-disk 0 and changed nr-spare-disks from 0 to 1. Then I did a raidhotadd. The disk was larger than the rest by a bit, but I figured that extra space would just be wasted. The disk seemed to add in just fine, but mdstat looked like: [root@osmin /root]# cat /proc/mdstat Personalities : [raid5] read_ahead 1024 sectors md1 : active raid5 hdd1[4] hdk1[3] hdi1[2] hdg1[1] hde1[0] 100251264 blocks level 5, 32k chunk, algorithm 2 [4/4] [] unused devices: none The 4/4 and unused devices none left me wondering if my spare was recognised. I mean I see it in the md1 line above, but did it add as a spare? Anyway, because of the as yet unexplained and very scary: raid5: bug: stripe-bh_new[2], sector 9846456 exists I had done this all in single-user mode, so I had no logs to examine. After my first multi-user boot, I see the following: raid5 personality registered md1: max total readahead window set to 384k md1: 3 data-disks, max readahead per data-disk: 128k raid5: spare disk hdd1 raid5: device hdk1 operational as raid disk 3 raid5: device hdi1 operational as raid disk 2 raid5: device hdg1 operational as raid disk 1 raid5: device hde1 operational as raid disk 0 raid5: allocated 4248kB for md1 raid5: raid level 5 set md1 active with 4 out of 4 devices, algorithm 2 which seems to indicate that everything is kosher. Still not sure about the partition size mismatch, guess I'll find out when one trips and the reconstruction begins. + or is it more a matter of + 1. raidstop + 2. cry + 3. mkraid with the "I really mean it" + 4. restore data from backup Gregory This lets you expand the capacity of the RAID, is that what you want? Gregory Greg No, I definitely wanted a spare. Thanks for the reply! -Darren
Re: Raid5 with two failed disks?
On Sun, 02 Apr 2000, Marc Haber wrote: On Sat, 1 Apr 2000 12:44:49 +0200, you wrote: It _is_ in the docs. Which docs do you refer to? I must have missed this. Section 6.1 in http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/ Didn't you actually mention it yourself ? :) (don't remember - someone mentioned it at least...) -- : [EMAIL PROTECTED] : And I see the elder races, : :.: putrid forms of man: : Jakob Østergaard : See him rise and claim the earth, : :OZ9ABN : his downfall is at hand. : :.:{Konkhra}...:
Re: Raid5 with two failed disks?
On Sun, 2 Apr 2000 15:28:28 +0200, you wrote: On Sun, 02 Apr 2000, Marc Haber wrote: On Sat, 1 Apr 2000 12:44:49 +0200, you wrote: It _is_ in the docs. Which docs do you refer to? I must have missed this. Section 6.1 in http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/ Didn't you actually mention it yourself ? :) Yes, I did. However, I'd add a sentence to the HOWTO mentioning that in this case mkraid probably won't be destructive. After the mkraid warning, I aborted the procedure and started asking; I think this should be avoided in the future. Greetings Marc -- -- !! No courtesy copies, please !! - Marc Haber | " Questions are the | Mailadresse im Header Karlsruhe, Germany | Beginning of Wisdom " | Fon: *49 721 966 32 15 Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fax: *49 721 966 31 29
Re: Raid5 with two failed disks?
On Sun, 02 Apr 2000, Marc Haber wrote: [snip] Yes, I did. However, I'd add a sentence mentioning that in this case mkraid probably won't be destructive to the HOWTO. After the mkraid warning, I aborted the procedure and started asking. I think this should be avoided in the future. I have added this to my FIX file for the next revision of the HOWTO. Thanks, -- : [EMAIL PROTECTED] : And I see the elder races, : :.: putrid forms of man: : Jakob Østergaard : See him rise and claim the earth, : :OZ9ABN : his downfall is at hand. : :.:{Konkhra}...:
RE: Raid5 with two failed disks?
Hi all, I think my situation is the same as this "two failed disks" one but I haven't been following the thread carefully and I just want to double check. I have a mirrored RAID-1 setup between 2 disks with no spare disks. Inadvertantly the machine got powered down without a proper shutdown apparently causing the RAID to become unhappy. It would boot to the point where it needed to mount root and then would fail saying that it couldn't access /dev/md1 because the two RAID disks were out of sync. Anyway, given this situation, how can I rebuild my array? Is all it takes is doing another mkraid (given the raidtab is identical to the real setup, etc)? If so, since I'm also booting off of raid, how do I do this for the boot partition? I can boot up using one of the individual disks (e.g. /dev/sda1) instead of the raid disk (/dev/md1), but if I do that will I be able to do a mkraid on an in-use partition? If not, how do I resolve this (boot from floppy?). Finally, is there any way to automate this recovery process. That is, if the machine is improperly powered down again, can I have it automatically rebuild itself the next time it comes up? Thanks in advance, --Rainer
RE: Raid5 with two failed disks?
On Mon, 3 Apr 2000, Rainer Mager wrote: I think my situation is the same as this "two failed disks" one but I haven't been following the thread carefully and I just want to double check. I have a mirrored RAID-1 setup between 2 disks with no spare disks. Inadvertantly the machine got powered down without a proper shutdown apparently causing the RAID to become unhappy. It would boot to the point where it needed to mount root and then would fail saying that it couldn't access /dev/md1 because the two RAID disks were out of sync. Anyway, given this situation, how can I rebuild my array? Is all it takes is doing another mkraid (given the raidtab is identical to the real setup, etc)? If so, since I'm also booting off of raid, how do I do this for the boot partition? I can boot up using one of the individual disks (e.g. /dev/sda1) instead of the raid disk (/dev/md1), but if I do that will I be able to do a mkraid on an in-use partition? If not, how do I resolve this (boot from floppy?). Finally, is there any way to automate this recovery process. That is, if the machine is improperly powered down again, can I have it automatically rebuild itself the next time it comes up? Whether or not the array is in sync should not make a difference to the boot process. I have both raid1 and raid 5 systems that run root raid and will boot quite nicely and rsync automatically after a "dumb" shutdown that leaves them out of sync. Do you have your kernel built for auto raid start?? and partitions marked "fd" ? You can reconstruct you existing array by booting with a kernel that supports raid and with the raid tools on the rescue system. Do it all the time. Michael
RE: Raid5 with two failed disks?
Hmm, well, I'm certainly not positive why it wouldn't boot and I don't have the logs in front of me, but I do remember it saying that it couldn't mount /dev/md1 and therefore had a panic during boot. My solution was to specify the root device as /dev/sda1 instead of the configured /dev/md1 from the lilo prompt. The disk is marked to auto raid start and marked as fd. And, it booted just fine until the "dumb" shutdown. As for a rescue disk I'll put one together. Thanks for the advice. --Rainer -Original Message- From: Michael Robinton [mailto:[EMAIL PROTECTED]] Sent: Monday, April 03, 2000 8:50 AM To: Rainer Mager Cc: Jakob Ostergaard; [EMAIL PROTECTED] Subject: RE: Raid5 with two failed disks? Whether or not the array is in sync should not make a difference to the boot process. I have both raid1 and raid 5 systems that run root raid and will boot quite nicely and rsync automatically after a "dumb" shutdown that leaves them out of sync. Do you have your kernel built for auto raid start?? and partitions marked "fd" ? You can reconstruct you existing array by booting with a kernel that supports raid and with the raid tools on the rescue system. Do it all the time. Michael
RE: Raid5 with two failed disks?
Hmm, well, I'm certainly not positive why it wouldn't boot and I don't have the logs in front of me, but I do remember it saying that it couldn't mount /dev/md1 and therefore had a panic during boot. My solution was to specify the root device as /dev/sda1 instead of the configured /dev/md1 from the lilo prompt. Hmm the only time I've seen this message has been when using initrd with an out of sync /dev/md or when the raidtab in the initrd was bad or missing. This was without autostart. Michael The disk is marked to auto raid start and marked as fd. And, it booted just fine until the "dumb" shutdown. As for a rescue disk I'll put one together. Thanks for the advice. --Rainer -Original Message- From: Michael Robinton [mailto:[EMAIL PROTECTED]] Sent: Monday, April 03, 2000 8:50 AM To: Rainer Mager Cc: Jakob Ostergaard; [EMAIL PROTECTED] Subject: RE: Raid5 with two failed disks? Whether or not the array is in sync should not make a difference to the boot process. I have both raid1 and raid 5 systems that run root raid and will boot quite nicely and rsync automatically after a "dumb" shutdown that leaves them out of sync. Do you have your kernel built for auto raid start?? and partitions marked "fd" ? You can reconstruct you existing array by booting with a kernel that supports raid and with the raid tools on the rescue system. Do it all the time. Michael
Re: Raid5 with two failed disks?
On Fri, 31 Mar 2000, Marc Haber wrote: On Thu, 30 Mar 2000 09:20:57 +0200, you wrote: At 02:16 30.03.00, you wrote: Hi... I have a Raid5 Array, using 4 IDE HDs. A few days ago, the system hung, no reaction, except ping from the host, nothing to see on the monitor. I rebooted the system and it told me, 2 out of 4 disks were out of sync. 2 Disks have an event counter of 0062, the two others 0064. I hope, that there is a way to fix this. I searched through the mailing-list and found one thread, but it did not help me. Yes I do. Check Jakobs Raid howto, section "recovering from multiple failures". You can recreate the superblocks of the raid disks using mkraid; I had that problem a week ago and chickened out after mkraid told me it would destroy my array. If, in this situation, destruction doesn't happen, this should be mentioned in the docs. It _is_ in the docs. But the message from the mkraid tool is still sane, because it actually _will_ destroy your data *if you do not know what you are doing*. So, for the average Joe-user just playing with his tools as root (*ouch!*), this message is a life saver. For people who actually need to re-write the superblocks for good reasons, well they have read the docs so they know the message doesn't apply to them - if they don't make mistakes. mkraid'ing an existing array is inherently dangerous if you're not careful and know what you're doing. It's perfectly safe otherwise. Having the tool tell the user that ``here be dragons'' is perfectly sensible IMHO. -- : [EMAIL PROTECTED] : And I see the elder races, : :.: putrid forms of man: : Jakob Østergaard : See him rise and claim the earth, : :OZ9ABN : his downfall is at hand. : :.:{Konkhra}...:
Re: Raid5 with two failed disks?
On Thu, 30 Mar 2000 10:17:06 -0500, you wrote: On Thu, Mar 30, 2000 at 08:36:52AM -0600, Bill Carlson wrote: I've been thinking about this for a different project, how bad would it be to setup RAID 5 to allow for 2 (or more) failures in an array? Or is this handled under a different class of RAID (ignoring things like RAID 5 over mirrored disks and such). You just can't do that with RAID5. I seem to remember that there's a RAID 6 or 7 that handles 2 disk failures (multiple parity devices or something like that.) You can optionally do RAID 5+1 where you mirror partitions and then stripe across them ala RAID 0+1. You'd have to lose 4 disks minimally before the array goes offline. How about a RAID 5 with a single spare disk? You are dead if two disks fail within the time it takes to resync, though. If you have n spare disks, you can survive n+1 disk failures in total, provided they happen one at a time and each rebuild completes before the next disk fails. Greetings Marc -- -- !! No courtesy copies, please !! - Marc Haber | " Questions are the | Mailadresse im Header Karlsruhe, Germany | Beginning of Wisdom " | Fon: *49 721 966 32 15 Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fax: *49 721 966 31 29
Re: Raid5 with two failed disks?
On Thu, 30 Mar 2000 09:20:57 +0200, you wrote: At 02:16 30.03.00, you wrote: Hi... I have a Raid5 Array, using 4 IDE HDs. A few days ago, the system hung, no reaction, except ping from the host, nothing to see on the monitor. I rebooted the system and it told me, 2 out of 4 disks were out of sync. 2 Disks have an event counter of 0062, the two others 0064. I hope, that there is a way to fix this. I searched through the mailing-list and found one thread, but it did not help me. Yes I do. Check Jakobs Raid howto, section "recovering from multiple failures". You can recreate the superblocks of the raid disks using mkraid; I had that problem a week ago and chickened out after mkraid told me it would destroy my array. If, in this situation, destruction doesn't happen, this should be mentioned in the docs. Greetings Marc -- -- !! No courtesy copies, please !! - Marc Haber | " Questions are the | Mailadresse im Header Karlsruhe, Germany | Beginning of Wisdom " | Fon: *49 721 966 32 15 Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fax: *49 721 966 31 29
Re: Raid5 with two failed disks?
On Thu, 30 Mar 2000, Martin Bene wrote: At 02:16 30.03.00, you wrote: Hi... I have a Raid5 Array, using 4 IDE HDs. A few days ago, the system hung, no reaction, except ping from the host, nothing to see on the monitor. I rebooted the system and it told me, 2 out of 4 disks were out of sync. 2 Disks have an event counter of 0062, the two others 0064. I hope, that there is a way to fix this. I searched through the mailing-list and found one thread, but it did not help me. Yes I do. Check Jakobs Raid howto, section "recovering from multiple failures". You can recreate the superblocks of the raid disks using mkraid; if you explicitly mark one disk as failed in the raidtab, no automatic resync is started, so you get to check if all works and perhaps change something and retry. Hey all, I've been thinking about this for a different project, how bad would it be to setup RAID 5 to allow for 2 (or more) failures in an array? Or is this handled under a different class of RAID (ignoring things like RAID 5 over mirrored disks and such). Three words: Net block device Bill Carlson Systems Programmer[EMAIL PROTECTED]| Opinions are mine, Virtual Hospital http://www.vh.org/| not my employer's. University of Iowa Hospitals and Clinics|
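A rough sketch of what "explicitly mark one disk as failed in the raidtab" looks like before re-running mkraid; all device names and sizes here are placeholders. The suspect member keeps its slot but is declared with failed-disk instead of raid-disk, so the superblocks are rewritten without starting a resync onto it:

  raiddev /dev/md0
      raid-level              5
      nr-raid-disks           4
      persistent-superblock   1
      chunk-size              32
      device                  /dev/hda3
      raid-disk               0
      device                  /dev/hdb3
      raid-disk               1
      device                  /dev/hdc3
      raid-disk               2
      device                  /dev/hdd3
      failed-disk             3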
Re: Raid5 with two failed disks?
On Thu, Mar 30, 2000 at 08:36:52AM -0600, Bill Carlson wrote: I've been thinking about this for a different project, how bad would it be to setup RAID 5 to allow for 2 (or more) failures in an array? Or is this handled under a different class of RAID (ignoring things like RAID 5 over mirrored disks and such). You just can't do that with RAID5. I seem to remember that there's a RAID 6 or 7 that handles 2 disk failures (multiple parity devices or something like that.) You can optionally do RAID 5+1 where you mirror partitions and then stripe across them ala RAID 0+1. You'd have to lose 4 disks minimally before the array goes offline. -- Randomly Generated Tagline: "There are more ways to reduce friction in metals then there were release dates for Windows 95."- Quantum on TLC
Re: Raid5 with two failed disks?
Thanks to all, it worked!
Re: Raid5 with two failed disks?
Hi Bill, Thursday, March 30, 2000, 4:36:52 PM, you wrote: I've been thinking about this for a different project, how bad would it be to setup RAID 5 to allow for 2 (or more) failures in an array? Or is this handled under a different class of RAID (ignoring things like RAID 5 over mirrored disks and such). Raid 6 is exactly what you are looking for. Raid 5 with double parity info. You lose 2 disks of N. http://www.raid5.com/raid6.html Or you may just take Raid 7 http://www.raid5.com/raid7.html ... Sounds great. :-) Sven
Re: Raid5 with two failed disks?
On Thu, 30 Mar 2000, Theo Van Dinter wrote: On Thu, Mar 30, 2000 at 08:36:52AM -0600, Bill Carlson wrote: I've been thinking about this for a different project, how bad would it be to setup RAID 5 to allow for 2 (or more) failures in an array? Or is this handled under a different class of RAID (ignoring things like RAID 5 over mirrored disks and such). You just can't do that with RAID5. I seem to remember that there's a RAID 6 or 7 that handles 2 disk failures (multiple parity devices or something like that.) You can optionally do RAID 5+1 where you mirror partitions and then stripe across them ala RAID 0+1. You'd have to lose 4 disks minimally before the array goes offline. 1+5 would still fail on 2 drives if those 2 drives where both from the same RAID 1 set. The wasted space becomes more than N/2, but it might worth it for the HA aspect. RAID 6 looks cleaner, but that would require someone to write an implementation, whereas you could do RAID 15 (51?) now. My thought here is leading to a distributed file system that is server independent, it seems something like that would solve a lot of problems that things like NFS and Coda don't handle. From what I've read GFS is supposed to do this, never hurts to attack a thing from a couple of directions. Use the net block device, RAID 15 and go. Very tempting...:) Bill Carlson Systems Programmer[EMAIL PROTECTED]| Opinions are mine, Virtual Hospital http://www.vh.org/| not my employer's. University of Iowa Hospitals and Clinics|
Re: Raid5 with two failed disks?
On Thu, 30 Mar 2000, Theo Van Dinter wrote: On Thu, Mar 30, 2000 at 02:21:45PM -0600, Bill Carlson wrote: 1+5 would still fail on 2 drives if those 2 drives where both from the same RAID 1 set. The wasted space becomes more than N/2, but it might worth it for the HA aspect. RAID 6 looks cleaner, but that would require someone to write an implementation, whereas you could do RAID 15 (51?) now. 2 drives failing in either RAID 1+5 or 5+1 results in a still available array: Doh, you're right. Thanks for drawing me a picture...:) Bill Carlson Systems Programmer[EMAIL PROTECTED]| Opinions are mine, Virtual Hospital http://www.vh.org/| not my employer's. University of Iowa Hospitals and Clinics|
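For the "you could do RAID 15 (51?) now" idea, the layering with the current tools would be raid1 pairs built first, then a raid5 built across the resulting md devices. An untested sketch with made-up device names, only to show the structure:

  raiddev /dev/md0                  # first mirror pair
      raid-level              1
      nr-raid-disks           2
      persistent-superblock   1
      chunk-size              4
      device                  /dev/sda1
      raid-disk               0
      device                  /dev/sdb1
      raid-disk               1
  # ... /dev/md1 and /dev/md2 defined the same way ...
  raiddev /dev/md3                  # raid5 across the three mirrors
      raid-level              5
      nr-raid-disks           3
      persistent-superblock   1
      chunk-size              32
      device                  /dev/md0
      raid-disk               0
      device                  /dev/md1
      raid-disk               1
      device                  /dev/md2
      raid-disk               2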
Re: RAID5 array not coming up after repaired disk
On Sat, 25 Mar 2000 13:10:13 GMT, you wrote: On Fri, 24 Mar 2000 19:36:18 -0500, you wrote: Ok, maybe I'm on crack and need to lay off the pipe a little while, but it appears that sdf7 doesn't have a partition type of "fd" and as such isn't getting considered for inclusion in md0. Nope, all partitions /dev/sd{a,b,c,d,e,f}7 have type fd. After moving sdf7 on the top in the /etc/raidtab, the array came up in degraded mode and I was able to raidhotadd the new disk. I feel that the RAID should have recovered from this failure without requiring manual intervention. Or maybe I did something wrong? Greetings Marc -- -- !! No courtesy copies, please !! - Marc Haber | " Questions are the | Mailadresse im Header Karlsruhe, Germany | Beginning of Wisdom " | Fon: *49 721 966 32 15 Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fax: *49 721 966 31 29
Re: RAID5 array not coming up after repaired disk
On Fri, 24 Mar 2000 23:54:03 +0100 (CET), you wrote: On Fri, 24 Mar 2000, Douglas Egan wrote: When this happened to me I had to "raidhotadd" to get it back in the list. What does your /proc/mdstat indicate? Try: raidhotadd /dev/md0 /dev/sde7 I *think* you should 'raidhotremove' the failed disk-partition first, then you can 'raidhotadd' it back. Since the array is not even coming up, I can't raidhotadd/raidhotremove in that situation. /proc/mdstat says that no md devices are active. Greetings Marc -- -- !! No courtesy copies, please !! - Marc Haber | " Questions are the | Mailadresse im Header Karlsruhe, Germany | Beginning of Wisdom " | Fon: *49 721 966 32 15 Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fax: *49 721 966 31 29
Re: RAID5 array not coming up after repaired disk
On Fri, 24 Mar 2000 19:36:18 -0500, you wrote: [Marc Haber] |autorun ... |considering sde7 ... |adding sde7 ... |adding sdd7 ... |adding sdc7 ... |adding sdb7 ... |adding sda7 ... |created md0 Ok, maybe I'm on crack and need to lay off the pipe a little while, but it appears that sdf7 doesn't have a partition type of "fd" and as such isn't getting considered for inclusion in md0. Nope, all partitions /dev/sd{a,b,c,d,e,f}7 have type fd. Greetings Marc -- -- !! No courtesy copies, please !! - Marc Haber | " Questions are the | Mailadresse im Header Karlsruhe, Germany | Beginning of Wisdom " | Fon: *49 721 966 32 15 Nordisch by Nature | Lt. Worf, TNG "Rightful Heir" | Fax: *49 721 966 31 29
raid5 and the 2.4 kernel
I tried to upgrade to the 2.4[pre] kernel, but my system hangs when trying to mount the raid5 array. After perusing this list a bit I discovered that raid5 doesn't yet exist for 2.4. Grr. What can I do to boot 2.4, with or without raid5? I tried commenting out the /dev/md line in my /etc/fstab, but the booting kernel still tries to load a raid5 module which doesn't exist. Obviously I could pull all my disks out (SCA, so it is easy). I had a funny thought: what would the 5 disks (1GB each) that make up my raid5 array do if I put them back in the wrong order?? I am beginning to think that I should shitcan the 1GB drives and just get a 9 or 18 instead and not run raid. Brynn -- http://triplets.tonkaland.com/ to see my triplets !
Re: RAID5 array not coming up after repaired disk
On Fri, 24 Mar 2000, Douglas Egan wrote: When this happened to me I had to "raidhotadd" to get it back in the list. What does your /proc/mdstat indicate? Try: raidhotadd /dev/md0 /dev/sde7 I *think* you should 'raidhotremove' the failed disk-partition first, then you can 'raidhotadd' it back. D.
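Once the array is actually running in degraded mode, the remove-then-add sequence is short. A sketch using the device names from this thread:

  raidhotremove /dev/md0 /dev/sde7    # drop the failed member from the superblock
  raidhotadd    /dev/md0 /dev/sde7    # add the repaired partition back; resync starts
  cat /proc/mdstat                    # watch the rebuild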
Re: RAID5 array not coming up after repaired disk
[Marc Haber] |autorun ... |considering sde7 ... |adding sde7 ... |adding sdd7 ... |adding sdc7 ... |adding sdb7 ... |adding sda7 ... |created md0 Ok, maybe I'm on crack and need to lay off the pipe a little while, but it appears that sdf7 doesn't have a partition type of "fd" and as such isn't getting considered for inclusion in md0. sde7 failure + lack of available sdf7 == 2 "failed" disks == dead raid5 James, waiting for the inevitable smack of being wrong
Re: raid5 checksumming chooses wrong function
On Tue, 14 Mar 2000, Malcolm Beattie wrote: Benchmarking it on a stripeset of 7 x 9GB disks on a Ultra3 bus with one of the Adaptec 7899 channels, it's impressively fast. 81MB/s block reads and 512 seeks/s in bonnie and 50MB/s (500 "netbench" Mbits/sec) running dbench with 128 threads. I've done tiotest runs too and I'll be doing more benchmarks on RAID5 soon. If anyone wants me to post figures, I'll do so. Go ahead and post the tiobenches as well! Cheers, -- _/\ Christian Reis is sometimes [EMAIL PROTECTED] \/~ suicide architect | free software advocate | mountain biker
Re: raid5 on 2.2.14
If the partition types are set to "fd" and you selected the "autorun" config option in block devices (it should be turned on on a rawhide-type kernel), raidstart shouldn't be necessary. (the kernel will have already started the md arrays itself, and the later initscripts raidstart call won't be necessary). Could you paste any "autorun" section of md initialization during boot? does the same problem appear even if you build-in raid5? (first-pass debugging of building-in all raid-related scsi and md modules just to get initrd and module ordering issues out of the way might help) after you boot, does /proc/mdstat show the array? active? if you boot into single-user mode, is the array already active? what's the raidtab contents? Note that as coded, the initscripts should only be attempting to raidstart inactive arrays, but I never checked to make sure that the code actually worked as intended. Given that, I don't really think any of the above really helps, but it's something to throw out there :) I think I figured it out. the drives came off of an older sun. They still had the sun disklabels on them. I never remade the new disk labels before repartitioning. I think when I rebooted the disklabels got in the way of the disks being recognized correctly and it ate the drive. I also found out later than one of the drives I was using had somesort of fairly heinous fault. It would detect but would only occasionally be found by linux. I took it out of the array I think I'm going to rma it. thanks for the help. As an additional question. What sort of numbers should I be seeing (performance wise) on a u2w 4 disk array in raid5. I'm getting about 15MB/s write and 25MB/s read but I wouldn't mind getting those numbers cranked up some. I'm using 32K chunksize with the stride setting correctly set (as per jakob's howto). I'm testing with 500MB/1000MB/1500MB/2000MB bonnie tests. The machine is a k6-2 500 with 128MB of ram Scsi controller is a tekram 390U2W The disks are seagate 7200RPM's baracudda (18 and 9 gig versions) I'm using 1 9gig partition of each of the 18 gig drives and the whole drive on the 2 9 gig drives. thanks -sv
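For what it's worth, "the stride setting" mentioned above boils down to a single mke2fs option: stride = chunk-size / filesystem block size. A sketch assuming 4k ext2 blocks on a 32k-chunk array (32/4 = 8); adjust if your block size differs:

  mke2fs -b 4096 -R stride=8 /dev/md0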
Problems with IDE RAID5
Having some problems setting up IDE RAID5 on Kernel 2.2.14 Kernel: 2.2.14 Patches: ide_2_2_14_2124_patch.gz raid-2_2.14-B1 (Encountered Hunk problems, but I've heard this is normal) Tools: raidtools-19990824-0_90_tar.gz I have three Segate 6GB ATA66 Harddrives, two of which are hanging off of a Promise 66 card and the third is sitting off of the second IDE controller on the mother board. My problem is this: - mkraid --**-force /dev/md0 DESTROYING the contents of /dev/md0 in 5 seconds, Ctrl-C if unsure! handling MD device /dev/md0 analyzing super-block disk 0: /dev/hdc1, 6244528kB, raid superblock at 6244416kB disk 1: /dev/hde1, 6250198kB, raid superblock at 6250112kB disk 2: /dev/hdg1, 6250198kB, raid superblock at 6250112kB mkraid: aborted, see the syslog and /proc/mdstat for potential clues. - syslog reports no errors and my /proc/mdstat just reports: Personalities : [4 raid5] read_ahead not set md0 : inactive md1 : inactive md2 : inactive md3 : inactive I'm not sure where to go from here. Anyone have any ideas? I'll list some additional information below. Oh yeah, I have no problems talking to the drives themselves. I setup a filesystem on each one and mounted it just to see if they worked without a problem. Here is my /etc/raidtab file: raiddev /dev/md0 raid-level 5 nr-raid-disks 3 persistent-superblock 1 chunk-size 128 parity-algorithm left-symmetric device /dev/hdc1 raid-disk 0 device /dev/hde1 raid-disk 1 device /dev/hdg1 raid-disk 2 Here is some information from dmesg: PIIX3: IDE controller on PCI bus 00 dev 39 PIIX3: not 100% native mode: will probe irqs later ide0: BM-DMA at 0xff90-0xff97, BIOS settings: hda:pio, hdb:pio ide1: BM-DMA at 0xff98-0xff9f, BIOS settings: hdc:pio, hdd:pio PDC20262: IDE controller on PCI bus 00 dev 58 PDC20262: not 100% native mode: will probe irqs later PDC20262: (U)DMA Burst Bit ENABLED Primary PCI Mode Secondary PCI Mode. ide2: BM-DMA at 0xff00-0xff07, BIOS settings: hde:pio, hdf:pio ide3: BM-DMA at 0xff08-0xff0f, BIOS settings: hdg:pio, hdh:pio hda: WDC AC33100H, ATA DISK drive hdb: FX120T, ATAPI CDROM drive hdc: ST36422A, ATA DISK drive hde: ST36422A, ATA DISK drive hdg: ST36422A, ATA DISK drive ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 ide1 at 0x170-0x177,0x376 on irq 15 ide2 at 0xfff0-0xfff7,0xffe6 on irq 11 ide3 at 0xffa8-0xffaf,0xffe2 on irq 11 hda: Disabling (U)DMA for WDC AC33100H hda: DMA disabled hda: WDC AC33100H, 3020MB w/128kB Cache, CHS=767/128/63 hdc: ST36422A, 6103MB w/256kB Cache, CHS=13228/15/63, (U)DMA hde: ST36422A, 6103MB w/256kB Cache, CHS=13228/15/63, UDMA(33) hdg: ST36422A, 6103MB w/256kB Cache, CHS=13228/15/63, UDMA(33) Partition check: hda: hda1 hda2 hda5 hda6 hda7 hdc: [PTBL] [826/240/63] hdc1 hde: hde1 hdg: hdg1
RE: Problems with IDE RAID5
-Original Message- From: root [mailto:[EMAIL PROTECTED]] Sent: Wednesday, March 15, 2000 2:11 PM To: [EMAIL PROTECTED] Subject: Problems with IDE RAID5 Having some problems setting up IDE RAID5 on Kernel 2.2.14 Kernel: 2.2.14 Patches: ide_2_2_14_2124_patch.gz raid-2_2.14-B1 (Encountered Hunk problems, but I've heard this is normal) What problems did you have? That patch should apply cleanly, or, at least, it has for me. [snip] PIIX3: IDE controller on PCI bus 00 dev 39 PIIX3: not 100% native mode: will probe irqs later ide0: BM-DMA at 0xff90-0xff97, BIOS settings: hda:pio, hdb:pio ide1: BM-DMA at 0xff98-0xff9f, BIOS settings: hdc:pio, hdd:pio PDC20262: IDE controller on PCI bus 00 dev 58 PDC20262: not 100% native mode: will probe irqs later PDC20262: (U)DMA Burst Bit ENABLED Primary PCI Mode Secondary PCI Mode. ide2: BM-DMA at 0xff00-0xff07, BIOS settings: hde:pio, hdf:pio ide3: BM-DMA at 0xff08-0xff0f, BIOS settings: hdg:pio, hdh:pio I don't know what the second controler is, but you're getting all drives in PIO mode. :( I have the same problem on my PIIX3 controllers when running with the IDE patch, but I haven't had time to talk to the authors. My suspicion is that your kernel didn't get everything patched correctly. If you can send the output from that patch command, or try downloading it again (note: download using lynx and 'print' to a file does NOT work). Greg
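Whether DMA can be turned on from userspace depends on what the kernel/IDE patch actually supports for the chipset, but it is cheap to check with hdparm. A sketch, using /dev/hde as an example:

  hdparm -d /dev/hde     # show whether using_dma is currently set
  hdparm -d1 /dev/hde    # try to enable DMA for the drive
  hdparm -t /dev/hde     # quick buffered sequential read timing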
raid5 on 2.2.14
Hi folks, got a small problem. I'm running redhat 6.1+ (2.2.14-5.0 kernels from rawhide and new raidtools 0.90-6) I've checked and the 2.2.14-5.0 are using the B1 patch from mingo's page. I think the raidtools they are using (mentioned above) are the correct version. Here is what happens: I build a raid 5 array (5 disks) it builds and I can mount and write things to it. I'm not doing root fs on it but I build a new initrd anyway - it builds and includes the raid5 modules - I rerun lilo. I boot. I get raidstart /dev/md0 invalid argument /dev/md0 I've checked the archives and it looks like others have experienced this problem but they've all been related to other issues. is there something i'm missing? I think I've covered all the bases. any ideas? thanks -sv
raid5 checksumming chooses wrong function
When booting my new Dell 4400, pre-installed with Red Hat 6.1, the raid5 checksumming function it chooses is not the fastest. I get: raid5: measuring checksumming speed raid5: KNI detected,... pIII_kni: 1078.611 MB/sec raid5: MMX detected,... pII_mmx : 1304.925 p5_mmx : 1381.125 8 regs : 1029.081 32 regs : 584.073 using fastest function: pIII_kni (1078.611 MB/sec) Is there a good reason for it choosing pIII_kni (in which case the wording of the message "fastest" needs changing) or is it a bug? If noone else sees this, I'll dig in and see if I can fix it: maybe it's because the two sets of function lists are dependent on particular hardware (first for KNI, then for MMX) and something isn't getting zeroed or set to the max in between. Benchmarking it on a stripeset of 7 x 9GB disks on a Ultra3 bus with one of the Adaptec 7899 channels, it's impressively fast. 81MB/s block reads and 512 seeks/s in bonnie and 50MB/s (500 "netbench" Mbits/sec) running dbench with 128 threads. I've done tiotest runs too and I'll be doing more benchmarks on RAID5 soon. If anyone wants me to post figures, I'll do so. --Malcolm -- Malcolm Beattie [EMAIL PROTECTED] Unix Systems Programmer Oxford University Computing Services
Raid5 on root partition and swap
Hello! A small question... I need this configuration: 3 SCSI disks -- 3 ext2 partitions in Raid 5 (mounted on /) of 4081MB [md0] -- 3 swap partitions in Raid 5 of 250MB [md1]. Is this possible, and does it perform well? Thanks. P.S.: sorry for my bad English... I'm Italian. Matteo Sgalaberni [EMAIL PROTECTED] --- www.sgala.com -- Microscottex winpolish 98... clean your window, use it once and throw it away! Boling's postulate: if you're in a good mood, don't worry. It will pass.
Re: Raid5 on root partition and swap
At 15:43 04.03.00, you wrote:
> 3 SCSI disks -- 3 ext2 partitions in Raid 5 (mounted on /) of 4081MB [md0] -- 3 swap partitions in Raid 5 of 250MB [md1]. Is this possible, and does it perform well?
Almost - you can't boot off a raid5; however, you CAN boot off raid1. Also, I wouldn't use raid5 for swap - raid5 gives bad performance for many small updates (which you'll probably have with swap). The only reason for swap on raid is redundancy. I'd suggest the following configuration:
-- 3 partitions ext2 in Raid 1 (mounted on /boot), size 3x20 MB
-- 3 partitions ext2 in Raid 5 (mounted on /)
-- 3 partitions swap in Raid 1
You'll need to use the raid-enabled lilo provided by Red Hat 6.1, or patch your lilo sources to enable it to boot off raid1 (patch at ftp://ftp.sime.com/pub/linux/lilo.raid1.gz). The one thing you must be careful about is swap on raid: you MUST NOT have swap active on a raid device WHILE it's RESYNCHRONIZING. So you should remove swapon -a from your startup files and instead insert a script that waits for the resync to finish before turning on swap. Put the script right after mounting the other filesystems (it needs /proc to read /proc/mdstat for the status of the raid devices, and /var/log to write status messages). Important - let the script run in the background; it might have to wait quite a long time if you've got many big raid devices. Here's the raidswapon script I'm using (slightly adapted from a script posted on the list - sorry, I don't remember the original author):
---
#!/bin/sh
# start swap on raid devices once they have finished resyncing
RAIDDEVS=`grep swap /etc/fstab | grep /dev/md | cut -d" " -f1 | cut -d/ -f3`
for raiddev in $RAIDDEVS
do
    # wait while /proc/mdstat still shows a resync in progress for this device
    while grep $raiddev /proc/mdstat | grep -q "resync="
    do
        echo "`date`: $raiddev resyncing" >> /var/log/raidswap-status
        sleep 20
    done
    /sbin/swapon /dev/$raiddev
done
exit 0
---
Bye, Martin "you have moved your mouse, please reboot to make this change take effect" -- Martin Bene vox: +43-316-813824 simon media fax: +43-316-813824-6 Andreas-Hofer-Platz 9 e-mail: [EMAIL PROTECTED] 8010 Graz, Austria -- finger [EMAIL PROTECTED] for PGP public key
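To make that layout concrete, here is a rough /etc/raidtab sketch for it, in the same format as the raidtab shown earlier in this digest. The disk names (sda/sdb/sdc), partition numbers and chunk sizes are only assumptions and need adjusting to the real hardware.
---
# md0: 3-way raid1 for /boot (small, so lilo can boot from it)
raiddev /dev/md0
    raid-level            1
    nr-raid-disks         3
    persistent-superblock 1
    chunk-size            4
    device                /dev/sda1
    raid-disk             0
    device                /dev/sdb1
    raid-disk             1
    device                /dev/sdc1
    raid-disk             2
# md1: raid5 for /
raiddev /dev/md1
    raid-level            5
    nr-raid-disks         3
    persistent-superblock 1
    chunk-size            32
    parity-algorithm      left-symmetric
    device                /dev/sda2
    raid-disk             0
    device                /dev/sdb2
    raid-disk             1
    device                /dev/sdc2
    raid-disk             2
# md2: 3-way raid1 for swap (redundancy only)
raiddev /dev/md2
    raid-level            1
    nr-raid-disks         3
    persistent-superblock 1
    chunk-size            4
    device                /dev/sda3
    raid-disk             0
    device                /dev/sdb3
    raid-disk             1
    device                /dev/sdc3
    raid-disk             2
---
md0 and md2 are raid1 because lilo can only boot off a mirror, not raid5, and because swap on raid is there purely for redundancy, as described above; the raidswapon script then takes care of not enabling swap on md2 while it is resyncing.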
Re: SV: SV: raid5: bug: stripe->bh_new[4]
Johan, thanks for sending the bulk information about this bug. I have never seen the buffer bug when running local loads, only when using NFS. The bug appears more often when running with 64MB of RAM or less, but it has been seen with more memory too. Below is a sample of the errors seen while doing tests. Very interesting is that the same sector had a problem twice within 5 minutes, with different buffers each time. These all look like potential data corruption, since multiple buffers are assigned to the same physical block. I have seen corruption, but the corruption seems to be caused by the NFS client, not the server side. Hopefully this problem will get resolved soon, but it looks like it has been with us for some time now (2 years). Lance.
Mar 1 22:33:10 src@lance-v raid5: bug: stripe->bh_new[2], sector 26272 exists
Mar 1 22:33:10 src@lance-v raid5: bh c100b680, bh_new c0594bc0
Mar 1 22:37:32 src@lance-v raid5: bug: stripe->bh_new[2], sector 26272 exists
Mar 1 22:37:32 src@lance-v raid5: bh c2d1be60, bh_new c1edcea0
Mar 1 22:42:41 src@lance-v raid5: bug: stripe->bh_new[3], sector 360880 exists
Mar 1 22:42:41 src@lance-v raid5: bh c1777840, bh_new c180
Mar 2 03:26:37 src@lance-v raid5: bug: stripe->bh_new[2], sector 1792 exists
Mar 2 03:26:37 src@lance-v raid5: bh c0549240, bh_new c0ed30c0
Mar 2 09:07:38 src@lance-v raid5: bug: stripe->bh_new[0], sector 293016 exists
Mar 2 09:07:38 src@lance-v raid5: bh c20150c0, bh_new c2015600
Mar 2 14:10:08 src@lance-v raid5: bug: stripe->bh_new[2], sector 42904 exists
Mar 2 14:10:08 src@lance-v raid5: bh c084c5c0, bh_new c262b8a0
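If it helps anyone correlate reports like these, here is a purely illustrative way to count how often the message recurs and whether particular sectors repeat; it assumes the reports land in /var/log/messages (syslog paths differ between setups).
---
#!/bin/sh
# Count "raid5: bug: stripe->bh_new" reports per sector, most frequent first.
grep 'raid5: bug: stripe' /var/log/messages |
    awk '{ for (i = 1; i <= NF; i++) if ($i == "sector") print $(i+1) }' |
    sort | uniq -c | sort -rn
---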
SV: SV: raid5: bug: stripe->bh_new[4]
> This topic has come up a few times. Can you post Gadi's comments to linux-raid?
Ok, here we go:
===
From [EMAIL PROTECTED] Fri Feb 27 16:51:19 1998
Date: Fri, 27 Feb 1998 16:51:17 +0300 (IST)
From: Gadi Oxman [EMAIL PROTECTED]
X-Sender: gadio@localhost
To: Richard Jones [EMAIL PROTECTED]
cc: [EMAIL PROTECTED]
Subject: Re: 2.1.88: rpc.nfsd hanging in __wait_on_page, the saga continues
In-Reply-To: [EMAIL PROTECTED]
Message-ID: Pine.LNX.3.91.980227161918.210A-10@localhost
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Hi,
Hi all: I still have this bug. I've got a few questions and observations that I'd like to pose to cleverer kernel hackers than me on this list. Any suggestions would be a great help. - Every time I've seen the bug, just before rpc.nfsd hangs, I see the following message printed by the kernel: raid5: bug: stripe->bh_new[5], sector XXX exists The dd_idx argument is always 5 in the cases observed so far, but the sector number XXX is different. - This message seems to be provoked by the function add_stripe_bh() in raid5.c. What does this bug message mean? If I see this message, is it possible that it could be triggering the hang in __wait_on_page?
This "bug: stripe->bh_new[5], sector XXX exists" is an old bug and has been reported on several RAID-5 configurations, from the very first RAID-5 release. Unfortunately, we don't have a theory which might explain this. Whenever a bh is requested from the MD device by ll_rw_block(), the bh is locked and registered in a sh structure. A sh manages a single row in the RAID array, and contains bh's which share the same sector number, one for each device, such that performing XOR on all the bh's results in 0. The above error message indicates that a buffer which is now being requested from the RAID device was already requested in the near past, is currently locked, registered in a sh, and in the process of being serviced, yet it is being requested again by the high level layers. The code displays the "bug" error message, but otherwise ignores the condition and overwrites the corresponding position in the sh with the new buffer. If the new buffer is not actually the old buffer, and a process is sleeping on the old buffer, this process will be locked in uninterruptible sleep. It is a rare bug and can't be easily reproduced. The real cause might not even be in the RAID-5 code (it is just detected by the RAID-5 code), which makes it even harder to understand.
- Sometimes I see this message, and rpc.nfsd doesn't hang (unless it hangs *much* later ...) - In __wait_on_stripe in the same file, should sh->count++ and sh->count-- be replaced by atomic_inc(sh->count) and atomic_dec(sh->count)? If not, why not? Seems like there's a possible race condition here.
No, since sh->count is never modified from an interrupt context, only from a process context. However, the code was coded with UP in mind, and I'm afraid that the code might not be SMP safe, and there might be additional issues which are only visible on SMP. SMP is probably not the reason for this bug, however, as it has been reported with the 2.0.x kernels as well, where the kernel is protected on SMP behind a global kernel lock. Gadi
From [EMAIL PROTECTED] Tue Apr 21 22:13:53 1998
Date: Tue, 21 Apr 1998 22:13:52 +0400 (IDT)
From: Gadi Oxman [EMAIL PROTECTED]
To: Richard Jones [EMAIL PROTECTED], Linus Torvalds [EMAIL PROTECTED], MOLNAR Ingo [EMAIL PROTECTED], Miguel de Icaza [EMAIL PROTECTED], "Theodore Y. Ts'o" [EMAIL PROTECTED]
Subject: Re: RAID5 (or ext2_truncate?) bug
In-Reply-To: [EMAIL PROTECTED]
Message-ID: [EMAIL PROTECTED]
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Hi,
Gadi: We just got a message from your patch ...
raid5: bug: stripe->bh_new[1], sector 13749952 exists
raid5: bh cfb37720, bh_new c9119620
The NFS server kept on running this time. Thanks :-) However, of course we have no stack trace, so we don't know if it hit the (possible) bug in ext2_truncate, or if it was caused by something else. Nevertheless, the timing and symptoms were very similar to what happened previously. I can now fairly reliably reproduce the report by:
. having lots of NFS traffic
. running htmerge (the final stage of the htdig search program)
At some point during the htmerge, htmerge spawns off a sort process, and somewhere around this time we hit the bug. Of course, HTDIG takes about 6 hours to run, so it's not a great test ... Rich.
Thanks for the report Rich; looks like we are finally about to resolve this long-standing cause of crashes with the RAID-5 driver :-) The fact that bh != bh_new seems to confirm our assumption that at least some cases of that problem are not caused directly by the RAID-5 driver -- we are receiving two I/O requests for
Re: raid5: bug: stripe->bh_new[4]
On Sun, 13 Feb 2000, Johan Ekenberg wrote:
> 0.90. Every server has a Raid-5 array consisting of 5 large IBM scsi disks + one spare. It works like a charm, extremely fast and no trouble at all with
How fast are the IBM disks? We're using Quantums here and they suck!
> Software-RAID during months of heavy usage. However, yesterday I saw some disturbing lines on one of the consoles:
> raid5: bug: stripe->bh_new[4], sector 8419312 exists
> raid5: bh e2927d20, bh_new d2f1eaa0
> raid5: bug: stripe->bh_new[3], sector 8421384 exists
> raid5: bh efbc6d40, bh_new e563cc60
How many lines showed up? And did this go on for a while? Cheers, -- _/\ Christian Reis is sometimes [EMAIL PROTECTED] \/~ suicide architect | free software advocate | mountain biker
SV: raid5: bug: stripe->bh_new[4]
> > 0.90. Every server has a Raid-5 array consisting of 5 large IBM scsi disks + one spare. It works like a charm, extremely fast and no trouble at all with
> How fast are the IBM disks? We're using Quantums here and they suck!
I haven't done any actual, documented testing, but they're the fastest SCSI disks that IBM's got. I don't have access to the exact model numbers right now since I'm not on site, but I believe they have a 2MB cache and use LVD technology w/ SCSI-UW2 on AIC7890. The machines work mainly as web/mail servers with a lot of dynamic content, so there's a lot of disk activity going on, and my experience is that the RAID5 arrays are Very Very Fast. We have some 50 disks from IBM and have experienced hardware failure on 2; I think that's pretty normal. With RAID5 and an extra disk in each array, these disk failures are not a big problem.
> > disturbing lines on one of the consoles: raid5: bug: stripe->bh_new[4], sector 8419312 exists
> How many lines showed up? And did this go on for a while?
Only four. I got a very nice reply from Gadi, basically explaining that this is nothing to worry about. Drop me a note if you want to read it; it's a little lengthy, so I'm refraining from mailing it to the list without being asked to. Makes good reading though. The machine has continued to work without any problems. We generally have uptimes > 1 month, and have not had any stability problems related to software (Linux, RAID etc.) for many months. This is under pretty heavy load with some 1200-1500 users on each box. We used to experience soft reboots every 8-10 days or so, but when I increased the maximum number of allowed tasks and the number of tasks reserved for root, this problem disappeared (from linux/tasks.h:
#define NR_TASKS 2048
...
#define MIN_TASKS_LEFT_FOR_ROOT 32
) Regards, /Johan Ekenberg
adding more drives to a RAID5
Is it possible to add more drives to a current raid5 array (software raid) without taking down the array and starting from scratch? gary hostetler
raid5: bug: stripe->bh_new[4]
Hi! I'm running 4 large web/mail servers w/ 2.2.14 + mingo patch and raidtools 0.90. Every server has a Raid-5 array consisting of 5 large IBM SCSI disks + one spare. It works like a charm: extremely fast and no trouble at all with Software-RAID during months of heavy usage. However, yesterday I saw some disturbing lines on one of the consoles:
raid5: bug: stripe->bh_new[4], sector 8419312 exists
raid5: bh e2927d20, bh_new d2f1eaa0
raid5: bug: stripe->bh_new[3], sector 8421384 exists
raid5: bh efbc6d40, bh_new e563cc60
What does this mean? Should I be worried? Anything I can do about it? Any additional info I should post about my setup etc.? Grateful for the fine work you developers do, and for any advice on the above. Best regards, Johan Ekenberg
Running RAID5 under SuSE 6.3 with raidtools 0.90
Hi, for the past week I've been fighting to get a RAID5 system running under SuSE 6.3 Linux. I have finally met with success and want to share my odyssey with you and future foolhardy people. First thing: the raidtools 0.90 HOWTO oversimplifies certain aspects. The one-liner on unpacking and installing the tools and the patch is an understatement! I like the way it's done and all, but I think a few lines should be added concerning the patch and such. I've seen several questions on the mailing list which can be traced to the fact that people do not have the right patch installed. I'm not a pro at Linux things, as I first became acquainted with Linux about a year ago and have enough other things going on. Should I be mistaken in any sense, feel free to write me; I take no warranty for anything stated here.
SuSE 6.3 has the old mdtools patch installed, and on top of that it uses kernel 2.2.13 with an LVM patch, which doesn't make life much easier. To get good results do the following: get the most recent patch for raidtools 0.90. For SuSE Linux with kernel 2.2.13 it is raid014519990824.gz (I'm not sure that's quite right, I'm citing from memory). Make sure the SuSE kernel sources (not the originals! we want all the other fun patches ;-) and the raidtools 0.90 are installed; they can both be found on the CDs from SuSE. The next step is to apply the patch to the kernel, even though there seems to already be RAID support installed. You will receive some warnings; go ahead and ignore them. Aside from that you should have three rejects: one in the asm-ppc directory, which I didn't bother researching, one in arch/i386/defconfig, and one in drivers/block/ll_rw_blk.c.
The next step is to correct the ll_rw_blk.c file. Obviously the LVM patch somehow conflicts with the RAID patch, so don't do anything if you're using LVM! One has to apply the ll_rw_blk.c patch by hand: find the corresponding *.rej file, then go find the #ifdef CONFIG_BLK_DEV_MD and patch the sections to fit the patch. Take care, make backups, etc. Patch arch/i386/defconfig by hand (I'm not sure how necessary that is, but I did it anyway) and then compile the whole thing. One should then have a working kernel, which can be added as a further boot entry in lilo. Boot it, and if you did everything right you should find a file called /proc/mdstat listing the available personalities. Don't be confused that the old mdstat showed free devices and the new one doesn't; it's working anyway. (That took me a while, sending me off on a wild goose chase.) Now one can proceed as in the HOWTO and do all the mkraid stuff; it should work fine. By the way, don't forget to do a raidstop before shutting down. My system would tell me that my /dev/hda1 had changed to /dev/hdb1 or something if raidstop wasn't called.
Let me add the following: I have a working knowledge of C, not of Linux kernels, so what I did was half guessing. If the raid gurus can explain why, or maybe a better way, I would gladly like to see it. Hope this helps, Eduard Ralph -- Sent through Global Message Exchange - http://www.gmx.net
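For reference, the steps above condensed into a rough shell transcript; this is only a sketch of the procedure described, the patch file name is a placeholder for whichever raid-0.90 patch was actually downloaded, and the reject list is the one from this report.
---
#!/bin/sh
# Apply the raid 0.90 patch on top of the SuSE-patched 2.2.13 sources.
cd /usr/src/linux

# Warnings are expected; adjust -p0/-p1 to however the patch was generated.
patch -p1 < /usr/src/raid0145-19990824

# See what needs manual merging (the report above saw rejects in asm-ppc,
# arch/i386/defconfig and drivers/block/ll_rw_blk.c).
find . -name '*.rej' -print

# After building, installing and rebooting the new kernel, the new-style
# mdstat only lists personalities - no "free devices" line any more.
cat /proc/mdstat
---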
Thanks: Trying to install Red Hat 6.1 on DPT Raid5
Thank you for all the replies I received. I figured out what the problem was, and of course it was something that makes you want to kick yourself right in the %*@#^! Anyway, I was using the wrong boot disk! I was using the boot and supplemental disks for RH6.0 for a RH6.1 installation. The thing is that these disks would allow me to do a full install to the RAID array; then, upon reboot, the system would hang on a kernel panic. Solution: there is a file called "i2orh61.zip" located on the ftp.dpt.com site. This is the driver disk used for the I2O RAID V install. The boot disk is the disk that comes with the RH6.1 retail box. When this method is used, it installs without a hitch! Jon Preston.
Re: RAID5 and 2.2.14
I took the patch I grabbed at work on a SUN box and loaded it... it was 60K smaller than the one I was loading last night. Patched a fresh 2.2.14 kernel with no problems, and the raid is up and running! Thanks for everyone's help, and damn you, Bill Gates, for your kludged 8-bit GUI OS! At 10:23 PM 1/24/00 +0800, Gary Allpike wrote: I have put up a pre-patched kernel source at: http://spice.indigo.net.au/linux-2.2.14+raid-2.2.14-B1.tar.bz2 === David Cooley N5XMT Internet: [EMAIL PROTECTED] Packet: N5XMT@KQ4LO.#INT.NC.USA.NA T.A.P.R. Member #7068 We are Borg... Prepare to be assimilated! ===