Re: Journaling FS and RAID
Krzysztof SIEROTA wrote: As far as I know the issue has been fixed in the 2.4.* kernel series. ReiserFS and software RAID5 is NOT safe in 2.2.* Chris

Hi, but Stephen Tweedie pointed out some time ago that the only way to make a software RAID system survive a power failure while in degraded mode without data corruption (this case is rare, but it COULD happen) is to make a big RAID5 partition where you store the data and a small RAID1 partition where you keep the journal of the RAID5 partition. He said ext3fs can be adapted for this; what is the current status? Regarding ReiserFS and RAID5: does ReiserFS allow putting the journal on a different partition (e.g. RAID1)? If not, please consider it, because people want to run software RAID5 arrays expecting the same reliability as hardware ones. Last questions: are the current ext3 and ReiserFS RAID reconstructions safe? Thanks for the info. (Waiting for the moment when I will be able to power-cycle a 100GB journaled soft-RAID5 array which comes back up within a few seconds, instead of dozens of minutes or even hours because of fsck. :-) ) PS: do you think we will see that before the end of the year? Benno.
Re: RAID1 on IDE
There is a possibility that the slave can die if the master goes down, so the availability benefit of RAID1 goes away in this situation. What's still left is RAID1 data protection: even if your box stops or will not boot anymore, your data should still be OK, since only one of the two disks failed. Adding a fresh disk and reconstructing the array will then work just fine. But after hearing all the discussions about the drawbacks (from both a reliability and a performance point of view) of IDE RAID in master/slave configurations, I would never use that kind of config. If you need more IDE channels in your box, add a Promise card, or a 3Ware card, which supports up to 8 IDE channels (8 disks in master-only configuration). Benno.

Edward Schernau wrote: If I have a RAID1 set on a single IDE channel, i.e. master and slave, will the box keep running if a drive goes down? -- Edward Schernau, mailto:[EMAIL PROTECTED] Network Architect http://www.schernau.com RC5-64#: 243249 e-gold acct #:131897
3WARE IDE cards questions and thoughts ..
Hi, I went to the 3Ware site. The 8-IDE-channel version is really nice, and cheap too. :-) I guess they do not support master/slave configurations, for performance and reliability reasons. (At least I am assuming this because they say up to 8 drives and there are 8 connectors.) I noticed that they do not support UDMA/66 (at least the PDF says so). Do you think the impact is negligible when there is only one drive per channel? (At least I think there are not that many EIDE disks which can sustain 33MB/sec all the time.) The load (in terms of bandwidth) generated on the bus/CPU by multiple (8) disks is quite high IMHO; 66MB/sec * 8 (even if every drive were able to deliver that kind of bandwidth) would likely saturate your mobo/CPU, shifting the bottleneck from the disks to the memory/CPU subsystem. BTW, do you need a 2.3.x kernel to work with these 3Ware monsters? Or are there patches backported to 2.2.x floating around? Benno.
Re: RAID-0 - RAID-5
Jakob Østergaard wrote: On Thu, 27 Apr 2000, Mika Kuoppala wrote: [snip] I think Jakob Østergaard has made a raid-reconf utility which you can use to grow raid0 arrays. But I think it didn't support converting from raid0 to raid5. Or perhaps it already does =? :) It doesn't (yet). And unfortunately, with exams coming up, it's not likely that it will for the next month or two. I haven't abandoned the idea though. With the new raid in 2.4 the demand for such a utility will be even greater.

Sorry, I am not very up to date: do you plan both an increase of the individual partitions, and resizing by adding more disks? That would be too cool: having for example a 4-disk soft-RAID5 array and, when you run out of space, adding one more disk and letting the resize tool recalculate all parities etc. in order to take the fifth disk into account. Is that possible from a practical POV? Benno.
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
Chris Wedgwood wrote: In the power+disk failure case, there is a very narrow window in which parity may be incorrect, so loss of the disk may result in inability to correctly restore the lost data. For some people, this very narrow window may still be a problem. Especially when you consider the case of a disk failing because of a power surge -- which also kills a drive. This may affect data which was not being written at the time of the crash. Only raid 5 is affected. Long term -- if you journal to something outside the RAID5 array (ie. to raid-1 protected log disks) then you should be safe against this type of failure? -cw

Wow, really good idea to journal to a RAID1 array! Do you think it is possible to do the following: - N disks holding a soft RAID5 array. - Reserve a small partition on at least 2 disks of the array to hold a RAID1 array. - Keep the journal on this partition. Do you think that this will be possible? Is ext3 / reiserfs capable of keeping the journal on a different partition than the one holding the FS? That would really be great! Benno.
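For what it's worth, the layout being proposed could be described to the 0.90 raidtools roughly as below. This is only a sketch: the device names are made up, and external-journal support in ext3 was not finished at the time, so the trailing mke2fs commands assume a later e2fsprogs that understands -O journal_dev and -J device=.

```text
# /etc/raidtab fragment (illustrative device names)
raiddev /dev/md0                # big RAID-5 array holding the data
    raid-level              5
    nr-raid-disks           3
    persistent-superblock   1
    chunk-size              32
    device                  /dev/hda1
    raid-disk               0
    device                  /dev/hdb1
    raid-disk               1
    device                  /dev/hdc1
    raid-disk               2

raiddev /dev/md1                # small RAID-1 array for the journal
    raid-level              1
    nr-raid-disks           2
    persistent-superblock   1
    device                  /dev/hda2
    raid-disk               0
    device                  /dev/hdb2
    raid-disk               1

# With a later e2fsprogs, the journal could then be kept off the RAID-5:
#   mke2fs -O journal_dev /dev/md1
#   mke2fs -j -J device=/dev/md1 /dev/md0
```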
Re: large ide raid system
Thomas Davis wrote: My 4-way IDE-based RAID0, 2 channels (ie, master/slave, master/slave), built using IBM 16gb Ultra33 drives, is capable of about 25mb/sec across the raid. Nice to hear :-) not a very big performance degradation. Adding a Promise 66 card, changing to all masters, got the numbers up into the 30's range (I don't have them at the moment.. hmm..) I was also wondering about the reliability of using slaves. Does anyone know about the likelihood of a single failed drive bringing down the whole master/slave pair? Since I have tended to stay away from slaves for performance reasons, I don't know how they influence reliability. Maybe it's ok. When the slave fails, the master goes down. My experience has been, when _ANY_ IDE drive fails, it takes down the whole channel. Master or slave. The kernel just gives fits.. Hmm .. strange .. I got an old Pentium box, disconnected the slave, and the raid5 array continued to work after a TON of syslog messages. Anyway, I agree that the master-only configuration is much more reliable from an electrical point of view. I was wondering how many IDE channels Linux 2.2 can handle: can it handle 8 channels? Would an Abit with 4 channels + 2 Promise Ultra 66 cards work? Or a normal BX mainboard (2 channels) + 3 Promise Ultra 66? Thanks for infos, Benno.
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
James Manning wrote: [ Tuesday, January 11, 2000 ] Benno Senoner wrote: The problem is that power outages are unpredictable even in presence of UPSes, therefore it is important to have some protection against power losses. I gotta ask: dying power supply? cord getting ripped out? Most ppl run serial lines (of course :) and with powerd they get nice shutdowns :) Just wanna make sure I'm understanding you... James -- Miscellaneous Engineer --- IBM Netfinity Performance Development

Yep, obviously the UPS has a serial line to shut down the machine nicely before a failure, but it happened to me that the serial cable was disconnected and the power outage lasted SEVERAL hours during a weekend, when no one was in the machine room (of an ISP). You know Murphy's law ... :-) But I am mainly interested in power-failure protection in the case where you want to set up a workstation with a reliable disk array (soft raid5) and do not always have a UPS handy. You will lose the file that was being written, but the important thing is that the disk array remains in a safe state, just like a single disk + journaled FS. Stephen Tweedie said that this is possible (by fixing the remaining races in the RAID code); if these problems get fixed sometime, then our fears of a corrupted soft-RAID array in the case of a power failure on a machine without UPS will completely go away. cheers, Benno.
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
"Stephen C. Tweedie" wrote: Ideally, what I'd like to see the reconstruction code do is to: * lock a stripe * read a new copy of that stripe locally * recalc parity and write back whatever disks are necessary for the stripe * unlock the stripe so that the data never goes through the buffer cache at all, but the stripe is locked with respect to other IOs going on below the level of ll_rw_block (remember there may be IOs coming in to ll_rw_block which are not from the buffer cache, eg. swap or journal IOs). We are '100% journal-safe' if power fails during resync. Except for the fact that resync isn't remotely journal-safe in the first place, yes. :-) --Stephen

Sorry for my ignorance, I got a little confused by this post: Ingo said we are 100% journal-safe, you said the contrary. Can you or Ingo please explain to us in which situation (power loss), running linux-raid + a journaled FS, we risk a corrupted filesystem? I am interested in what happens if the power goes down while you write heavily to an ext3/reiserfs (journaled FS) on a soft-raid5 array. After the reboot, if all disks remain physically intact, will we only lose the data that was being written, or is there a possibility of ending up with a corrupted filesystem which could cause more damage in the future? (Or do we need to wait for the raid code in 2.3?) Sorry for re-asking that question, but I am still confused. regards, Benno.
Re: large ide raid system
Jan Edler wrote: On Mon, Jan 10, 2000 at 12:49:29PM -0800, Dan Hollis wrote: On Mon, 10 Jan 2000, Jan Edler wrote: - Performance is really horrible if you use IDE slaves. Even though you say you aren't performance-sensitive, I'd recommend against it if possible. My tests indicate UDMA performs favorably with ultrascsi, at about 1/6 the cost. Cost is often a big factor. I wasn't advising against IDE, only against the use of slaves. With UDMA-33 or -66, masters work quite well, if you can deal with the other constraints that I mentioned (cable length, PCI slots, etc).

Do you have any numbers handy? Will the performance of a master/slave setup be at least HALF that of the master-only setup? For some apps cost is really important, and software IDE RAID has a very low price/Megabyte. If the app doesn't need killer performance, then I think it is the best solution. Now if we only had soft-RAID + journaled FS + power-failure safety right now ... cheers, Benno.
Re: soft RAID5 + journalled FS + power failure = problems ?
"Stephen C. Tweedie" wrote: Hi, On Fri, 07 Jan 2000 13:26:21 +0100, Benno Senoner [EMAIL PROTECTED] said: what happens when I run RAID5 + journaled FS and the box is just writing data to the disk and then a power outage occurs? Will this lead to a corrupted filesystem, or will only the data which was just written be lost? It's more complex than that. Right now, without any other changes, the main danger is that the raid code can sometimes lead to the filesystem's updates being sent to disk in the wrong order, so that on reboot, the journaling corrupts things unpredictably and silently. There is a second effect: if the journaling code tries to prevent a buffer being written early by keeping its dirty bit clear, then raid can miscalculate parity by assuming that the buffer matches what is on disk, and that can actually cause damage to data other than the data being written, if a disk dies and we have to start using parity for that stripe.

Do you know if using soft RAID5 + regular ext2 causes the same sort of damage, or if the corruption chances are lower when using a non-journaled FS? Is the potential corruption caused by the RAID layer or by the FS layer? (Does the FS code or the RAID code need to be fixed?) If it's caused by the FS layer, how do XFS (not here yet ;-) ) or ReiserFS behave in this case? cheers, Benno. Both are fixable, but for now, be careful... --Stephen
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure = problems ?
"Stephen C. Tweedie" wrote: (...) 3) The soft-raid background rebuild code reads and writes through the buffer cache with no synchronisation at all with other fs activity. After a crash, this background rebuild code will kill the write-ordering attempts of any journalling filesystem. This affects both ext3 and reiserfs, under both RAID-1 and RAID-5. Interaction 3) needs a bit more work from the raid core to fix, but it's still not that hard to do. So, can any of these problems affect other, non-journaled filesystems too? Yes, 1) can: throughout the kernel there are places where buffers are modified before the dirty bits are set. In such places we will always mark the buffers dirty soon, so the window in which an incorrect parity can be calculated is _very_ narrow (almost non-existent on non-SMP machines), and the window in which it will persist on disk is also very small. This is not a problem. It is just another example of a race window which exists already with _all_ non-battery-backed RAID-5 systems (both software and hardware): even with perfect parity calculations, it is simply impossible to guarantee that an entire stripe update on RAID-5 completes in a single, atomic operation. If you write a single data block and its parity block to the RAID array, then on an unexpected reboot you will always have some risk that the parity will have been written, but not the data. On a reboot, if you lose a disk then you can reconstruct it incorrectly due to the bogus parity. THIS IS EXPECTED. RAID-5 isn't proof against multiple failures, and the only way you can get bitten by this failure mode is to have a system failure and a disk failure at the same time. --Stephen

Thank you very much for these clear explanations. Last doubt: :-) Assume all RAID code / FS interaction problems get fixed; since a Linux soft-RAID5 box has no battery backup, does this mean that we will lose data ONLY if there is a power failure AND a subsequent disk failure?
If we lose the power, and after reboot all disks remain intact, can the RAID layer reconstruct all information in a safe way? The problem is that power outages are unpredictable even in the presence of UPSes, therefore it is important to have some protection against power losses. regards, Benno.
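Stephen's point about stale parity is easy to see with XOR arithmetic. Here is a toy sketch in plain shell (a single byte stands in for a whole block; the values are arbitrary):

```shell
#!/bin/sh
# RAID-5 keeps parity = XOR of the data blocks in a stripe.
d1=170 d2=204 d3=15                 # three data "blocks" (just bytes here)
p=$(( d1 ^ d2 ^ d3 ))               # parity as written by a healthy array

# Normal reconstruction: lose d2, rebuild it from parity + survivors.
rebuilt=$(( p ^ d1 ^ d3 ))
echo "rebuilt d2 = $rebuilt"        # matches the original d2 (204)

# The "write hole": d1 gets updated on disk, but power fails before the
# matching parity write.  Parity is now stale.  If a disk then dies and
# we reconstruct d2 from the stale parity, we get garbage:
d1_new=99
rebuilt_bad=$(( p ^ d1_new ^ d3 ))
echo "bogus d2 = $rebuilt_bad"      # NOT the original d2 any more
```

This is exactly the "system failure plus disk failure at the same time" combination Stephen describes: either failure alone is recoverable, the two together are not.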
Re: Swap on Raid1 : safe on EIDE disks ? = no hangs ?
Luca Berra wrote: According to Stephen Tweedie the problem happens in both cases; writes to the swap file bypass the buffer cache in ANY case. The only way to be safe is: do not use spare drives; swapoff before raidhotadding; replace swapon with something like

while grep -q "^${1#/dev/} .* resync=" /proc/mdstat; do sleep 1; done
swapon $1

(needs to be smarter than that, to support swapon -a and swap on a file)

I was wondering if this procedure (swapping on Raid1 + waiting for the resync) is safe on an IDE-only system? I don't need hot-swapping; the only thing I need is that if one disk dies, swapping will not take down the system (hangs like in the SCSI case etc). If this is safe I can easily live with the "wait for resync" issue. regards, Benno.
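Fleshed out slightly, Luca's wrapper might look like the sketch below. The function name safe_swapon is made up, and the mdstat path is a parameter only so the wait logic can be exercised against a sample file; the sketch also only prints the swapon command instead of running it:

```shell
#!/bin/sh
# safe_swapon <device> [mdstat-file] -- sketch of Luca's idea: refuse to
# enable swap on an md device until its resync has finished.
safe_swapon() {
    dev=$1
    mdstat=${2:-/proc/mdstat}
    # /proc/mdstat shows "resync=..." on the array's line while
    # reconstruction is still running; poll until it disappears.
    while grep -q "^${dev#/dev/} .* resync=" "$mdstat"; do
        sleep 1
    done
    echo "would run: swapon $dev"   # a real script would call swapon here
}
```

As Luca notes, a real version would also have to handle swapon -a and swap files, which this one-device sketch ignores.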
soft RAID5 + journalled FS + power failure = problems ?
Hi, I was just thinking about the following: there will soon be at least one stable journaled FS for Linux available out of the box. Of course we want to run our soft-RAID5 array with journaling in order to prevent large fscks and speed up the boot process. My question: what happens when I run RAID5 + journaled FS and the box is just writing data to the disk and then a power outage occurs? Will this lead to a corrupted filesystem, or will only the data which was just written be lost? I know that a single disk + journaled FS does not lead to a corrupted disk, but in the RAID array case? The Software-RAID HOWTO says: "In particular, note that RAID is designed to protect against *disk* failures, and not against *power* failures or *operator* mistakes." What happens if a newly written block was only committed to 1 disk out of the 4 disks present in an array (RAID5)? Will this block be marked as free after the array resync, or will it lead to problems making the md device corrupted? If the software RAID in Linux doesn't already guarantee this, could it be added with a technique similar to what a journaled FS does? I am thinking about keeping a journal of the blocks committed to disk; if a power failure occurs, then just wipe out these blocks and make them free again. A journaled FS on top of the raid device will automatically avoid files being placed in these "uncommitted disk blocks". But since md is a "virtual device" the situation might be more complicated. I think quite a few of us are interested in this "raid reliability on power failure during disk writes" topic. Thank you in advance for your explanations. regards, Benno.
Re: linux raid patch
John Barbee wrote: I'm running Redhat 6 which comes with the 2.2.5 kernel. Does that mean I should be using the raid0145-2.2.6 patch from ftp.kernel.org? How do I incorporate this patch into the kernel? I couldn't find any directions. The patch starts with a lot of +'s and -'s. I presume those mark the differences between what the block_device.c file is and what it should be, but what is the proper way for me to update that file? john.

You don't need the patches: the 2.2.5 kernel which comes with Redhat 6.0 already has these patches, and the raidtools-0.90 RPM is included too. The raid devices are built as modules, therefore you may load them with insmod. So if you want a root raid device, you must rebuild your initrd and add the raid*.o module. regards, Benno.
IDE: Master/Slave failure = entire channel down ?
Can someone explain, please: does a failure of the master or slave drive cause both drives on the same IDE channel to become inaccessible? I tested this with my Pentium board (Tyan Tomcat HX chipset) and had no problems while disconnecting either a master or a slave drive. Does this happen only on certain IDE controllers? regards, Benno.
Re: installing root raid non-destructively
Luca Berra wrote: Third, (naive question) if raid1 supports on-the-fly disk "reconstruction", why can't I simply add another identical disk alongside my present one, activate raid1 non-destructively and have disk2 be "reconstructed" as the mirror image of disk1? Because linux-raid keeps a 4K raid superblock at the end of the partition, so if you already created the filesystem you have no room for the superblock. I have not yet tested this, but you could get resize2fs (you need a PartitionMagic license for that), shrink the filesystem by 4K, dd one disk over the other, then create a new raid.

I tested this some time ago (dd if=/dev/hda1 of=/dev/hdc1 in single user mode) but did not shrink the ext2fs, and it worked. Will the superblock get corrupted when I use up the last 4K of space on the ext2fs? regards, Benno.
Re: RAID and RedHat 6.0
Ingo Molnar wrote: On Sun, 9 May 1999, Giulio Botto wrote: I downloaded the latest version of the raidutils and compiled them, but still the same error; is there something else I should have gotten? My guess is the "latest" raidtools are already installed; the problem lies with the kernel: they probably shipped the mainstream kernel without the correct patches for raid [...] no, RedHat 6.0 has the newest (ie. most stable) RAID driver. The problem in the config probably is the missing 'persistent-superblock' and 'chunksize' parameter.

I think Redhat should compile all the MD driver options directly into the kernel. It will not take up very much memory, and if Redhat puts the raidtools on the CD instimage, people will be able to install a fresh Redhat distribution on a soft-RAID array, and even upgrade an older distribution sitting on a root RAID array. This is an often-requested feature. The installer should be able to recognize md devices and do the proper handling; diskdruid should allow a simple RAID setup too. All this would lead to a very easy installation/upgrade procedure of Redhat on a machine with a soft-raid array. comments? PS: is there a way to boot off an autodetected root-raid array using the kernel shipped with RH6.0 (without recompiling), by generating a new initrd which loads the raid* modules at bootup? Or does the boot process need the raid* drivers before the initrd is started? regards, Benno. -- mingo
SWAP CRASHES LINUX 2.2.6 = malloc() design problem ?
Hi, my system is a Redhat 5.2 running Linux 2.2.6 + raid0145-19990421. I tested whether the system is stable while swapping heavily. I tested a regular swap area and a soft-RAID1 (2 disks) swap area. So I wrote a little program which does basically the following: allocate as many ~4MB blocks as possible, then begin to write data in a linear way to each block, displaying the write performance in MB/sec. The program writes a dot on the screen after each write of 1024 bytes. NOW TO THE VERY STRANGE RESULTS: 1) My first BIG QUESTION is whether there is a design flaw in malloc() or not: when I do (number of successfully allocated blocks) * 4MB, I get 2GB of *SUCCESSFULLY* malloc()ed memory, but my system has only about 100MB of virtual mem (64MB RAM + 40MB swap). How does the kernel hope to squeeze the allocated 2GB into 100MB of virtual mem? :-) Does anyone know why the kernel does not limit the maximum malloc()ed memory to the amount of RAM+SWAP? Will this be changed in the future? 2) At the beginning the program runs fine, and when the RAM is used up, swapping activity begins and the mem-write performance drops to about the write performance of the disk or RAID1 array. Now the problem: when all RAM + SWAP are used up, the system begins to freeze, and every 10-20 secs there appear messages on the console like:

out of memory of syslog
out of memory of klogd

and after a while my swapstress program exits with a Bus Error. Sometimes even "update" gets killed, or "init", which writes: PANIC SEGMENT VIOLATION ! After my swapstress program exits with "Bus Error" the system continues to work, but since init got killed, you cannot reboot or shut down the machine anymore. Note that I ran swapstress as a normal user. This means it's easy to crash/render unstable Linux with heavy malloc()ing / swapping. You can find my swapstress program at http://www.gardena.net/benno/linux/swapstress.tgz please let me know your result. (crash / lockup .. ?
) comments please, especially from the kernel gurus! regards, Benno.
Re: SWAP CRASHES LINUX 2.2.6 = malloc() design problem ?
Alvin Starr wrote: On Tue, 27 Apr 1999, Benno Senoner wrote: Hi, my system is a Redhat 5.2 running Linux 2.2.6 + raid0145-19990421. I tested whether the system is stable while swapping heavily. I tested a regular swap area and a soft-RAID1 (2 disks) swap area. So I wrote a little program which does basically the following: allocate as many ~4MB blocks as possible, then begin to write data in a linear way to each block, displaying the write performance in MB/sec. The program writes a dot on the screen after each write of 1024 bytes. NOW TO THE VERY STRANGE RESULTS: 1) My first BIG QUESTION is whether there is a design flaw in malloc() or not: when I do (number of successfully allocated blocks) * 4MB, I get 2GB of *SUCCESSFULLY* malloc()ed memory, but my system has only about 100MB of virtual mem (64MB RAM + 40MB swap). I took a look at your program. It looks as if you are not using the memory that you malloc. If I remember correctly Linux will not allocate the memory that you have requested until you use it. This is a really handy feature when you have sparse arrays.

I know this; in fact, without the fragment below the program would do nothing. Just run the program and see that it writes to mem; look at your HD LED :-) What you need to do is to have your program touch every page in the virtual memory allocated. If the page size is 4096 bytes you will have to write every 4096'th byte to ensure that the page is brought into existence. This is the fragment that writes to the memory; in particular buffer[u]=u does the actual write:

buffer = block_arr[i];
for (u = 0; u < MYBUFSIZE; u++) {
    buffer[u] = u;
    if ((u & 1023) == 0) {
        printf(".");
        fflush(stdout);
    }
}

Does anyone know why the kernel does not limit the maximum malloc()ed memory to the amount of RAM+SWAP? This is a design choice. If you do this you will limit programs that use virtual memory in a way that is sparsely populated. Will this be changed in the future?
Other systems pre-commit memory, and this can at times cause your system to stop allowing new processes to run even though you are not using anywhere near the total of ram+swap available. The choice of one allocation method or another may be a possible place for a kernel tuning feature. But for me, I like the current choice. 2) At the beginning the program runs fine, and when the RAM is used up, swapping activity begins and the mem-write performance drops to about the write performance of the disk or RAID1 array. Now the problem: when all RAM + SWAP are used up, the system begins to freeze, and every 10-20 secs there appear messages on the console like: out of memory of syslog, out of memory of klogd, and after a while my swapstress program exits with a Bus Error. Sometimes even "update" gets killed, or "init", which writes: PANIC SEGMENT VIOLATION ! After my swapstress program exits with "Bus Error" the system continues to work, but since init got killed, you cannot reboot or shut down the machine anymore. Note that I ran swapstress as a normal user. This means it's easy to crash/render unstable Linux with heavy malloc()ing / swapping. Resource exhaustion can cause a number of problems, one of which is system crashes. One solution is to limit the per-user memory maximum so that a single user cannot burn up all the system memory, but that still will not stop the problem. One possible answer is for the kernel to always spare some swap space for tasks running as root, and to suspend any user tasks that request memory when the swap limit is reached; the creation of new user processes should also be suspended when this limit is reached. At this point an administrator would be able to log in to the system and kill the offending processes or take some other remedial action. I agree, root processes should have some spare resources. Alvin Starr || voice: (416)585-9971 Interlink Connectivity|| fax: (416)585-9974 [EMAIL PROTECTED] ||
Re: SWAP CRASHES LINUX 2.2.6 = malloc() design problem ?
David Guo wrote: Hi. If you read the documentation of the raid, you'll know swap on raid is not safe. And you don't have any reason to use swap on raid, because the kernel's handling of swap on different disks will not be worse than raid. I think you can check out the docs with raid. Yours David.

Not true: the document is pretty outdated. Older raid patches crashed easily with my program (free_page not on list ... and so on ...), but with the latest patches + 2.2.6, swapping on a soft-RAID1 swap area gives me the same behaviour as swapping on a regular disk (no more free_page errors). Therefore it's a kernel design issue. ciao Benno.
Re: SWAP CRASHES LINUX 2.2.6 = malloc() design problem ?
[EMAIL PROTECTED] wrote: I don't see the relevance to linux-raid either, but the 2.2.x kernel does have /proc/sys/vm/overcommit_memory which will enable the below behaviour. It's off by default though...

Thanks, I will try this /proc setting. I am using a standard RH5.2 box with a 2.2.6 kernel + 0145 raid patches. But strange: if the /proc/.. setting is off by default, why is it on on my machine? I don't think RH5.2 would specifically turn on the switch, because the init scripts are for kernel 2.0.36. The relevance to linux-raid is the following: I was testing the stability of swapping over soft-RAID1, and got the malloc problem as a "side effect" ... :-) Earlier raid patches crashed the machine because of the free_page problems (memory starvation); the newer patches gave me the same results while swapping on soft-RAID1 as swapping on a regular swap area. regards, Benno.
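For reference, the knob mentioned above is an ordinary procfs file, so checking it is trivial. A minimal sketch (it only reads the setting; the meaning of each value is kernel-version dependent, so see the kernel's own sysctl documentation before changing it):

```shell
#!/bin/sh
# Report the VM overcommit policy, if this kernel exposes it.
f=/proc/sys/vm/overcommit_memory
if [ -r "$f" ]; then
    mode=$(cat "$f")
    echo "overcommit_memory = $mode"
else
    mode=""
    echo "no $f on this system"
fi
# As root the policy can be changed, e.g.:
#   echo 1 > /proc/sys/vm/overcommit_memory
# (consult Documentation/sysctl/vm.txt for what each value means)
```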
auto-partition new blank hotadded disk
This brings up another question: partitioning. The above (I don't think) would work currently anyway, due to the disks having to be partitioned first (correct?). How hard would it be to have the raid code itself write the required partition information and whatever it requires to get a raw disk (which was marked as a hot spare) up and running? I am more interested in the idea of automatically repartitioning a new blank disk while it is hot-added. A simple solution would be: assume that all disks in the array are partitioned in the same way, and assume you have a command like

myraidhotadd /dev/md0 /dev/hdb

which does the following:
- scans /etc/raidtab and sees which disks are part of the md0 device
- chooses a non-failed disk, for example /dev/hda
- reads the partition table from hda using fdisk /dev/hda < inputcommands > outputfile
- then myraidhotadd parses the contents of outputfile (the partition table of hda) and invokes fdisk on hdb to recreate the exact partition table (perhaps first deleting any partitions found on hdb)
- finally you can run raidhotadd to add the disk to the array

The fun thing is that you can write this program with little effort, because it's basically only parsing of text and calling of external programs. If I find some free time I will write such a script and post it here. Or am I reinventing the wheel? :-) ciao Benno.
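The glue described above can be sketched in a few lines. myraidhotadd is a made-up name, and this dry-run version only PRINTS the commands it would run; it also swaps the fdisk-output-parsing step for sfdisk's dump format, which sfdisk can replay directly onto another disk:

```shell
#!/bin/sh
# myraidhotadd <md-device> <healthy-disk> <new-disk> -- hypothetical
# helper: clone the partition table of a healthy array member onto a
# fresh disk, then hot-add its first partition.  Dry-run sketch only.
myraidhotadd() {
    md=$1 good=$2 new=$3
    # sfdisk -d dumps a partition table in a format sfdisk re-reads,
    # so no hand-parsing of fdisk output is needed.
    echo "sfdisk -d $good | sfdisk $new"
    echo "raidhotadd $md ${new}1"
}
```

A real version would also look up the array members in /etc/raidtab instead of taking the healthy disk as an argument, and would refuse to touch a disk that still looks partitioned.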
Re: auto-partiton new blank hotadded disk
Ingo Molnar wrote: On Mon, 26 Apr 1999, Benno Senoner wrote: I am more interested in the idea of automatically repartitioning a new blank disk while it is hot-added. no need to do this in the kernel (or even in raidtools). I use such scripts to 'mass-create' partitioned disks:

That's ok, but isn't it unsafe to overwrite the partition table of disks which are actually part of a soft-raid array and in use? I think for hot-adding it would be better to use a script which takes as a parameter the disk you want to partition, and partitions only this disk. Another idea would be reading the partition table of a non-failed disk that is part of the array and recreating its partitioning on the new disk. A third possibility would be creating a nice cmdline tool, where you give the following information: the disks which you want to include in the array, and how much space of each disk you want to use for the array. Then the tool partitions all disks and writes a sort of "template" script for fdisk, so that you can write a wrapper for raidhotadd which first calls this script before doing the hot-add. opinions? regards, Benno.

[root@moon root]# cat dobigsd
if [ "$#" -ne "1" ]; then
  echo 'sample usage: dobigsd sda'
  exit -1
fi
echo "*** DESTROYING /dev/$1 in 5 seconds!!! ***"
sleep 5
dd if=/dev/zero of=/dev/$1 bs=1024k count=1
(for N in `cat domanydisks`; do echo $N; done) | fdisk /dev/$1

[root@moon root]# cat domanydisks
n e 1 1 200
n l 1 25
n l 26 50
n l 51 75
n l 76 100
n l 101 125
n l 126 150
n l 151 175
n l 176 200
n p 2 300 350
n p 3 350 400
n p 4 450 500
t 2 86
t 3 83
t 4 83
t 5 83
t 6 83
t 7 83
t 8 83
t 9 83
t 10 83
t 11 83
t 12 83
w

(the for-loop feeds fdisk one answer per line, whatever the file's layout) that's all, fdisk is happy to be put into scripts. -- mingo
Re: auto-partiton new blank hotadded disk
Ingo Molnar wrote: On Mon, 26 Apr 1999, Benno Senoner wrote: no need to do this in the kernel (or even in raidtools). I use such scripts to 'mass-create' partitioned disks: but isn't it unsafe to overwrite the partition table of disks which are actually part of a soft-raid array and in use? it's unsafe, and thus the kernel does not allow it at all. Why don't you create the partitions before hot-adding the disk? -- mingo

Ok, I misunderstood your script: it acts on only 1 disk; I thought it partitioned your whole array. I will try to write a raid init tool which automates the task of creating the partitions on the array and, when you hot-add a new disk, partitions the blank disk before the actual hotadd. PS: does someone know why the Linux disk driver blocks for so much time during heavy disk writes? This is very bad for playing low-latency audio (50-100ms): all processes block for up to 2-3 secs during syncs of the buffer cache. Is there something planned for the future to avoid this? regards, Benno.
Re: Hot Swap
Helge Hafting wrote: [question on how to hot-swap IDE] I have an IDE drive that I can disconnect and reconnect with no problems. It is not on a RAID though. The controller is nothing special, an ultra-DMA thing built into the Shuttle motherboard. The trick seems to be unloading the IDE kernel module before reconnecting. It probes the drive just fine upon reloading. Unfortunately you can't do that with an IDE root file system. Helge Hafting

I did some hot-swap tests with IDE drives successfully: normally the IDE driver is compiled directly into the kernel, and even if you compile it as a module in the 2.2.x kernels you cannot unload it, because it is busy when the root fs is on an IDE disk. The kernel probes the attached disks at boot time, so it has a static table of the disks present in your machine.

I tested the following: a soft-RAID5 array consisting of 3 disks, hda1, hdb1, hdc1. During heavy writes on the raid array I disconnected the power of hdb, for example. At this point the kernel sees that the device does not respond and writes many messages to the syslog. The raid layer detects the error condition, marks the device as faulty (you can see this in /proc/mdstat), and continues in degraded mode. At this point you must issue:

raidhotremove /dev/md0 /dev/hdb1

which removes the hdb1 device from the array (again, check your /proc/mdstat). Now you can reconnect the disk. If it's a new blank disk you have to repartition it like the old one, creating hdb1 for example, and you can do this on the fly without a reboot (it worked for me; I know there are disks where you must reboot to ensure that the new partition table is updated properly). Now simply type:

raidhotadd /dev/md0 /dev/hdb1

et voila: the device is added to the array, hot-reconstruction begins in the background, and in /proc/mdstat you can see the progress status of the reconstruction. regards, Benno.
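The sequence above condenses into a short sketch. The helper that picks out the (F) device is fed sample /proc/mdstat-style output here, so the logic can be shown without real hardware, and the raidhot* commands are left as comments because they act on live arrays:

```shell
# faulty_member: print the device marked (F) in /proc/mdstat output.
faulty_member() {
    sed -n 's/.*\(hd[a-z][0-9]*\)\[[0-9]*\](F).*/\1/p'
}

# sample line standing in for `cat /proc/mdstat` after a disk failure
sample='md0 : active raid5 hdc1[2] hdb1[1](F) hda1[0] 123456 blocks'
bad=$(echo "$sample" | faulty_member)
echo "failed device: /dev/$bad"

# then, on the real system:
#   raidhotremove /dev/md0 /dev/$bad
#   (replace or repartition the disk, recreating the same partition)
#   raidhotadd    /dev/md0 /dev/$bad
```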
Re: Swap on RAID
Stephen C. Tweedie wrote: Hi, On Wed, 14 Apr 1999 21:59:49 +0100 (BST), A James Lewis [EMAIL PROTECTED] said: It wasn't a month ago that this was not possible, because it needed to allocate memory for the raid and couldn't, because it needed to swap to do it? Was I imagining this or have you guys been working too hard! There may well have been a few possible deadlocks, but the current kswapd code is pretty careful to avoid them. Things should be OK. --Stephen

So what must one use to swap over a soft-RAID1 partition? A standard 2.2.x kernel, or a 2.2.x kernel patched with the latest alpha raid patches? I want to use the alpha patches because the raid 0145 driver has hot-reconstruction. Can one of you please explain exactly which patches can do swap over soft-RAID1? thanks, regards, Benno.
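For reference, swap over soft-RAID1 with the 0.90 raidtools looks roughly like this: describe the mirror in /etc/raidtab, build it with mkraid, then treat the md device like any other swap partition. A sketch under assumptions (partition names and the md number are examples, not from the thread); written to a scratch file here rather than /etc/raidtab:

```shell
# Example raidtab stanza for a RAID1 device used as swap
# (0.90 raidtools syntax).
cat > /tmp/raidtab.example <<'EOF'
raiddev /dev/md1
    raid-level            1
    nr-raid-disks         2
    persistent-superblock 1
    chunk-size            4
    device                /dev/hda2
    raid-disk             0
    device                /dev/hdc2
    raid-disk             1
EOF
# on the real system (destructive, shown only as comments):
#   mkraid /dev/md1        # build the mirror from the raidtab entry
#   mkswap /dev/md1        # then it is a swap device like any other
#   swapon /dev/md1
echo "wrote example raidtab"
```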
raid patches for linux 2.2.4 available ?
Hi, do you know if the raid alpha patches are already available for kernel 2.2.4? At www.us.kernel.org/ the latest I found is for 2.2.3. Or can I apply the raid-2.2.3 patch to a 2.2.4 kernel? thank you for the info. regards, Benno.
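Whether the raid-2.2.3 patch still applies to a 2.2.4 tree is easy to check before committing to it: patch's --dry-run reports failed hunks without touching any file. Demonstrated here on a throwaway file; for the real thing you would run `patch -p1 --dry-run < raid-patch` inside the kernel source tree (the patch file name being whatever you downloaded):

```shell
# Simulate "does this patch still apply?" on a scratch file.
tmp=$(mktemp -d)
printf 'line one\nline two\n' > "$tmp/file"
cat > "$tmp/fix.diff" <<'EOF'
--- file
+++ file
@@ -1,2 +1,2 @@
 line one
-line two
+line 2
EOF
# --dry-run checks the hunks without modifying anything
if patch -d "$tmp" -p0 --dry-run < "$tmp/fix.diff" >/dev/null; then
    echo "patch applies cleanly"
else
    echo "patch has rejects -- port the failed hunks by hand"
fi
```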
most RAID crashproof setup = BOOT FROM FLOPPY DISK
Hello, I was looking for the most crashproof setup of a software root-RAID array. My conclusion: assume one wants to set up a machine with a root RAID5 array. The problem is booting the kernel: since LILO uses the BIOS routines, the kernel must reside on a standard (non software-raid) partition, within the usual 1024 cylinders, etc. Assume I set up LILO to load the kernel off the first disk (where the /boot dir resides too): when the first disk crashes, the system won't boot anymore.

Solution: I use Redhat 5.2. I prepared a bootdisk which contains a 2.0.36+raid0145 kernel plus the initrd image. Then, using partition id 0xFD on the hard drives, the kernel autodetects the RAID array, starts the root /dev/md0 device and boots the system. So even if the first disk crashes, the system is still functional. My recommendations: make 2-3 copies of the bootdisk, and buy a spare 3.5 inch floppy drive (very cheap) in case the floppy drive breaks. :-) Yes, I know it is not very elegant to boot off a floppy drive, but it saves a lot of trouble. Or are there better methods to reliably boot a pure software RAID5 array? Any comments?

PS: do you know if switching the computer's power on/off with the floppy disk in the drive can accidentally (through electromagnetic discharges) erase the data on the floppy? (This is why I recommend making some copies of the bootdisk.) best regards, Benno.
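The "make 2-3 copies" advice is easy to script with dd: image the known-good floppy once, then write the image to each spare and verify it. Sketched here on a scratch file standing in for /dev/fd0, since the real commands write to a device:

```shell
tmp=$(mktemp -d)
printf 'pretend kernel + initrd\n' > "$tmp/fd0"   # stand-in for /dev/fd0

# image the known-good boot floppy (real: dd if=/dev/fd0 of=bootdisk.img)
dd if="$tmp/fd0" of="$tmp/bootdisk.img" 2>/dev/null

# verify the copy byte-for-byte before relying on it
cmp -s "$tmp/fd0" "$tmp/bootdisk.img" && echo "copy verified"

# writing a spare is the reverse: dd if=bootdisk.img of=/dev/fd0
```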
Re: hot-add features of alpha-code IDE drives , is this possible ?
m. allan noah wrote: the eide electrical interface does not support hot-swap. i have done hot-remove many times, but not hot-add. seems like a nice way to lose a drive.

I tested the following: Redhat 5.2, kernel 2.0.36 + raid 0145 + raidtools 0.90, a RAID5 setup consisting of 3 partitions: /dev/hda1 /dev/hda2 /dev/hdc1 (yes, the first 2 partitions are physically on the same drive, but at testing time I had only 2 drives :-) ). I wrote a script that copies many files to the RAID5 array; at some point I disconnected the data cable of the hdc drive (secondary master), but not the power cable. The kernel then gives many errors like "eide: IntrCmd" and then the usual I/O errors on /dev/hdc1. The RAID code spit out many debugging messages, plus a strange "md: raid bug found in line 666" (it calls the procedure MD_BUG) -- what does this mean? The kernel disabled the drive: in /proc/mdstat I can see /dev/hdc1[0] (F).

raidhotremove /dev/md0 /dev/hdc1 works without problems. I reconnected the hdc drive, ran fdisk /dev/hdc, and the kernel tells me "ide1: reset". I removed the hdc1 partition, recreated it with id 0xFD, and then quit fdisk. After raidhotadd /dev/md0 /dev/hdc1 the kernel began to reconstruct the parity on the disk, and all seemed OK. But I am very curious whether one can really damage an EIDE drive doing hot-adds.

As you know, the price/performance of the latest EIDE drives is very good, especially if you look at the IBM Deskstar 16GB or the upcoming IBM 24GB drives; using 4 drives you can build a huge raid array for a very competitive price, and the performance is not so bad (the IBM 16GB drive does a nice 12MB/sec read transfer). Yes, SCSI is superior to IDE, but software RAID adds much reliability to these drives. The maximum would be having (reliable) hot-swap even on IDE drives, but you say it is not possible in a reliable manner. :-( Do you know precisely what factor can damage the IDE drives?
thank you for the info, best regards, Benno.
Re: most RAID crashproof setup = BOOT FROM FLOPPY DISK
In this case the bootfloppy has the advantage that if it is corrupt you can quickly replace it with a new one.

This is why you'd have entries for all the disks in the lilo config files. You'd use labels like Linux_d1, Linux_d2 ... or whatever. If loading the image from the first disk fails, you just have to enter the label of another kernel at the lilo prompt.

Hmm, you are right, the floppy has a security problem (but the customer in my case IS root, and there are no additional users). I forgot the fact that you can set up multiple labels, but do you think there is a possibility that the disk gets corrupted at a point where LILO begins loading itself (prior to loading the kernel) and then stops due to a disk I/O error? thank you again for your info, best regards, Benno.

Plus there is additional work when a disk fails, using your solution: you have to set up the /bootn dir and rerun lilo.

Since you must repartition and raidhotadd the new disk anyway, I don't consider this to be much of a problem.

I plan to install software-RAID Linux PCs where the user is a joe-average user, so I think for my purposes I will consider the boot-floppy option.

I don't particularly like booting off floppy for several reasons:
* security problems - by booting with some sort of rescue disk you can get around any security on the system.
* I've had more problems with dust-clogged or otherwise inoperative floppy disks than with failing harddisks - admittedly, this may not be typical.
* additional single point of failure. If the floppy drive or controller breaks, you're down till you get a replacement.
Bye, Martin -- Martin Bene vox: +43-664-3251047 simon media fax: +43-316-813824-6 Andreas-Hofer-Platz 9 e-mail: [EMAIL PROTECTED] 8010 Graz, Austria -- finger [EMAIL PROTECTED] for PGP public key
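Martin's multiple-label scheme as a lilo.conf sketch. The labels Linux_d1/Linux_d2 come from his mail; the kernel paths and devices are examples. Written to a scratch file here, whereas on a real system this would be /etc/lilo.conf followed by a run of lilo:

```shell
# Example lilo.conf with one boot entry per disk, so a failed first
# disk can be worked around at the LILO prompt.
cat > /tmp/lilo.conf.example <<'EOF'
boot=/dev/hda
prompt
timeout=50

image=/boot/vmlinuz-2.0.36-raid
    label=Linux_d1
    root=/dev/md0
    read-only

# copy of the kernel kept on the second disk's boot partition
image=/boot2/vmlinuz-2.0.36-raid
    label=Linux_d2
    root=/dev/md0
    read-only
EOF
echo "wrote example lilo.conf"
```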