Re: raid problem: after every reboot /dev/sdb1 is removed?
Berni wrote:

> Hi! I have the following problem with my software RAID (RAID 1). I'm
> running Ubuntu 7.10 64bit with kernel 2.6.22-14-generic. After every
> reboot my first boot partition in md0 is out of sync: one of the disks
> (sdb1) is removed. After a resync every partition is in sync again,
> but after a reboot the state is back to removed. The disks are new,
> both Seagate 250GB with exactly the same partition table.

Did you create the raid arrays and then install on them? Or add them
after the fact? I have seen this type of problem when the initrd
doesn't start the array before pivotroot, usually because the raid
capabilities aren't in the boot image. In that case rerunning grub and
mkinitrd may help.

I run raid on Redhat distributions, and some Slackware, so I can't
speak for Ubuntu from great experience, but that's what it sounds like.
When you boot, is /boot mounted on a degraded array or on the raw
partition?

> Here are some config files:
>
> # cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md2 : active raid1 sda6[0] sdb6[1]
>       117185984 blocks [2/2] [UU]
> md1 : active raid1 sda5[0] sdb5[1]
>       1951744 blocks [2/2] [UU]
> md0 : active raid1 sda1[0]
>       19534912 blocks [2/1] [U_]    <- this is the problem: [U_] after reboot
> unused devices: <none>
>
> # fdisk /dev/sda
>    Device Boot   Start     End     Blocks  Id  System
> /dev/sda1             1    2432   19535008+  fd  Linux raid autodetect
> /dev/sda2          2433   17264  119138040    5  Extended
> /dev/sda3   *     17265   20451   25599577+   7  HPFS/NTFS
> /dev/sda4         20452   30400   79915342+   7  HPFS/NTFS
> /dev/sda5          2433    2675    1951866   fd  Linux raid autodetect
> /dev/sda6          2676   17264  117186111   fd  Linux raid autodetect
>
> # fdisk /dev/sdb
>    Device Boot   Start     End     Blocks  Id  System
> /dev/sdb1             1    2432   19535008+  fd  Linux raid autodetect
> /dev/sdb2          2433   17264  119138040    5  Extended
> /dev/sdb3         17265   30400  105514920    7  HPFS/NTFS
> /dev/sdb5          2433    2675    1951866   fd  Linux raid autodetect
> /dev/sdb6          2676   17264  117186111   fd  Linux raid autodetect
>
> # mount
> /dev/md0 on / type reiserfs (rw,notail)
> proc on /proc type proc (rw,noexec,nosuid,nodev)
> /sys on /sys type sysfs (rw,noexec,nosuid,nodev)
> varrun on /var/run type tmpfs (rw,noexec,nosuid,nodev,mode=0755)
> varlock on /var/lock type tmpfs (rw,noexec,nosuid,nodev,mode=1777)
> udev on /dev type tmpfs (rw,mode=0755)
> devshm on /dev/shm type tmpfs (rw)
> devpts on /dev/pts type devpts (rw,gid=5,mode=620)
> lrm on /lib/modules/2.6.22-14-generic/volatile type tmpfs (rw)
> /dev/md2 on /home type reiserfs (rw)
> securityfs on /sys/kernel/security type securityfs (rw)
>
> Could anyone help me to solve this problem?
>
> thanks, greets
> Berni

-- 
Bill Davidsen
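[If the initrd turns out to be the culprit, the repair on Ubuntu would
look roughly like the sketch below. This is a hedged example, not from
the thread: it assumes the degraded array is /dev/md0, the dropped
member is /dev/sdb1, and that Ubuntu 7.10's update-initramfs is used to
rebuild the boot image.

   # re-add the dropped member and let it resync
   mdadm /dev/md0 --add /dev/sdb1

   # make sure the array is listed in mdadm.conf so the initramfs can
   # assemble it at boot (append, then hand-edit any duplicate lines)
   mdadm --detail --scan >> /etc/mdadm/mdadm.conf

   # rebuild the initramfs so it contains the raid modules and config
   update-initramfs -u

After rebooting, /proc/mdstat should show md0 as [UU] without a manual
resync.]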
draft howto on making raids for surviving a disk crash
This is intended for the linux raid howto. Please give comments. It is
not fully ready.

/keld

Howto prepare for a failing disk

The following describes how to prepare a system to survive if one disk
fails. This can be important for a server which is intended to always
run. The description is mostly aimed at small servers, but it can also
be used for workstations, to protect them against data loss and keep
them running even if a disk fails. Some recommendations on larger
server setups are given at the end of the howto.

This requires some extra hardware, especially disks, and the
description will also touch on how to make the most out of the disks,
be it in terms of available disk space or input/output speed.

1. Creating the partitions

We recommend creating partitions for /boot, root, swap and other file
systems. This can be done with fdisk, parted or maybe a graphical
interface like the Mandriva/PCLinuxOS harddrake2. It is recommended to
use drives of equal size and performance characteristics.

If we are using the two drives sda and sdb, then sfdisk may be used to
set the type of all the partitions to "fd" (Linux raid autodetect):

   sfdisk -c /dev/sda 1 fd
   sfdisk -c /dev/sda 2 fd
   sfdisk -c /dev/sda 3 fd
   sfdisk -c /dev/sda 5 fd
   sfdisk -c /dev/sdb 1 fd
   sfdisk -c /dev/sdb 2 fd
   sfdisk -c /dev/sdb 3 fd
   sfdisk -c /dev/sdb 5 fd

Using:

   fdisk -l /dev/sda /dev/sdb

the partition layout could then look like this:

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot  Start     End      Blocks  Id  System
/dev/sda1            1      37      297171  fd  Linux raid autodetect
/dev/sda2           38    1132     8795587+ fd  Linux raid autodetect
/dev/sda3         1133    1619     3911827+ fd  Linux raid autodetect
/dev/sda4         1620  121601   963755415   5  Extended
/dev/sda5         1620  121601   963755383+ fd  Linux raid autodetect

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot  Start     End      Blocks  Id  System
/dev/sdb1            1      37      297171  fd  Linux raid autodetect
/dev/sdb2           38    1132     8795587+ fd  Linux raid autodetect
/dev/sdb3         1133    1619     3911827+ fd  Linux raid autodetect
/dev/sdb4         1620  121601   963755415   5  Extended
/dev/sdb5         1620  121601   963755383+ fd  Linux raid autodetect

2. Prepare for boot

The system should be set up to boot from multiple devices, so that if
one disk fails, the system can boot from another disk.

On Intel hardware there are two common boot loaders, grub and lilo.
Both grub and lilo can only boot off a raid1; they cannot boot off any
other software raid device type. The reason they can boot off the raid1
is that they see the raid1 as a normal disk; they simply use one of the
disks when booting. The boot stage only involves loading the kernel
with an initrd image, so not much data is needed for this. The kernel,
the initrd and other boot files can be put in a small /boot partition.
We recommend something like 200 MB on an ext3 raid1.

Make the raid1 and the ext3 filesystem:

   mdadm --create /dev/md0 --chunk=256 -R -l 1 -n 2 /dev/sda1 /dev/sdb1
   mkfs -t ext3 /dev/md0

Make each of the disks bootable by lilo:

   lilo -b /dev/sda -C /etc/lilo.conf1
   lilo -b /dev/sdb -C /etc/lilo.conf2

Make each of the disks bootable by grub (to be fully described; a
sketch is given below).
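[As a placeholder for the grub description, here is a minimal sketch
using the grub (legacy) shell. The remapping of each disk to (hd0) is
an assumption carried over from common practice (compare the "RAID 1
and grub" thread below): it makes each disk bootable on its own if the
other one dies. It assumes /boot is the first partition on both disks.

   # grub
   grub> device (hd0) /dev/sda
   grub> root (hd0,0)
   grub> setup (hd0)
   grub> device (hd0) /dev/sdb
   grub> root (hd0,0)
   grub> setup (hd0)
   grub> quit
]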
3. The root file system

The root file system can be on another raid than the /boot partition.
We recommend a raid10,f2, as the root file system will mostly see
reads, and the raid10,f2 raid type is the fastest for reads, while also
sufficiently fast for writes. Other relevant raid types would be
raid10,o2 or raid1.

It is recommended to use the udev file system, as this runs in RAM, and
you thus avoid a number of reads and writes to disk. It is also
recommended to mount all file systems with the noatime option; this
avoids writing to the filesystem inodes every time a file has been read
or written.

Make the raid10,f2 and the ext3 filesystem:

   mdadm --create /dev/md1 --chunk=256 -R -l 10 -n 2 -p f2 /dev/sda2 /dev/sdb2
   mkfs -t ext3 /dev/md1

4. The swap file system

If a disk holding swapped-out processes fails, then all these processes
fail. They may be vital processes for the system, or vital jobs on the
system. You can prevent the processes from failing by having the swap
partitions on a raid. The swap area needed is normally relatively small
compared to the overall disk space available, so we recommend the
faster raid types over the more space-economic ones. The raid10,f2 type
seems to be the fastest here; other relevant raid types could be
raid10,o2 or raid1. Given that you have created a raid array, you can
just make the swap area on it, as sketched below.
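[The draft stops short of the actual commands; a minimal sketch,
assuming the third partitions from the layout above (/dev/sda3 and
/dev/sdb3) are used for swap and /dev/md2 is free:

   mdadm --create /dev/md2 --chunk=256 -R -l 10 -n 2 -p f2 /dev/sda3 /dev/sdb3
   mkswap /dev/md2
   swapon /dev/md2

plus an /etc/fstab entry such as "/dev/md2 none swap sw 0 0" so the
swap is enabled at boot.]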
Re: In this partition scheme, grub does not find md information?
Bill Davidsen wrote:

> Moshe Yudkowsky wrote:
>
>> Michael Tokarev wrote:
>> [...]
>
>> To return to that performance question: since I have to create at
>> least 2 md drives using different partitions, I wonder if it's
>> smarter to create multiple md drives for better performance.
>>
>> /dev/sd[abcd]1 -- RAID1, the /boot, /dev, /bin, /sbin
>> /dev/sd[abcd]2 -- RAID5, most of the rest of the file system
>> /dev/sd[abcd]3 -- RAID10,o2, a drive that does a lot of downloading (writes)
>
> I think the speed of downloads is so far below the capacity of an
> array that you won't notice, and hopefully you will use things you
> download more than once, so you still get more reads than writes.
>
>>> For typical filesystem usage, raid5 works well for both reads and
>>> (cached, delayed) writes. It's workloads like databases where raid5
>>> performs badly.
>>
>> Ah, very interesting. Is this true even for (dare I say it?)
>> bittorrent downloads?
>
> What do you have for bandwidth? Probably not more than a T3 (145Mbit),
> which will max out at ~15MB/s, far below the write performance of a
> single drive, much less an array (even raid5).

It has been pointed out that I have a double typo there: I meant OC3,
not T3, and 155Mbit. Still, that is the most someone is likely to have,
even in a large company, and there is still not a large chance of it
being faster than the disks in raid-10 mode.

-- 
Bill Davidsen
Re: draft howto on making raids for surviving a disk crash
Keld Jørn Simonsen said (by the date of Sat, 2 Feb 2008 20:41:31 +0100):

> This is intended for the linux raid howto. Please give comments.
> It is not fully ready.
>
> /keld

Very nice. Do you intend to put it on http://linux-raid.osdl.org/ ?
As a wiki, it will be much easier for our community to fix errors and
add updates.

-- 
Janek Kozicki
Re: draft howto on making raids for surviving a disk crash
On Sat, Feb 02, 2008 at 09:32:54PM +0100, Janek Kozicki wrote:

> Keld Jørn Simonsen said (by the date of Sat, 2 Feb 2008 20:41:31 +0100):
>
>> This is intended for the linux raid howto. Please give comments.
>> It is not fully ready.
>>
>> /keld
>
> Very nice. Do you intend to put it on http://linux-raid.osdl.org/ ?

Yes, that is the intention.

> As a wiki, it will be much easier for our community to fix errors and
> add updates.

Agreed. But I will not put it up before I am sure it is reasonably
flawless, i.e. it will at least work. I have already found a few errors
myself.

best regards
keld
Re: Linux md and iscsi problems
On Friday February 1, [EMAIL PROTECTED] wrote:

> Summarizing, I have two questions about the behavior of Linux md with
> slow devices:
>
> 1. Is it possible to modify some kind of time-out parameter on the
> mdadm tool so the slow device wouldn't be marked as faulty because of
> its slow performance?

No. md doesn't do timeouts at all. The underlying device does. So if
you are getting time-out errors from the iscsi initiator, then you need
to change the timeout value used by the iscsi initiator. md has no part
to play in this. It just sends a request and eventually gets either
'success' or 'fail'.

> 2. Is it possible to control the buffer size of the RAID? In other
> words, can I control the amount of data I can write to the local disc
> before I receive an acknowledgment from the slow device when I am
> using the write-behind option?

No. md/raid1 simply calls 'kmalloc' to get space to buffer each write
as the write arrives. If the allocation succeeds, it is used to perform
the write lazily. If the allocation fails, the write is performed
synchronously.

What did you hope to achieve by such tuning? It can probably be added
if it is generally useful.

NeilBrown
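[For readers unfamiliar with the write-behind option being discussed:
it is enabled per member device via --write-mostly and requires a
write-intent bitmap. A minimal sketch, with device names as
assumptions (/dev/sdc1 stands in for the slow, iSCSI-backed device):

   mdadm --create /dev/md0 --level=1 --raid-devices=2 \
         --bitmap=internal --write-behind=256 \
         /dev/sda1 --write-mostly /dev/sdc1

Note that --write-behind=256 caps the number of outstanding
write-behind requests, not their total size in bytes, which is why the
answer to question 2 above is "no".]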
Assemble Same RAID0 More than Once?
Is it possible to assemble the same Linux Software RAID0 array two or
more times simultaneously? The idea would be to let one machine
assemble the block devices with read/write access and to let
additional machines assemble the block devices with read-only access.

You might think this is a ridiculous question, but I have discovered
the following. If I:

-- take a server with two sets of 12 drives, each set connected to a
   hardware RAID controller and configured as a RAID-5 array
-- export each 12-drive array as an individual iSCSI target with IET
   or SCST
-- connect the targets to a client machine and stripe them together as
   a software RAID0 ON THE CLIENT MACHINE

I can get about 700 MB/sec writing to the server and over 600 MB/sec
reading -- with blockdev and other settings optimized. I'm talking
about writing one stream of data and reading one stream back.

On the other hand, if I stripe the two hardware RAID devices together
ON THE SERVER and export them as a single target, I only get about half
the read and write performance on the client.

Ideally, I would like to get two-target performance with just one
target, and I'm attacking this problem at the level of the iSCSI target
and initiator software, as well as at the network level. BUT if I can't
solve it in any of these places, I'm wondering if there might be a way
to assemble the same software RAID two or more times simultaneously.

Currently with the iSCSI Enterprise Target (and a couple of other Linux
targets), it is possible to allow one client to connect to a target
with read/write access and to give all other clients read-only access.
SO, if I create a software RAID on the server and export that as a
single target, it is possible to give multiple users simultaneous
access to the target. However, as I said above, the performance is not
optimal.

If I create a software RAID on the client out of TWO targets, the
performance is great. But now it seems much more complicated for two or
more clients to access the data, because each client has to assemble
the software RAID out of the same two targets. And only one can have
write access.

My question is: is it possible to assemble a RAID if you can't write
anything to the block device or touch its metadata (i.e., to mark that
it is clean or dirty, or whatever gets written when the RAID is
assembled)? In my first attempt to test this I tried making the targets
RO, but mdadm gave me a segmentation fault when I tried to assemble the
RAID0. And then when I made the targets R/W again, one of them was
missing its raid superblock and mdadm couldn't assemble it.

Alternately, is there a safe way to assemble the same RAID0 two or more
times but only mount it R/W ONCE, and in all other instances mount it
RO? What happens to the RAID metadata if you do that?

Andrew

-- 
EditShare -- The Smart Way to Edit Together
119 Braintree Street, Suite 402
Boston, MA 02134
Tel: +1 617.782.0479  Fax: +1 617.782.1071
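[Not an answer from the thread, but a sketch of the closest supported
knobs: mdadm's assemble mode accepts -o/--readonly, which starts the
array read-only and should keep md from updating the superblocks.
Whether this is actually safe while a second host writes to the same
underlying devices is exactly the open question here. Device names are
placeholders:

   # on the single read/write client
   mdadm --assemble /dev/md0 /dev/sdx /dev/sdy

   # on each read-only client: start the array read-only so md does
   # not touch the metadata, and mount it read-only as well
   mdadm --assemble --readonly /dev/md0 /dev/sdx /dev/sdy
   mount -o ro /dev/md0 /mnt
]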
Re: RAID 1 and grub
On Wed, Jan 30, 2008 at 06:47:19PM -0800, David Rees wrote:

> On Jan 30, 2008 6:33 PM, Richard Scobie [EMAIL PROTECTED] wrote:
>
>> FWIW, this step is clearly marked in the Software-RAID HOWTO under
>> "Booting on RAID":
>> http://tldp.org/HOWTO/Software-RAID-HOWTO-7.html#ss7.3

A good and extensive reference, but somewhat outdated.

> BTW, I suspect you are missing the command "setup" from your 3rd
> command above; it should be:
>
> # grub
> grub> device (hd0) /dev/hdc
> grub> root (hd0,0)
> grub> setup (hd0)

I do not grasp this. How and where is it said that two disks are
involved? hda and hdc should both be involved.

Best regards
keld
raid1 and raid 10 always writes all data to all disks?
I found a sentence in the HOWTO:

   raid1 and raid10 always write all data to all disks

I think this is wrong for raid10. E.g. a raid10,f2 of 4 disks only
writes each block to two of the disks, not to all 4 disks. Is that
true?

best regards
keld
non-fresh: what?
[   40.671910] md: md0 stopped.
[   40.676923] md: bind<sdd1>
[   40.677136] md: bind<sda1>
[   40.677370] md: bind<sdb1>
[   40.677572] md: bind<sdc1>
[   40.677618] md: kicking non-fresh sdd1 from array!

When is a disk non-fresh, and what might lead to this? It has happened
about 15 times now since I built the array.

Dex
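[No answer appears in this digest, but as background: md considers a
member "non-fresh" when the event counter in its superblock is behind
the rest of the array, typically because the superblock was not updated
at the last shutdown or failure. A sketch of how one might inspect and
re-add the kicked member, assuming md0/sdd1 as above:

   # compare the Events counters of the kicked member and the array
   mdadm --examine /dev/sdd1 | grep -i events
   mdadm --detail /dev/md0 | grep -i events

   # re-add the member; md will resync it
   mdadm /dev/md0 --add /dev/sdd1
]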
Re: RAID 1 and grub
Keld Jørn Simonsen wrote:

>> # grub
>> grub> device (hd0) /dev/hdc
>> grub> root (hd0,0)
>> grub> setup (hd0)
>
> I do not grasp this. How and where is it said that two disks are
> involved? hda and hdc should both be involved.

There are not two disks involved in this instance. This is used in the
scenario where the primary disk in the RAID1 (/dev/hda) already has
grub installed in the MBR and you wish to install it on the secondary
drive (/dev/hdc). This then allows for a failed primary drive to be
removed and the machine to boot from the secondary (the BIOS may need
to be set to boot from the secondary drive).

As an aside, after last week's discovery that the Fedora 8 install had
not installed grub on the secondary drive as part of a RAID1 install,
some cursory Googling and searching of Redhat's knowledge base leads me
to believe that this may well be normal for all Redhat (RHEL/Fedora)
RAID1 installs.

One has nothing to lose by installing grub on the second drive in this
case, and it may save some delay in recovery on losing the primary,
although, as has been pointed out, it is best practice to test missing
drives as part of initial install testing.

Regards,
Richard
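[To make the remapping trick explicit: "device (hd0) /dev/hdc" tells
the grub shell to treat hdc as BIOS disk 0, so the MBR written by
"setup (hd0)" will find its /boot files on hdc itself once hdc becomes
the boot disk. A sketch of covering both disks in one session, assuming
/boot is the first partition on each:

   # grub
   grub> device (hd0) /dev/hda
   grub> root (hd0,0)
   grub> setup (hd0)
   grub> device (hd0) /dev/hdc
   grub> root (hd0,0)
   grub> setup (hd0)
   grub> quit
]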
Re: assemble vs create an array.......
Hello,

I am not sure if you have received my email from last week with the
results of the different combinations prescribed (it contained html
code). Anyway, I did a ro mount to check the partition and was happy to
see a lot of files intact. A few seemed destroyed, but I am not sure.

I tried an xfs_check on the partition and it told me:

   ERROR: The filesystem has valuable metadata changes in a log which
   needs to be replayed. Mount the filesystem to replay the log, and
   unmount it before re-running xfs_check. If you are unable to mount
   the filesystem, then use the xfs_repair -L option to destroy the
   log and attempt a repair.

Since I am unable to mount the partition, should I use the -L option
with xfs_repair, or let it run without it? Again, please let me know if
I should resend my previous email with the log file of xfs_repair -n.

Thank you for your time,
Dragos

David Chinner wrote:

> On Thu, Dec 06, 2007 at 07:39:28PM +0300, Michael Tokarev wrote:
>
>> What to do is to give repairfs a try for each permutation, but again
>> without letting it actually fix anything. Just run it in read-only
>> mode and see which combination of drives gives fewer errors, or no
>> fatal errors (there may be several similar combinations, with the
>> same order of drives but with a different drive missing).
>>
>> Ugggh. It's sad that xfs refuses to mount when the structure needs
>> cleaning - the best way here is to actually mount it and see what it
>> looks like, instead of trying repair tools.
>
> It's self-protection - if you try to write to a corrupted filesystem,
> you'll only make the corruption worse. Mounting involves log
> recovery, which writes to the filesystem.
>
>> Is there some option to force-mount it still (in readonly mode,
>> knowing it may OOPs the kernel etc)?
>
> Sure you can:
>
>    mount -o ro,norecovery <dev> <mtpt>
>
> But if you hit corruption it will still shut down on you. If the
> machine oopses then that is a bug.
>
>> This thread prompted me to think. If I can't force-mount it (or
>> browse it using other ways) as I can almost always do with
>> (somewhat?) broken ext[23], just to examine things, maybe I'm trying
>> it before it's mature enough? ;)
>
> Hehe ;) For maximum uber-XFS-guru points, learn to browse your
> filesystem with xfs_db. :P
>
> Cheers,
> Dave.
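[For anyone in the same spot, the sequence being discussed, in order of
increasing destructiveness, looks roughly like this. A sketch: the
device name is a placeholder, and -L really does discard the unreplayed
log, so recent metadata changes may be lost:

   # peek at the filesystem without replaying the log
   mount -o ro,norecovery /dev/md0 /mnt
   umount /mnt

   # dry run: report problems, change nothing
   xfs_repair -n /dev/md0

   # last resort: zero the log and repair
   xfs_repair -L /dev/md0
]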