Re: raid problem: after every reboot /dev/sdb1 is removed?

2008-02-02 Thread Bill Davidsen

Berni wrote:

Hi!

I have the following problem with my softraid (raid 1). I'm running
Ubuntu 7.10 64bit with kernel 2.6.22-14-generic.

After every reboot my first boot partition, md0, is not in sync: one
of the disks (sdb1) is shown as removed.
After a resync all partitions are in sync again, but after the next
reboot sdb1 is removed once more.

The disks are new, both Seagate 250 GB, with exactly the same partition table.

  
Did you create the raid arrays and then install on them? Or add them 
after the fact? I have seen this type of problem when the initrd doesn't 
start the array before pivotroot, usually because the raid capabilities 
aren't in the boot image. In that case rerunning grub and mkinitrd may help.
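
On Ubuntu the rough equivalents would be something like the following (just
a sketch; device names taken from your mdstat below, adjust as needed):

   mdadm /dev/md0 --add /dev/sdb1   # re-add the dropped half and let it resync
   update-initramfs -u              # rebuild the initrd so the md bits are in it
   grub-install /dev/sda            # reinstall the boot loader on both disks
   grub-install /dev/sdb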


I run raid on Redhat distributions, and some Slackware, so I can't speak 
for Ubuntu from great experience, but that's what it sounds like. When 
you boot, is /boot mounted on a degraded array or on the raw partition?
Here some config files: 

#cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sda6[0] sdb6[1]
      117185984 blocks [2/2] [UU]

md1 : active raid1 sda5[0] sdb5[1]
      1951744 blocks [2/2] [UU]

md0 : active raid1 sda1[0]
      19534912 blocks [2/1] [U_]   <-- this is the problem: looks like [U_] after reboot

unused devices: <none>


#fdisk /dev/sda
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1        2432    19535008+  fd  Linux raid autodetect
/dev/sda2            2433       17264   119138040    5  Extended
/dev/sda3   *       17265       20451    25599577+   7  HPFS/NTFS
/dev/sda4           20452       30400    79915342+   7  HPFS/NTFS
/dev/sda5            2433        2675     1951866   fd  Linux raid autodetect
/dev/sda6            2676       17264   117186111   fd  Linux raid autodetect

#fdisk /dev/sdb
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1        2432    19535008+  fd  Linux raid autodetect
/dev/sdb2            2433       17264   119138040    5  Extended
/dev/sdb3           17265       30400   105514920    7  HPFS/NTFS
/dev/sdb5            2433        2675     1951866   fd  Linux raid autodetect
/dev/sdb6            2676       17264   117186111   fd  Linux raid autodetect

# mount
/dev/md0 on / type reiserfs (rw,notail)
proc on /proc type proc (rw,noexec,nosuid,nodev)
/sys on /sys type sysfs (rw,noexec,nosuid,nodev)
varrun on /var/run type tmpfs (rw,noexec,nosuid,nodev,mode=0755)
varlock on /var/lock type tmpfs (rw,noexec,nosuid,nodev,mode=1777)
udev on /dev type tmpfs (rw,mode=0755)
devshm on /dev/shm type tmpfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
lrm on /lib/modules/2.6.22-14-generic/volatile type tmpfs (rw)
/dev/md2 on /home type reiserfs (rw)
securityfs on /sys/kernel/security type securityfs (rw)

Could anyone help me to solve this problem? 
thanks

greets
Berni



--
Bill Davidsen [EMAIL PROTECTED]
 Woe unto the statesman who makes war without a reason that will still
 be valid when the war is over... Otto von Bismarck





draft howto on making raids for surviving a disk crash

2008-02-02 Thread Keld Jørn Simonsen
This is intended for the linux raid howto. Please give comments.
It is not fully ready /keld

Howto prepare for a failing disk

The following describes how to prepare a system to survive
if one disk fails. This can be important for a server that is
intended to run continuously. The description is mostly aimed at
small servers, but it can also be used for
workstations, to protect them from losing data and to keep them running
even if a disk fails. Some recommendations on larger server setups are given
at the end of the howto.

This requires some extra hardware, especially disks, and the description
will also touch on how to make the most of the disks, both in terms of
available disk space and input/output speed.

1. Creating the partitions

We recommend creating partitions for /boot, root, swap and other file systems.
This can be done with fdisk, parted or a graphical tool such as the
Mandriva/PCLinuxOS harddrake2.  It is recommended to use drives
with equal sizes and performance characteristics.

If we are using the two drives sda and sdb, then sfdisk
may be used to mark all the partitions as raid partitions:

   sfdisk -c /dev/sda 1 fd
   sfdisk -c /dev/sda 2 fd
   sfdisk -c /dev/sda 3 fd
   sfdisk -c /dev/sda 5 fd
   sfdisk -c /dev/sdb 1 fd
   sfdisk -c /dev/sdb 2 fd
   sfdisk -c /dev/sdb 3 fd
   sfdisk -c /dev/sdb 5 fd

Using:

   fdisk -l /dev/sda /dev/sdb

The partition layout could then look like this:

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1          37      297171   fd  Linux raid autodetect
/dev/sda2              38        1132     8795587+  fd  Linux raid autodetect
/dev/sda3            1133        1619     3911827+  fd  Linux raid autodetect
/dev/sda4            1620      121601   963755415    5  Extended
/dev/sda5            1620      121601   963755383+  fd  Linux raid autodetect

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1          37      297171   fd  Linux raid autodetect
/dev/sdb2              38        1132     8795587+  fd  Linux raid autodetect
/dev/sdb3            1133        1619     3911827+  fd  Linux raid autodetect
/dev/sdb4            1620      121601   963755415    5  Extended
/dev/sdb5            1620      121601   963755383+  fd  Linux raid autodetect



2. Prepare for boot

The system should be set up to boot from multiple devices, so that
if one disk fails, the system can boot from another disk.

On Intel hardware, there are two common boot loaders, grub and lilo.
Both grub and lilo can only boot off a raid1; they cannot boot off
any other software raid device type. The reason they can boot off a
raid1 is that they see the raid1 as a normal disk, and simply use
one of the disks when booting. The boot stage only involves loading the kernel
with an initrd image, so not much data is needed for this. The kernel,
the initrd and other boot files can be put in a small /boot partition.
We recommend something like 200 MB on an ext3 raid1.

Make the raid1 and ext3 filesystem:

   mdadm --create /dev/md0 --chunk=256 -R -l 1 -n 2 /dev/sda1 /dev/sdb1
   mkfs -t ext3 /dev/md0
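
It is also a good idea to record the array in mdadm's configuration file,
so the initrd can assemble it at boot. A minimal sketch (the file is
/etc/mdadm.conf on some distributions and /etc/mdadm/mdadm.conf on others):

   mdadm --detail --scan >> /etc/mdadm.conf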

Make each of the disks bootable by lilo:

   lilo -b /dev/sda -C /etc/lilo.conf1
   lilo -b /dev/sdb -C /etc/lilo.conf2

Make each of the disks bootable by grub

(to be described)
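
A possible sketch with grub legacy, assuming /boot is the first partition
on each disk; the device remapping for the second disk is the usual trick
so that it can boot on its own when the first disk is gone:

   grub
   grub> root (hd0,0)
   grub> setup (hd0)
   grub> device (hd0) /dev/sdb
   grub> root (hd0,0)
   grub> setup (hd0)
   grub> quit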

3. The root file system

The root file system can be on a different raid than the /boot partition.
We recommend raid10,f2, as the root file system will mostly see reads, and
the raid10,f2 layout is the fastest for reads, while also sufficiently
fast for writes. Other relevant raid types would be raid10,o2 or raid1.

It is recommended to use udev for /dev, as it lives in RAM, and you
thus avoid a number of reads and writes to disk.

It is recommended that all file systems be mounted with the noatime option;
this avoids writing to the filesystem inodes every time a file is read or
written.
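
For example, the corresponding /etc/fstab entries could look like this
(device names and mount points are just the ones used in this howto):

   /dev/md1   /       ext3   noatime   0   1
   /dev/md0   /boot   ext3   noatime   0   2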

Make the raid10,f2 and ext3 filesystem:

   mdadm --create /dev/md1 --chunk=256 -R -l 10 -n 2 -p f2 /dev/sda2 /dev/sdb2
   mkfs -t ext3 /dev/md1


4. The swap file system

If a disk that processes are swapped to fails, then all of those processes
fail. These may be processes that are vital to the system, or vital jobs on
the system. You can prevent the processes from failing by having the swap
partitions on a raid. The swap area needed is normally relatively small
compared to the overall disk space available, so we recommend the faster
raid types over the more space-economical ones. The raid10,f2 type seems to
be the fastest here; other relevant raid types could be raid10,o2 or raid1.

Given that you have created a raid array, you can just make the swap partition 
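
A minimal sketch of that step, assuming the swap array is /dev/md2 built
from /dev/sda3 and /dev/sdb3 (adjust to your own layout):

   mdadm --create /dev/md2 --chunk=256 -R -l 10 -n 2 -p f2 /dev/sda3 /dev/sdb3
   mkswap /dev/md2
   swapon /dev/md2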

Re: In this partition scheme, grub does not find md information?

2008-02-02 Thread Bill Davidsen

Bill Davidsen wrote:

Moshe Yudkowsky wrote:

Michael Tokarev wrote:



To return to that performance question, since I have to create at 
least 2 md drives using different partitions, I wonder if it's 
smarter to create multiple md drives for better performance.


/dev/sd[abcd]1 -- RAID1, the /boot, /dev, /bin/, /sbin

/dev/sd[abcd]2 -- RAID5, most of the rest of the file system

/dev/sd[abcd]3 -- RAID10 o2, a drive that does a lot of downloading 
(writes)


I think the speed of downloads is so far below the capacity of an 
array that you won't notice, and hopefully you will use things you 
download more than once, so you still get more reads than writes.



For typical filesystem usage, raid5 works well for both reads
and (cached, delayed) writes.  It's workloads like databases
where raid5 performs badly.


Ah, very interesting. Is this true even for (dare I say it?) 
bittorrent downloads?


What do you have for bandwidth? Probably not more than a T3 (145Mbit) 
which will max out at ~15MB/s, far below the write performance of a 
single drive, much less an array (even raid5).
It has been pointed out that I have a double typo there: I meant OC3, not 
T3, and 155Mbit.  Still, that is about the most bandwidth someone is likely 
to have, even in a large company, and it is still unlikely to be faster 
than the disks in raid-10 mode.


--
Bill Davidsen [EMAIL PROTECTED]
 Woe unto the statesman who makes war without a reason that will still
 be valid when the war is over... Otto von Bismarck





Re: draft howto on making raids for surviving a disk crash

2008-02-02 Thread Janek Kozicki
Keld Jørn Simonsen said: (by the date of Sat, 2 Feb 2008 20:41:31 +0100)

 This is intended for the linux raid howto. Please give comments.
 It is not fully ready /keld

very nice. do you intend to put it on http://linux-raid.osdl.org/ 

As wiki, it will be much easier for our community to fix errors and
add updates.

-- 
Janek Kozicki |


Re: draft howto on making raids for surviving a disk crash

2008-02-02 Thread Keld Jørn Simonsen
On Sat, Feb 02, 2008 at 09:32:54PM +0100, Janek Kozicki wrote:
 Keld Jørn Simonsen said: (by the date of Sat, 2 Feb 2008 20:41:31 +0100)
 
  This is intended for the linux raid howto. Please give comments.
  It is not fully ready /keld
 
 very nice. do you intend to put it on http://linux-raid.osdl.org/ 

Yes, that is the intention.

 As wiki, it will be much easier for our community to fix errors and
 add updates.

Agreed. But I will not put it up before I am sure it is reasonably
flawless, i.e. it will at least work. I have already found a few errors myself.

best regards
keld


Re: Linux md and iscsi problems

2008-02-02 Thread Neil Brown
On Friday February 1, [EMAIL PROTECTED] wrote:
 
 
 Summarizing, I have two questions about the behavior of Linux md with  
 slow devices:
 
 1. Is it possible to modify some kind of time-out parameter on the  
 mdadm tool so the slow device wouldn't be marked as faulty because of  
 its slow performance.

No.  md doesn't do timeouts at all.  The underlying device does.
So if you are getting time out errors from the iscsi initiator, then
you need to change the timeout value used by the iscsi initiator.  md
has no part to play in this.  It just sends a request and eventually
gets either 'success' or 'fail'.
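
For example, with the open-iscsi initiator (an assumption; other initiators
have their own settings) the relevant knob lives in /etc/iscsi/iscsid.conf:

   # how long to wait for a failed session to recover before erroring
   # I/O back up to md (seconds)
   node.session.timeo.replacement_timeout = 120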

 
 2. Is it possible to control the buffer size of the RAID?, in other  
 words, can I control the amount of data I can write to the local disc  
 before I receive an acknowledgment from the slow device when I am  
 using the write-behind option.

No.  md/raid1 simply calls 'kmalloc' to get space to buffer each write
as the write arrives.  If the allocation succeeds, it is used to
perform the write lazily.  If the allocation fails, the write is
performed synchronously.
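
For reference, write-behind only applies to component devices marked
write-mostly, and it needs a write-intent bitmap; a minimal creation
sketch, with device names assumed:

   mdadm --create /dev/md0 --level=1 --raid-devices=2 \
         --bitmap=internal --write-behind=256 \
         /dev/sda1 --write-mostly /dev/sdb1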

What did you hope to achieve by such tuning?  It can probably be
added if it is generally useful.

NeilBrown


Assemble Same RAID0 More than Once?

2008-02-02 Thread AndrewL733
Is it possible to assemble the same Linux Software RAID0 array two or 
more times simultaneously? The idea would be to let one machine assemble 
the block devices with Read/Write Access and to let additional machines 
assemble the block devices with Read Only Access.


You might think this is a ridiculous question but I have discovered the 
following. If I:


   -- take a Server with two sets of 12 drives, each set connected to a
   Hardware RAID controller and configured as a RAID-5 Array
   -- export each 12-drive array as an individual iSCSI Target with IET
   or SCST
   -- connect the Targets to a Client machine and stripe them together
   as a Software RAID0 ON THE CLIENT MACHINE


I can get about 700 MB/sec writing to the Server and over 600 MB/sec 
reading -- with blockdev and other settings optimized. I'm talking about 
writing one stream of data and reading one stream back.


On the other hand, if I stripe the two Hardware RAID devices together ON 
THE SERVER and export them as a single Target, I only get about half the 
read and write performance on the Client.


Ideally, I would like to get to Two Target Performance with just One 
Target and I'm attacking this problem at the level of the iSCSI Target 
and Initiator software, as well as at the Network level. BUT if I can't 
solve it in any of these places, I'm wondering if there might be a way 
to Assemble the same Software RAID two or more times simultaneously.


Currently with the iSCSI Enterprise Target (and a couple of other Linux 
Targets), it is possible to allow one Client to connect to a target with 
Read/Write access and to give all other Clients Read Only Access. SO, if 
I create a Software RAID on the Server and export that as a Single 
Target, it is possible to give multiple users simultaneous access to the 
Target. However, as I said above, the performance is not optimal.

If I create a Software RAID on the Client out of TWO targets, the 
performance is great. But now it seems much more complicated for two or 
more clients to access the data, because each client has to Assemble the 
Software RAID out of the same two Targets. And only one can have Write 
Access.


My question is: is it possible to Assemble a RAID if you can't write 
anything to the block device or touch its metadata (i.e., to mark it 
clean or dirty, or whatever else gets written when the RAID is Assembled)? 
In my first attempt to test this I tried making the Targets RO, but 
mdadm gave me a segmentation fault when I tried to Assemble the RAID0. 
And then when I made the Targets R/W again, one of them was missing its 
raid superblock and mdadm couldn't assemble it.


Alternately, is there a safe way to Assemble the same RAID 0 two or more 
times but only mount it R/W ONCE, and in all other instances mount it 
RO? What happens to the RAID metadata if you do that?
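
In case it helps, this is roughly what I have in mind on the read-only
clients (device names are placeholders; whether the md superblocks really
stay untouched in this mode is exactly what I am unsure about):

   mdadm --assemble --readonly /dev/md0 /dev/sdX /dev/sdY
   mount -o ro /dev/md0 /mnt/data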



Andrew

--

EditShare -- The Smart Way to Edit Together
119 Braintree Street
Suite 402
Boston, MA 02134 
Tel:  +1 617.782.0479

Fax:  +1 617.782.1071



Re: RAID 1 and grub

2008-02-02 Thread Keld Jørn Simonsen
On Wed, Jan 30, 2008 at 06:47:19PM -0800, David Rees wrote:
 On Jan 30, 2008 6:33 PM, Richard Scobie [EMAIL PROTECTED] wrote:
 
 FWIW, this step is clearly marked in the Software-RAID HOWTO under
 Booting on RAID:
 http://tldp.org/HOWTO/Software-RAID-HOWTO-7.html#ss7.3

A good and extensive reference, but somewhat outdated.

 BTW, I suspect you are missing the command setup from your 3rd
 command above, it should be:
 
 # grub
 grub> device (hd0) /dev/hdc
 grub> root (hd0,0)
 grub> setup (hd0)

I do not grasp this. How and where is it said that two disks are
involved? hda and hdc should both be involved.

Best regards
keld


raid1 and raid 10 always writes all data to all disks?

2008-02-02 Thread Keld Jørn Simonsen
 I found a sentence in the HOWTO:

raid1 and raid 10 always writes all data to all disks

I think this is wrong for raid10.

eg

a raid10,f2 of 4 disks only writes to two of the disks -
not all 4 disks. Is that true?

best regards
keld


non-fresh: what?

2008-02-02 Thread Dexter Filmore
[   40.671910] md: md0 stopped.
[   40.676923] md: bind<sdd1>
[   40.677136] md: bind<sda1>
[   40.677370] md: bind<sdb1>
[   40.677572] md: bind<sdc1>
[   40.677618] md: kicking non-fresh sdd1 from array!

When is a disk non-fresh and what might lead to this? 
Happened about 15 times now since I built the array.

Dex


-- 
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCS d--(+)@ s-:+ a- C UL++ P+++ L+++ E-- W++ N o? K-
w--(---) !O M+ V- PS+ PE Y++ PGP t++(---)@ 5 X+(++) R+(++) tv--(+)@ 
b++(+++) DI+++ D- G++ e* h++ r* y?
--END GEEK CODE BLOCK--

http://www.vorratsdatenspeicherung.de


Re: RAID 1 and grub

2008-02-02 Thread Richard Scobie

Keld Jørn Simonsen wrote:


# grub
grub> device (hd0) /dev/hdc
grub> root (hd0,0)
grub> setup (hd0)


I do not grasp this. How and where is it said that two disks are
involved? hda and hdc should both be involved.


There are not two disks involved in this instance.

This is used in the scenario where the primary disk in the RAID1 
(/dev/hda) already has grub installed in the MBR and you wish to 
install it on the secondary drive (/dev/hdc).


This then allows a failed primary drive to be removed and the 
machine to boot from the secondary (the BIOS may need to be set to boot 
from the secondary drive).


As an aside, after last week's discovery that the Fedora 8 installer had 
not installed grub on the secondary drive as part of a RAID 1 install, 
some cursory Googling and searching of Redhat's knowledge base leads me to 
believe that this may well be normal for all Redhat (RHEL/Fedora) RAID1 
installs.


One has nothing to lose by installing grub on the second drive in this 
case and it may save some delay in recovery on losing the primary, 
although as has been pointed out, it is best practice to test missing 
drives as part of initial install testing.


Regards,

Richard


Re: assemble vs create an array.......

2008-02-02 Thread Dragos

Hello,
I am not sure if you have received my email from last week with the 
results of the different combinations prescribed (it contained html code).
Anyway, I did a ro mount to check the partition and was happy to see a 
lot of files intact. A few seemed destroyed, but I am not sure. I tried 
an xfs_check on the partition and it told me:


ERROR: The filesystem has valuable metadata changes in a log which 
needs to be replayed. Mount the filesystem to replay the log, and 
unmount it before re-running xfs_check. If you are unable to mount the 
filesystem, then use the xfs_repair -L option to destroy the log and 
attempt a repair.


Since I am unable to mount the partition, should I use the -L option with 
xfs_repair, or let it run without it?
Again, please let me know if I should resend my previous email with the 
log file of xfs_repair -n.


Thank you for your time,
Dragos


David Chinner wrote:

On Thu, Dec 06, 2007 at 07:39:28PM +0300, Michael Tokarev wrote:
  

What to do is to give xfs_repair a try for each permutation,
but again without letting it actually fix anything.
Just run it in read-only mode and see which combination
of drives gives fewer errors, or no fatal errors (there
may be several similar combinations, with the same order
of drives but with a different drive missing).



Ugggh. 

  

It's sad that xfs refuses to mount when the structure needs
cleaning - the best way here is to actually mount it
and see what it looks like, instead of trying repair
tools. 



It's self-protection - if you try to write to a corrupted filesystem,
you'll only make the corruption worse. Mounting involves log
recovery, which writes to the filesystem.

  

Is there some option to force-mount it still
(in readonly mode, knowing it may OOPs kernel etc)?



Sure you can: mount -o ro,norecovery <dev> <mtpt>

But if you hit corruption it will still shut down on you. If
the machine oopses then that is a bug.

  

thread prompted me to think.  If I can't force-mount it
(or browse it using other ways) as I can almost always
do with (somewhat?) broken ext[23] just to examine things,
maybe I'm trying it before it's mature enough? ;)



Hehe ;)

For maximum uber-XFS-guru points, learn to browse your filesystem
with xfs_db. :P
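
For example, a read-only browsing session might look something like this
(the device name is just an example):

   xfs_db -r /dev/md1
   xfs_db> sb 0
   xfs_db> print
   xfs_db> quit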

Cheers,

Dave.
  
