Re: draft howto on making raids for surviving a disk crash

2008-02-06 Thread Michal Soltys

Keld Jørn Simonsen wrote:


Make each of the disks bootable by grub

(to be described)



It would probably be good to show how to use the grub shell's install
command. It's the most flexible way and gives the most (or rather, total)
control. I could write some examples.
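
For instance, for a two-disk raid1 /boot it could go more or less like this in the
grub (legacy) shell - disk/partition numbers are only an example, and the paths
assume /boot is a separate partition, so the grub files live under /grub on it:

grub> device (hd0) /dev/sdb
grub> root (hd0,0)
grub> install /grub/stage1 (hd0) /grub/stage2 p /grub/menu.lst
grub> quit

repeated for each member disk, mapping it to (hd0) in turn. The simpler
setup (hd0) does roughly the same with less typing, but install gives explicit
control over every path.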



udev events or vgscan on raid partitions right after assembly (delay bug?)

2007-11-10 Thread Michal Soltys
Originally I thought that the delayed uevents regarding raid partitions were 
not related to md. To recap: if we assemble an md array as a partitionable 
array, the add/change uevents regarding its partitions do not happen right 
after assembly, only after the next array-related operation (mdadm -D, fdisk, 
etc.). A similar thing happens when the array is being stopped - the remove 
uevents for the partitions happen later. I made a post about this some time ago.



But now I have noticed that an analogous thing happens with the latest LVM 
(2.02.29), as it gained the ability to use sysfs data to choose which devices 
to consider during vgscan.


Now - if we're fresh after assembling some array as a partitionable raid, and 
on one of the partitions there is an lvm group - vgscan -ay will not find any 
lvm group on that partition.


It will find it on the second attempt though, or if some array-related 
operation, e.g.


fdisk /dev/md/d0
mdadm -D /dev/md/d0

- was run after the assembly. Or if using sysfs is explicitly prohibited 
in lvm.conf (sysfs_scan = 0), so that it considers all the nodes under /dev 
as per lvm.conf, without using /sys/block as a hint.


If you peek into /sys/block/md* right after the assembly of the array, 
there are no partition-related directories. For example, if mdadm -As 
assembles /dev/md/d0 with 4 (already present) partitions, /sys/block/md_d0 
will have no md_d0p{1,2,3,4} dirs after the command. They will only show up 
after the next array-related command.
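
In other words, roughly (same names as above):

mdadm -As
ls /sys/block/md_d0               # no md_d0p* entries yet
mdadm -D /dev/md/d0 > /dev/null
ls /sys/block/md_d0               # now md_d0p1 .. md_d0p4 show up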


Analogously, if we stop the array with mdadm -S /dev/md/d0, the partitions 
will remain until e.g. a repeated mdadm -S /dev/md/d0.


I'm not sure if this is the proper place to report this kind of thing 
(maybe the kernel list?), but I believe it to be sort of a bug. A hotplugged 
hard drive with partitions, for example, won't have those issues, be it with 
udev, lvm, etc.




Re: stride / stripe alignment on LVM ?

2007-11-02 Thread Michal Soltys

Janek Kozicki wrote:


And because LVM is putting its own metadata on /dev/md1, the ext3
partition is shifted by some (unknown for me) amount of bytes from
the beginning of /dev/md1.



It seems to be a multiple of 64 KiB. You can specify it during pvcreate, with 
the --metadatasize option. It will be rounded up to a multiple of 64 KiB, and 
pvcreate will add another 64 KiB on its own. Extents will follow directly after 
that. The 4 sectors mentioned in pvcreate's man page are covered by that option 
as well.


So e.g. if you have a 1 MiB chunk, then pvcreate ... --metadatasize 960K ...
should give you chunk-aligned logical volumes, assuming you have the actual 
extent size set appropriately as well. If you use the default chunk size, you 
shouldn't need any extra options.


Make sure it really works out this way after creating the pv/vg/first lv. I 
found it experimentally, so ymmv.
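
If your lvm2 exposes the pe_start field in pvs (I believe recent versions do), 
something along these lines should show where the extents actually start - for 
the 1 MiB chunk example above it should come out as 1 MiB:

pvcreate --metadatasize 960K /dev/md1
pvs -o pv_name,pe_start /dev/md1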



chunk size (was Re: Time to deprecate old RAID formats?)

2007-10-19 Thread Michal Soltys

Doug Ledford wrote:

course, this comes at the expense of peak throughput on the device.
Let's say you were building a mondo movie server, where you were
streaming out digital movie files.  In that case, you very well may care
more about throughput than seek performance since I suspect you wouldn't
have many small, random reads.  Then I would use a small chunk size,
sacrifice the seek performance, and get the throughput bonus of parallel
reads from the same stripe on multiple disks.  On the other hand, if I



Out of curiosity though - why wouldn't a large chunk work well here ? If you 
stream video (I assume large files, so a good few MB at least), the reads are 
parallel either way.


Yes, the amount of data read from each of the disks will be in less perfect 
proportion than in a small chunk size scenario, but that's pretty negligible. 
The benchmarks I've seen (like Justin's) seem not to care much about chunk 
size in sequential read/write scenarios (and often favor larger chunks). 
Some tests of my own from a few months ago confirmed that as well.



Re: Partitionable raid array... How to create devices ?

2007-10-16 Thread Michal Soltys

BERTRAND Joël wrote:


Well, /dev/mdp0 is created. But what about /dev/mdp0p1 ? I believe 
that mdadm has to create the required devices. I don't understand where my 
mistake is. Any idea ?




Two things come to my mind:

- udev interfering with what mdadm is doing (though this isn't the moment 
when such problems should show up). Either way, check both to avoid 
subtle-looking problems.


- a long time ago I noticed that the mdp parameter has problems creating 
nodes for raid partitions on arrays with non-standard names. In my case 
it did work when an explicit number of partitions was specified, but maybe 
your case is different. Try --auto=part4 or --auto=p4 and see if they 
work. For reference:

http://marc.info/?l=linux-raid&m=118720367217616&w=2
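
Something along these lines (component devices made up, adjust to your setup):

mdadm -A /dev/md_d0 --auto=part4 /dev/sda1 /dev/sdb1

which should also pre-create the partition nodes (md_d0p1 and so on).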



Re: Backups w/ rsync

2007-09-30 Thread Michal Soltys

Wolfgang Denk wrote:

Dear Bill,

in message [EMAIL PROTECTED] you wrote:


Be aware that rsync is useful for making a *copy* of your files, which 
isn't always the best backup. If the goal is to preserve data and be 
able to recover in time of disaster, it's probably not optimal, while if 
you need frequent access to old or deleted files it's fine.


If you want to do real backups you should use real tools, like bacula
etc.



I wouldn't agree here. It all depends on how you organize your things, write
your scripts, and so on. It isn't any less of a real solution than amanda or
bacula. It's much more of a DIY solution though, so not everyone will be
inclined to use it.

PS. Sorry for the off-topic - this is the last from me on this subject.



Re: Backups w/ rsync

2007-09-28 Thread Michal Soltys

Goswin von Brederlow wrote:


Thanks, should have looked at --link-dest before replying. I wonder
how long rsync had that option. I wrote my own rsync script years
ago. Maybe it predates this.



According to the NEWS file, since around 2002-09, so it's been quite a while.


Re: Backups w/ rsync

2007-09-28 Thread Michal Soltys

Goswin von Brederlow wrote:


I was thinking Michal Soltys meant it this way. You can probably
replace the cp invocation with an rsync one, but that hardly changes
things.

I don't think you can do this in a single rsync call. Please correct
me if I'm wrong.



Something along these lines:

rsync [other options] --link-dest=/backup/2007-01-01/ \
rsync://[EMAIL PROTECTED]/module /backup/2007-01-02/

It will create a backup of .../module in ...-02, hardlinking to ...-01 (where 
possible).


So there's no need for cp -l. There's a similar example in the rsync man page. 
Also, multiple --link-dest options are supported.




Re: Help: very slow software RAID 5.

2007-09-27 Thread Michal Soltys

Dean S. Messing wrote:


I don't see how one would do incrementals.  My backup system currently
does a monthly full backup, a weekly level 3 (which saves everything that has
changed since the last level 3 a week ago) and daily level 5's (which save
everything that changed today).



Rsync is a fantastic tool for incremental backups. Everything that didn't 
change can be hardlinked to the previous entry, and the time needed to perform 
the backup is pretty much negligible. Essentially, you get the equivalent of 
full backups at close to the minimum possible time and space cost.






Re: md device naming question

2007-09-24 Thread Michal Soltys

Nix wrote:

On 19 Sep 2007, maximilian attems said:



i presume it may also be /sys/block/mdNN ?


That's it, e.g. /sys/block/md0. Notable subdirectories include holders/
(block devices within the array, more than one if e.g. LVM is in use),



Also, if you assemble the raid as a partitionable one, it's /sys/block/md_dNN.

mdadm has pretty decent handling of all the names (check out the auto options, 
usable both in mdadm.conf and on the command line). Also, from my experience, 
if you try to use udev with partitionable arrays, partition-related uevents 
will be delayed until the next mdadm invocation, or some partition-related 
command (like fdisk).
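
For example (if I remember the syntax right), a line like

CREATE owner=root group=disk mode=0640 auto=part8

in mdadm.conf - or --auto=part8 on the command line - makes mdadm pre-create 
8 partition nodes for partitionable arrays.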





Re: Help: very slow software RAID 5.

2007-09-20 Thread Michal Soltys

Dean S. Messing wrote:


Also (as I asked) what is the downside?  From what I have read, random
access reads will take a hit.  Is this correct?

Thanks very much for your help!

Dean



Besides bonnie++ you should probably check out iozone. It will allow you to test 
very specific settings quite thoroughly, although with current multi-gigabyte 
memory systems the test runs may take quite a bit of time.


http://www.iozone.org/

There's a nice introduction to the program there, along with some example graph 
results.
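
A run along these lines should cover the basics (sizes and path are just an 
example - with a few GB of RAM you want the maximum file size well above that, 
hence the long runtime):

iozone -a -i 0 -i 1 -i 2 -g 4G -f /mnt/test/iozone.tmp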





mdadm --auto=mdp on non-standard named arrays

2007-08-15 Thread Michal Soltys
Just a tiny detail, but it looks like --auto=mdp won't create the additional 
device nodes for the raid's partitions (unless the number is explicitly 
specified), when used with a non-standard name, e.g.


mdadm -C /dev/md/abc -l0 -n2 /dev/sda2 /dev/sdb2 --auto=mdp

will only create the /dev/md/abc node. The remaining p/part/partition variants 
have no problems here.
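
With the number given explicitly it behaves as expected here, e.g.:

mdadm -C /dev/md/abc -l0 -n2 /dev/sda2 /dev/sdb2 --auto=mdp4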


Tested with mdadm v2.6.2



mdadm (on partitionable arrays) and partition uevents

2007-08-11 Thread Michal Soltys
While doing some tests, I've found a slight delay in partition-related 
uevents when raid devices are assembled/stopped.


For example - consider md/d0 with 1 partition created. The array is unassembled.

1) mdadm -A /dev/md/d0

will generate the change uevent properly for md/d0, but the add uevent for its 
partition will be delayed until some later mdadm operation or some 
partition-related operation - fdisk, etc.


2) mdadm -S /dev/md/d0

Analogously, the remove uevents for the partitions will only be generated on 
some later mdadm command, or even an unsuccessful attempt to fdisk that array.


To avoid any potential conflicts, mdadm.conf had symlinks=no, mdadm was used 
only with directory-based names, and udev had no custom rules regarding any md 
devices (so only the standard /dev/md_d[0-9]* and /dev/md[0-9]* nodes were 
created).


Tested under udev 114, mdadm 2.6.2 on two different systems (one arch 64bit, 
the other 32bit).


I'm not actually sure if it's even a bug in the first place, or if it has 
anything to do with mdadm, but I'm reporting it nonetheless.



mdadm and superblock format 1

2007-08-07 Thread Michal Soltys
I've noticed that whenever I use the 1.1 or 1.2 superblock format, mdadm 
will always report version 1 when e.g. mdadm -E --scan is issued.


Also, in these two cases, mdadm -A --scan won't work unless a proper 
ARRAY line is present in mdadm.conf (and if metadata is part of the 
description, it must say either 1.1 or 1.2).
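
In other words, something like this in mdadm.conf (the uuid is just a 
placeholder - take the real one from mdadm -E or -D):

ARRAY /dev/md0 metadata=1.1 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx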





Re: RAID on partitions and partitions on RAID

2007-07-29 Thread Michal Soltys

Cry Regarder wrote:


Thanks!  A couple questions:

1.  Are you sharing that spare with another array?  If not, why not do a raid-6
instead of a raid-5?



No, the spare is not shared. Now that I think about it, and since you've 
reminded me about raid-6...


I can think of one small plus of my setup - the spare is not spinning
(it's stopped with sdparm -C stop), so the disk should be in almost unused
condition if the need for it ever arises - excluding vibrations from the rest 
of the disks and the barely active electronics.



2.  I noticed that you built your raid partitions on corresponding disk
partitions.  Why do that instead of making one monolithic raid volume and then
partitioning that into the desired pieces?



A few reasons:

1) I wanted /boot out of any raid
2) I wanted LVM for some of the partitions
3) I preferred a regular mbr on the whole disks
4) I wanted a properly aligned XFS, so I could use its su/sw options

Alternatives:

- create separate md0 (for lvm) and md1 (for xfs), but that seemed wrong when 
there is support for partitionable arrays

- or just put everything on lvm and not bother with xfs alignment.

All things considered, it looked like a decent compromise. Of course, I'm 
open to suggestions.




1.  Adjust the partition table on each of the component disks so that I can
assemble the drives from /dev/sd?1 instead of /dev/sd?



Not doable here, afaik. When you assemble from /dev/sd?, there are no 
partitions to start with on those disks. Either way, I'm not aware of any 
tools that would shrink the raid, shift it, create a fitting mbr ...



2.  Adjust my /dev/md0 so that it is partitioned into /dev/md0_1 /dev/md0_2 or
something of that ilk.



No idea if you can easily switch from non-partitionable to partitionable raid.



Re: RAID on partitions and partitions on RAID

2007-07-28 Thread Michal Soltys

Cry Regarder wrote:



Partitions on RAID:

Back when I constructed the array, I placed an ext3 partition directly on
/dev/md0.  Since then, I have expanded the array by two disks, but have
not yet resized the ext3 partition.


What I would like to do is put a swap partition and XFS partition into that 1TB
free space.  However, I have no idea how to go about doing that.  If I had
originally used LVM, it would be easy, but I had read about some performance
hits re ext3 on lvm on raid so I decided to be simple.  Any advice?



When I was doing some tests recently, I didn't notice any significant 
performance difference whether lvm2 sits between the filesystem and the raid 
or not. Note though that my tests weren't too extensive (yet). Also, I use 
large lv extents - 512 MiB.


Either way, you can use partitionable raid arrays. Today I've set up the 
following config on one of the new machines: raid 5 from 4x500GB disks + one 
hot spare. Each disk is partitioned in the same way - a 64 MB boot partition 
identical on each disk, with each disk bootable (sdX1), swap (sdX2), and a 
partitionable raid member (sdX3). The raid on sd[abcde]3 carries a GPT 
partition table - the 1st partition is used by LVM2 for the usual stuff 
(root,usr,var,home.. just 24GB assigned here), the 2nd one is a big XFS for 
my space-hungry users. The XFS partition is carefully aligned to always sit 
on the stride boundary, up to 9 disks (as I plan to add new disks and grow 
that XFS later, while having its su/sw properly set).
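
Just to illustrate the su/sw part - for e.g. a 64 KiB chunk and 3 data disks, 
the relevant bit of the mkfs.xfs invocation would be something like (device 
name is only an example):

mkfs.xfs -d su=64k,sw=3 /dev/md_d0p2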


You will of course need an initramfs to boot from something like that. 
Depending on your distro (and how much you can dig into its config files), 
YMMV. (Small remark - udev and mdadm, depending on their rules, might get in 
each other's way - check / adjust both of them carefully.)





Re: RAID on partitions and partitions on RAID

2007-07-28 Thread Michal Soltys

Michal Soltys wrote:

Cry Regarder wrote:



one hot spare. Each disk is partitioned in the same way - 64mb boot 
partition identical on each disk, and each disk is bootable (sdX1), swap 
(sdX2), partitionable (sdX3). sd[abcde]3 raid has GPT partition - 1st 
one is used by LVM2 for the usual stuff (root,usr,var,home.. just 24GB 



Correction - the swap is also on the raid (on a partition governed by lvm). 
The initial idea was, well, risky (i.e. what if a disk whose swap was actually 
in use failed).



RAID and filesystem setup / tuning questions regarding specific scenario

2007-07-21 Thread Michal Soltys
I'm making a raid 5 consisting of 4 disks, 500 GB each, with one standby 
spare, to be expanded with extra disks in the future. The server will be 
running on a UPS, with an extra machine doing a daily rsync backup of the 
system and of the stuff deemed important. On top of the raid there will 
probably be a simple lvm2 setup - linear, with large extents (512 MB or 1 GB). 
Files will be accessed primarily through samba shares, by 10-20 people at the 
same time. The server will have 2 GB of RAM (increasing it, if necessary, 
won't be a problem) and a Core 2 Duo CPU, running 64-bit Linux. The disks are 
on ICH9R, in AHCI mode.


Currently, the users have a ~200 GB partition, almost filled up, with ca. 
700,000 files - which gives a rather small average of ~300 KB per file. From 
what they say, it will be more or less the same on the new machine.



The two basic questions are raid parameters and filesystem choice. I'll of 
course run tests, but the number of potential parameter combinations is not 
that small, so I'm looking for tips on where to start.



Regarding parameters - I've googled and searched this list, and found some
suggestions regarding parameters like chunk size, nr_requests, read-ahead 
settings, etc. Still, I have the feeling that the suggested settings are 
tailored more towards big file sizes - which is certainly not the case here. 
Wouldn't values like a 64 MB read-ahead on the whole raid device and 1 MB 
chunks be a bit of an overkill in my scenario ? Maybe someone is running 
something similar and could provide some insights ?



Regarding the filesystem - I've thought about actually using ext3 without the 
journal (so effectively ext2), considering the UPS and the regular backups to 
a separate machine. Also, judging from what I've found so far, XFS is not that 
great with many small files. As for ext2/3, googling revealed documents like 
http://ext2.sourceforge.net/2005-ols/paper-html/index.html , which are quite 
encouraging (and that document is already 2 years old). Another advantage of 
ext2/3 is that it can be shrunk, whereas XFS cannot, afaik.

How much of a performance gain would I get from using journalless ext3 ? 
Either way, is testing ext2/3 worthwhile here, or should I jump right to XFS ?
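
(As far as I understand, I wouldn't even have to decide up front - the journal 
can be added or removed later on an unmounted filesystem, roughly (device name 
just an example):

mke2fs /dev/md_d0p1                      # plain ext2 to start with
tune2fs -j /dev/md_d0p1                  # add a journal -> ext3
tune2fs -O ^has_journal /dev/md_d0p1     # drop it again
)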



Misc. more specific questions:

- lvm2

In my scenario - linear, big extents - is there a [significant] performance 
hit coming from using it ? Any other caveats I should be aware of ?


- read ahead

All the elements of the final setup - the drives themselves, the RAID, LVM2 - 
have a read-ahead parameter. How should they be set relative to each other ? 
Also, based on http://www.rhic.bnl.gov/hepix/talks/041019pm/schoen.pdf - 
too big a read-ahead can hurt performance in a multiuser setup. And as 
mentioned above, 64 MB seems a bit crazy... Availability of memory is not an 
issue though.


Any other stacking parameters that should be set properly ?

- nr_requests

I've also found a suggestion to increase this parameter - to 256 or 512 - in 
http://marc.info/?l=linux-raid&m=118302016516950&w=2 . It was also 
mentioned in some 3ware technical doc. Any comments on that ?


- max_sectors_kb

Any suggestions regarding this one ? I've found suggestions to set it to the 
chunk size, but that seems strange (?)
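
(For completeness, the knobs I'm asking about, with purely placeholder values 
and device names:

blockdev --setra 8192 /dev/md_d0                  # read-ahead, in 512-byte sectors
echo 256 > /sys/block/sda/queue/nr_requests
echo 128 > /sys/block/sda/queue/max_sectors_kb

plus, iirc, the per-lv read-ahead settable via lvchange -r.)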


- journal on separate devices

Generally - how fast should the extra drive be, compared to the RAID array ? 
Also, how big ? Any side effects of setting it big (on ext3, there's a limit 
when the journal is on the main partition, due to memory requirements, but 
there's no limit when it's on a separate drive) ?
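
(For reference, setting up an external journal would go more or less like 
this, with made-up device names:

mke2fs -O journal_dev /dev/sdf1
mkfs.ext3 -J device=/dev/sdf1 /dev/md_d0p1
)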


