split RAID1 during backups?

2005-10-24 Thread Jeff Breidenbach

Hi all,

I have a two-drive RAID1 serving data for a busy website. The
partition is 500GB and contains millions of 10KB files. For reference,
here's /proc/mdstat

Personalities : [raid1]
md0 : active raid1 sdc1[0] sdd1[1]
  488383936 blocks [2/2] [UU]

For backups, I set the md0 partition to readonly and then use dd_rescue
+ netcat to copy the partition over a gigabit network. Unfortunately,
this process takes almost 10 hours. I'm only able to copy about 18MB/s
from md0 due to disk contention with the webserver. If I had the full
attention of a single disk, I could read at nearly 60MB/s.
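
Roughly, the pipeline looks like this (the host name and port are
placeholders for the real backup target):

  # receiving side: listen and write the raw image to disk
  netcat -l -p 9000 > md0.img
  # sending side: stream the block device to stdout, pipe it over the wire
  dd_rescue -B 4096 /dev/md0 - | netcat backuphost 9000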

So - I'm thinking of the following backup scenario.  First, remount
/dev/md0 readonly just to be safe. Then mount the two component
partitions (sdc1, sdd1) readonly. Tell the webserver to work from one
component partition, and tell the backup process to work from the
other component partition. Once the backup is complete, point the
webserver back at /dev/md0, unmount the component partitions, then
switch read-write mode back on.
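
In commands, the plan would be something like this (the /mnt paths are
made up, and I have not verified that the kernel will let me mount the
component partitions of an assembled array at all):

  mount -o remount,ro /data1               # quiesce the filesystem
  mount -o ro /dev/sdc1 /mnt/web-copy      # webserver reads from here
  mount -o ro /dev/sdd1 /mnt/backup-copy   # backup reads from here
  ... run the backup ...
  umount /mnt/web-copy /mnt/backup-copy
  mount -o remount,rw /data1               # back to normal service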

Am I insane? 

Everything on this system seems bottlenecked by disk I/O. That
includes the rate web pages are served as well as the backup process
described above. While I'm always hungry for performance tips, faster
backups are the current focus. For those interested in gory details
such as drive types, NCQ settings, kernel version and whatnot, I
dumped a copy of dmesg output here: http://www.jab.org/dmesg

Cheers,
Jeff


Re: split RAID1 during backups?

2005-10-24 Thread Jurriaan Kalkman

> Hi all,
>
> I have a two-drive RAID1 serving data for a busy website. The
> partition is 500GB and contains millions of 10KB files. For reference,
> here's /proc/mdstat
>
> Personalities : [raid1]
> md0 : active raid1 sdc1[0] sdd1[1]
>   488383936 blocks [2/2] [UU]
>
> For backups, I set the md0 partition to readonly and then use dd_rescue
> + netcat to copy the partition over a gigabit network. Unfortunately,
> this process takes almost 10 hours. I'm only able to copy about 18MB/s
> from md0 due to disk contention with the webserver. If I had the full
> attention of a single disk, I could read at nearly 60MB/s.

First of all, if the data is mostly static, rsync might work faster. Don't
feed rsync millions of files in one go - try to split it into separate
processes: say, one for all files starting with 'a', another for all files
starting with 'b', and so on.
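
A rough sketch of what I mean (paths and host are placeholders):

  # one rsync invocation per leading character keeps each run's file
  # list - and rsync's memory use - manageable
  for c in a b c d e f g h; do
      rsync -a /data/"$c"* backuphost:/backup/data/
  done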

> So - I'm thinking of the following backup scenario.  First, remount
> /dev/md0 readonly just to be safe. Then mount the two component
> partitions (sdc1, sdd1) readonly. Tell the webserver to work from one
> component partition, and tell the backup process to work from the
> other component partition. Once the backup is complete, point the
> webserver back at /dev/md0, unmount the component partitions, then
> switch read-write mode back on.
>
> Am I insane?

It doesn't sound insane. Whether it's actually faster is something only
you can test on your hardware.

By the way, it used to be with regular IDE disks that using hdc and hdd
together on a single wire was a sure way to get a slow system. I take it
sdc and sdd using SATA don't influence each other?

Good luck,
Jurriaan



Re: split RAID1 during backups?

2005-10-24 Thread Brad Campbell

Jeff Breidenbach wrote:


> So - I'm thinking of the following backup scenario.  First, remount
> /dev/md0 readonly just to be safe. Then mount the two component
> partitions (sdc1, sdd1) readonly. Tell the webserver to work from one
> component partition, and tell the backup process to work from the
> other component partition. Once the backup is complete, point the
> webserver back at /dev/md0, unmount the component partitions, then
> switch read-write mode back on.


Why not do something like this?

mount -o remount,ro /dev/md0 /web
mdadm --fail /dev/md0 /dev/sdd1
mdadm --remove /dev/md0 /dev/sdd1
mount -o ro /dev/sdd1 /target

do backup here

umount /target
mdadm --add /dev/md0 /dev/sdd1
mount -o remount,rw /dev/md0 /web

That way the web server continues to run from the md device.
However, you will endure a rebuild on md0 when you re-add the disk. But given that everything is
mounted read-only, you should not actually be writing anything, and if you lose a disk during the
rebuild the other disk will still be intact.


I second Jurriaan's vote for rsync as well, though I would be inclined to just let it loose on the
whole disk rather than break it up into parts - but then I have heaps of RAM, too.


Regards,
Brad
--
Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so. -- Douglas Adams


Re: split RAID1 during backups?

2005-10-24 Thread Jeff Breidenbach

> First of all, if the data is mostly static, rsync might work faster.

Any operation that stats the individual files - even to just look at
timestamps - takes about two weeks. Therefore it is hard for me to see
rsync as a viable solution, even though the data is mostly
static. About 400,000 files change between weekly backups.

> I take it sdc and sdd using SATA don't influence each other?

Correct.

> However, you will endure a rebuild on md0 when you re-add the disk. But
> given that everything is mounted read-only, you should not actually be
> writing anything

If the rebuild operation is a no-op, then that sounds like a great
idea. If the rebuild operation requires scanning over all data in both
drives, I think that's going to be at least as expensive as the
current 10-hour process.

Thanks for the suggestions so far.

Cheers,
Jeff



Re: split RAID1 during backups?

2005-10-24 Thread dean gaudet
On Mon, 24 Oct 2005, Jeff Breidenbach wrote:

> > First of all, if the data is mostly static, rsync might work faster.
>
> Any operation that stats the individual files - even to just look at
> timestamps - takes about two weeks. Therefore it is hard for me to see
> rsync as a viable solution, even though the data is mostly
> static. About 400,000 files change between weekly backups.

taking a long time to stat individual files makes me wonder if you're
suffering from atime updates and O(n) directory lookups... have you tried
this:

- mount -o noatime,nodiratime
- tune2fs -O dir_index  (and e2fsck -D)
  (you need recentish e2fsprogs for this, and i'm pretty sure you want
  2.6.x kernel)
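
concretely, something like this (device and mount point are guesses, and
the tune2fs/e2fsck part assumes ext2/ext3 with the filesystem unmounted):

  mount -o remount,noatime,nodiratime /data1
  tune2fs -O dir_index /dev/md0      # enable hashed (htree) directories
  e2fsck -fD /dev/md0                # offline: rebuild and optimize dirs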

a big hint you're suffering from atime updates is write traffic when your
fs is mounted rw, and your static webserver is the only thing running (and
your logs go elsewhere)... atime updates are probably the only writes
then.  try iostat -x 5.

a big hint you're suffering from O(n) directory lookups is heaps of system
time... (vmstat or top).


On Mon, 24 Oct 2005, Brad Campbell wrote:

> mount -o remount,ro /dev/md0 /web
> mdadm --fail /dev/md0 /dev/sdd1
> mdadm --remove /dev/md0 /dev/sdd1
> mount -o ro /dev/sdd1 /target
>
> do backup here
>
> umount /target
> mdadm --add /dev/md0 /dev/sdd1
> mount -o remount,rw /dev/md0 /web

the md event counts would be out of sync and unless you're using bitmapped 
intent logging this would cause a full resync.  if the raid wasn't online 
you could probably use one of the mdadm options to force the two devices 
to be a sync'd raid1 ... but i'm guessing you wouldn't be able to do it 
online.
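
(the option i'm thinking of is probably --assume-clean, roughly:

  mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        --assume-clean /dev/sdc1 /dev/sdd1

but i haven't tried it, and as i said, i doubt you can do it online.)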

other 2.6.x bleeding edge options are to mark one drive as write-mostly
so that you have no read traffic competition while doing a backup... or
just use the bitmap intent logging and a nbd to add a third, networked,
copy of the drive on another machine.
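
for the write-mostly + bitmap combination, the commands would be something
like this (syntax from memory, so treat it as a sketch; bitmap support
needs a recent mdadm and kernel):

  mdadm --grow /dev/md0 --bitmap=internal          # add an intent bitmap
  mdadm /dev/md0 --fail /dev/sdd1 --remove /dev/sdd1
  mdadm /dev/md0 --add --write-mostly /dev/sdd1    # reads now avoid sdd1

with the bitmap in place, pulling a disk out for a backup and re-adding it
afterwards only resyncs the blocks that changed in between, not the whole
500GB.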

-dean


Re: split RAID1 during backups?

2005-10-24 Thread John Stoffel
>>>>> "Jeff" == Jeff Breidenbach <jeff@jab.org> writes:


Jeff # mount | grep md0
Jeff /dev/md0 on /data1 type reiserfs (rw,noatime,nodiratime)

Ah, you're using reiserfs here.  It may or may not be having problems
with the number of files per directory that you have.  Is there any way
you can split them up further into sub-directories?

Old news servers used to run into this exact same problem, and what
they did was move all files starting with 'a' into the 'a/' directory,
all files starting with 'b' into b/... etc.  You can go down as many
levels as you want.  
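
A sketch of the one-time reshuffle (bash; run it inside each oversized
directory):

  # move each file into a subdirectory named after its first character
  for f in *; do
      [ -f "$f" ] || continue
      d=${f:0:1}
      mkdir -p -- "$d"
      mv -- "$f" "$d/"
  done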

Jeff Individual directories contain up to about 150,000 files. If I
Jeff run ls -U on all directories, it completes in a reasonable
Jeff amount of time (I forget how much, but I think it is well under
Jeff an hour). Reiserfs is supposed to be good at this sort of
Jeff thing. If I were to stat each file, then it's a different story.

Do you stat the files in inode order (not sure how reiserfs stores
files) when you're doing a readdir() on the directory contents?  You
don't want to bother sorting by name at all; you just want to pull them
off the disk as efficiently as possible.
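
One quick way to test that idea (GNU ls; assumes no whitespace in
filenames, and whether inode order matches on-disk order on reiserfs is
another question):

  # list unsorted with inode numbers, sort by inode, stat in that order
  ls -U1 -i | sort -n | awk '{print $2}' | xargs stat > /dev/null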

I think you'll get a lot more performance out of your system if you can
just re-do how the application writes/reads the files you're using.
It almost sounds like some sort of cache system...

The other idea would be to use 'inotify' and just copy those files
which change to the cloned box.

Another idea, which would require more hardware would be to make some
readonly copies of the system and have all reads go there, and only
writes go to the master system.  If the master dies, you just promote a
slave into that role.  If a slave dies, you have extras running
around.  Then you could do your backups against the readonly systems,
in parallel to get the most performance out of your backups.

But knowing more about the application would help.  Millions of tiny
files aren't optimal these days.  

Oh yeah, what kind of block size are you using on the filesystem?
And how many disks?  Splitting the load across more smaller disks will
probably help as well, since I suspect that your times are dominated
by seek and directory overhead, not actually reading of all these tiny
files.

John


Re: split RAID1 during backups?

2005-10-24 Thread dean gaudet
On Mon, 24 Oct 2005, Jeff Breidenbach wrote:

> Dean, the comment about write-mostly is confusing to me.  Let's say
> I somehow marked one of the component drives write-mostly to quiet it
> down. How do I get at it? Linux will not let me mount the component
> partition if md0 is also mounted. Do you think write-mostly or
> write-behind are likely enough to be magic bullets that I should
> learn all about them?

if one drive is write-mostly, and you remount the filesystem read-only... 
then no writes should be occurring... and you could dd from the component
drive directly and get a consistent fs image.  (i'm assuming you can 
remount the filesystem read-only for the duration of the backup because it 
sounds like that's how you do it now; and i'm assuming you're happy enough 
with your dd_rescue image...)
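
in other words, something like this (device names from your mdstat; host
and port made up):

  mount -o remount,ro /data1
  # with sdd1 marked write-mostly, the webserver's reads are all served
  # by sdc1, so sdd1 sits idle and can be imaged at full speed
  dd_rescue /dev/sdd1 - | netcat backuphost 9000
  mount -o remount,rw /data1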

myself i've been considering a related problem... i don't trust LVM/DM 
snapshots in 2.6.x yet, and i've been holding back a 2.4.x box waiting for 
them to stabilise... but that seems to be taking a long time.  the box 
happens to have a 3-way raid1 anyhow, and 2.6.x bitmapped intent logging 
would give me a great snapshot backup option:  just break off one disk 
during the backup and put it back in the mirror when done.

there's probably one problem with this 3-way approach... i'll need some 
way to get the fs (ext3) to reach a safe point where no log recovery 
would be required on the disk i break out of the mirror... because under 
no circumstances do you want to write on the disk while it's outside the 
mirror.  (LVM snapshotting in 2.4.x requires a VFS lock patch which does 
exactly this when you create a snapshot.)
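
lacking that patch, the closest i can think of is something like this (a
sketch; the device names for my 3-way box are made up, and i'd want to
verify that an ext3 remount-ro really leaves the journal empty):

  sync
  mount -o remount,ro /mnt/data    # flush and checkpoint the journal
  mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
  mount -o remount,rw /mnt/data    # service resumes on the 2-way mirror
  ... back up /dev/sdb1 ...
  mdadm /dev/md0 --add /dev/sdb1   # bitmap makes this resync cheap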


> John, I'm using 4KB blocks in reiserfs with tail packing.

i didn't realise you were using reiserfs... i'd suggest disabling tail 
packing... but then i've never used reiser, and i've only ever seen 
reports of tail packing having serious performance impact.  you're really 
only saving yourself an average of half a block per inode... maybe try a 
smaller block size if the disk space is an issue due to lots of inodes.

-dean


Re: split RAID1 during backups?

2005-10-24 Thread Thomas Garner
Should there be any consideration for the utilization of the gigabit 
interface that is passing all of this backup traffic, as well as the 
speed of the drive that is doing all of the writing during this 
transaction?  Is the 18MB/s how fast the data is being copied over the 
network, or is it some metric within the host system?


Thomas



Re: split RAID1 during backups?

2005-10-24 Thread Jeff Breidenbach

On 10/24/05, Thomas Garner [EMAIL PROTECTED] wrote:
> Should there be any consideration for the utilization of the gigabit
> interface that is passing all of this backup traffic, as well as the
> speed of the drive that is doing all of the writing during this
> transaction?  Is the 18MB/s how fast the data is being copied over the
> network, or is it some metric within the host system?

The switched gigabit network is plenty fast. The bottleneck is
reading from the RAID1 while it is under contention.

Here are measurements from transferring a chunk of data from
/dev/zero, a single unmounted drive, and RAID1. Measurements are
reported by dd_rescue and reflect how fast data is moving over the
network. I was careful to use smart command line options with
dd_rescue, avoid contaminating Linux's disk cache, and make sure
results were repeatable.

MB/s   Operation
----   ---------
72.0   dd_rescue /dev/zero - | netcat
61.8   dd_rescue [unmounted single drive] - | netcat
18.8   dd_rescue md0 - | netcat

dd_rescue v1.11 options:
 -B 4096 -q  -l -d -s 11G -m 200M -S 0