Re: How to create initrd.img to boot LVM-on-RAID0?

2007-10-03 Thread Dean S. Messing

Ian Ward Comfort wrote:
: On Oct 3, 2007, at 10:18 AM, Dean S. Messing wrote:
: > I've created a software RAID-0, defined a Volume Group on it with
: > (currently) a single logical volume, and copied my entire  
: > installation onto it, modifying the copied fstab to reflect where  
: > the new "/" is.
: 
: > mkinitrd --preload raid0  --with=raid0 initrd_raid.img 2.6.22.5-76-fc7
: 
: > But the thing won't complete the boot process. From the boot  
: > messages it appears to not be starting the array, so when it goes  
: > to scan for LVs it doesn't find the one that's sitting on top of  
: > the array where root lives.
: >
: > Are there instructions for how to make this work?  I've googled for  
: > a couple of hours, tried a bunch of stuff, but can't get it to  
: > work.  From what I've read I suspect I must hand-tweak the "init"  
: > file in the initrd.
: 
: You're probably correct in that your initrd's nash script currently  
: lacks any "raidautorun" directives, since mkinitrd is looking at your  
: old fstab and finding that your old root device is not on RAID.
: 
: Instead of specifying modules yourself and hand-tweaking the nash  
: script, just ask mkinitrd to figure out the correct modules to load  
: and RAID devices to start on its own, given your chosen boot device,  
: with the --fstab option.
: 
:   mkinitrd --fstab=/newroot/etc/fstab initrd_raid.img 2.6.22.5-76-fc7
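
(For completeness, one way to confirm the regenerated image will actually
start the array -- a minimal sketch; the unpack directory and image path are
illustrative, and it assumes the Fedora 7 initrd is a gzipped cpio archive
driven by the nash "init" script Ian mentions:)

  mkdir /tmp/initrd-check && cd /tmp/initrd-check
  zcat /boot/initrd_raid.img | cpio -idmv   # unpack the initrd image
  grep raidautorun init                     # nash should now start the md device
  grep raid0 init                           # and the raid0 module should be loaded first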


That did the trick!
Thanks a lot, Ian.


Dean


Re: How to create initrd.img to boot LVM-on-RAID0?

2007-10-03 Thread Dean S. Messing

Goswin von Brederlow wrote:
: "Dean S. Messing" <[EMAIL PROTECTED]> writes:
: 
: > I'm having the devil of a time trying to boot off
: > an  "LVM-on-RAID0" device on my Fedora 7 system.
: >
: > I've created a software RAID-0, defined a Volume Group on it with
: > (currently) a single logical volume, and copied my entire
: > installation onto it, modifying the copied fstab to reflect where
: > the new "/" is.
: >
: > I created a new initrd with:
: >
: > mkinitrd --preload raid0  --with=raid0 initrd_raid.img 2.6.22.5-76-fc7
: >
: > The LVM modules are getting included in the initrd "for free" because
: > I'm currently running on a non-raid LV-managed file system.
: >
: > I added a stanza to grub.conf for the new initrd.img.
: >
: > But the thing won't complete the boot process.
: > From the boot messages it appears to not
: > be starting the array, so when it goes to scan for LVs it doesn't
: > find the one that's sitting on top of the array where root lives.
: 
: Maybe your lvm.conf filters it out or the devices are missing to
: access it?

My understanding is still too meagre to start fooling with it.

When I unpack the initrd, I see the raid modules but nothing in the
"init" script that activates the array.  The lvm.conf file is the
default one.

: > Are there instructions for how to make this work?  I've googled for
: > a couple of hours, tried a bunch of stuff, but can't get it to
: > work.  From what I've read I suspect I must hand-tweak the "init"
: > file in the initrd.
: >
: > Surely there is "a right way" to do this.
: 
: Install debian, live happily ever after. :)

No comment.

: By the way, why bother with raid0? lvm can do striping on its own,
: saving you one layer altogether, and you already have lvm working
: right. Why needlessly add problems to your working system?

I'm experimenting (and learning) right now.  I'm aware of the striped
lv option, which I will also try.  Thanks.

By the way, the current system is working with LVM because at
installation time I told it to use LVM on the installed system.

Dean


Re: RAID 5 performance issue.

2007-10-03 Thread Andrew Clayton
On Wed, 3 Oct 2007 13:36:39 -0700, David Rees wrote:

> > # xfs_db -c frag -f /dev/md0
> > actual 1828276, ideal 1708782, fragmentation factor 6.54%
> >
> > Good or bad?
> 
> Not bad, but not that good, either. Try running xfs_fsr from a nightly
> cronjob. By default, it will defrag mounted XFS filesystems for up to
> 2 hours. Typically this is enough to keep fragmentation well below 1%.

Worth a shot.

> -Dave

Andrew


Re: RAID 5 performance issue.

2007-10-03 Thread Andrew Clayton
On Wed, 3 Oct 2007 16:35:21 -0400 (EDT), Justin Piszcz wrote:

> What does cat /sys/block/md0/md/mismatch_cnt say?

$ cat /sys/block/md0/md/mismatch_cnt
0
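
(For reference, mismatch_cnt reflects the last check or repair pass; a
minimal sketch to refresh it, using the same sysfs paths as above:)

  echo check > /sys/block/md0/md/sync_action   # kick off a consistency check
  cat /sys/block/md0/md/mismatch_cnt           # re-read once the check completes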

> That fragmentation looks normal/fine.

Cool.

> Justin.

Andrew


Re: RAID 5 performance issue.

2007-10-03 Thread Richard Scobie

Andrew Clayton wrote:

> Yeah, I was wondering about that. It certainly hasn't improved things,
> it's unclear if it's made things any worse.

Many 3124 cards are PCI-X, so if you have one of these (and you seem to 
be using a server board which may well have PCI-X), bus performance is 
not going to be an issue.


Regards,

Richard


Re: RAID 5 performance issue.

2007-10-03 Thread David Rees
On 10/3/07, Andrew Clayton <[EMAIL PROTECTED]> wrote:
> On Wed, 3 Oct 2007 12:43:24 -0400 (EDT), Justin Piszcz wrote:
> > Have you checked fragmentation?
>
> You know, that never even occurred to me. I've gotten into the mindset
> that it's generally not a problem under Linux.

It's probably not the root cause, but certainly doesn't help things.
At least with XFS you have an easy way to defrag the filesystem
without even taking it offline.

> # xfs_db -c frag -f /dev/md0
> actual 1828276, ideal 1708782, fragmentation factor 6.54%
>
> Good or bad?

Not bad, but not that good, either. Try running xfs_fsr from a nightly
cronjob. By default, it will defrag mounted XFS filesystems for up to
2 hours. Typically this is enough to keep fragmentation well below 1%.
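
(An illustrative root crontab entry for this -- the schedule is an
assumption, and -t caps the run time in seconds, 7200 matching the
2-hour default David mentions:)

  # defragment mounted XFS filesystems each night, for at most 2 hours
  30 2 * * * /usr/sbin/xfs_fsr -t 7200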

-Dave


Re: RAID 5 performance issue.

2007-10-03 Thread Justin Piszcz

What does cat /sys/block/md0/md/mismatch_cnt say?

That fragmentation looks normal/fine.

Justin.

On Wed, 3 Oct 2007, Andrew Clayton wrote:


> On Wed, 3 Oct 2007 12:43:24 -0400 (EDT), Justin Piszcz wrote:
>
> > Have you checked fragmentation?
>
> You know, that never even occurred to me. I've gotten into the mindset
> that it's generally not a problem under Linux.
>
> > xfs_db -c frag -f /dev/md3
> >
> > What does this report?
>
> # xfs_db -c frag -f /dev/md0
> actual 1828276, ideal 1708782, fragmentation factor 6.54%
>
> Good or bad?
>
> Seeing as this filesystem will be three years old in December, that
> doesn't seem overly bad.
>
> I'm currently looking at things like
>
> http://lwn.net/Articles/249450/ and
> http://lwn.net/Articles/242559/
>
> for potential help; fortunately it seems I won't have too long to wait.
>
> > Justin.
>
> Cheers,
>
> Andrew




Re: RAID 5 performance issue.

2007-10-03 Thread Andrew Clayton
On Wed, 03 Oct 2007 19:53:08 +0200, Goswin von Brederlow wrote:

> Andrew Clayton <[EMAIL PROTECTED]> writes:
> 
> > Hi,
> >
> > Hardware:
> >
> > Dual Opteron 2GHz cpus. 2GB RAM. 4 x 250GB SATA hard drives. 1
> > (root file system) is connected to the onboard Silicon Image 3114
> > controller. The other 3 (/home) are in a software RAID 5 connected
> > to a PCI Silicon Image 3124 card. I moved the 3 raid disks off the
> > on board controller onto the card the other day to see if that
> > would help, it didn't.
> 
> I would think the onboard controller is connected to the north or
> south bridge and possibly hooked directly into the HyperTransport
> bus. The extra controller is PCI, so you are limited to a
> theoretical 128MiB/s. For me the onboard chips do much better (though
> at higher CPU cost) than PCI cards.

Yeah, I was wondering about that. It certainly hasn't improved things,
it's unclear if it's made things any worse.

> MfG
> Goswin


Cheers,

Andrew


Re: RAID 5 performance issue.

2007-10-03 Thread Andrew Clayton
On Wed, 3 Oct 2007 12:43:24 -0400 (EDT), Justin Piszcz wrote:

> Have you checked fragmentation?

You know, that never even occurred to me. I've gotten into the mindset
that it's generally not a problem under Linux.

> xfs_db -c frag -f /dev/md3
> 
> What does this report?

# xfs_db -c frag -f /dev/md0
actual 1828276, ideal 1708782, fragmentation factor 6.54%

Good or bad? 

Seeing as this filesystem will be three years old in December, that
doesn't seem overly bad.
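
(The factor xfs_db reports works out to (actual - ideal) / actual extents,
so for the numbers above:)

  echo 'scale=6; (1828276 - 1708782) / 1828276' | bc   # .065359 -> ~6.54%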


I'm currently looking at things like

http://lwn.net/Articles/249450/ and
http://lwn.net/Articles/242559/

for potential help; fortunately it seems I won't have too long to wait.

> Justin.

Cheers,

Andrew


Re: RAID 5 performance issue.

2007-10-03 Thread Justin Piszcz



On Wed, 3 Oct 2007, Andrew Clayton wrote:


> On Wed, 3 Oct 2007 12:48:27 -0400 (EDT), Justin Piszcz wrote:
>
> > Also if it is software raid, when you make the XFS filesystem on it,
> > it sets up a proper (and tuned) sunit/swidth, so why would you want
> > to change that?
>
> Oh I didn't, the sunit and swidth were set automatically. Do they look
> sane? From reading the XFS section of the mount man page, I'm not
> entirely sure what they specify and certainly wouldn't have any idea
> what to set them to.
>
> > Justin.
>
> Cheers,
>
> Andrew



You should not need to set them as mount options unless you are overriding 
the defaults.
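
(For the numbers quoted in this thread the defaults do line up with the
array geometry -- a quick sanity check, assuming the 3-disk RAID5 with a
256k chunk described in the original post:)

  # mount options are in 512-byte sectors:
  #   sunit  = one 256 KiB chunk        = 512 sectors
  #   swidth = 2 data disks x 256 KiB   = 1024 sectors
  # xfs_info prints the same geometry in 4 KiB blocks: sunit=64, swidth=128
  # mkfs.xfs would be given the same thing as, e.g. (don't re-run on a live fs):
  #   mkfs.xfs -d su=256k,sw=2 /dev/md0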


Justin.


Re: How to create initrd.img to boot LVM-on-RAID0?

2007-10-03 Thread Goswin von Brederlow
"Dean S. Messing" <[EMAIL PROTECTED]> writes:

> I'm having the devil of a time trying to boot off
> an  "LVM-on-RAID0" device on my Fedora 7 system.
>
> I've created a software RAID-0, defined a Volume Group on it with
> (currently) a single logical volume, and copied my entire
> installation onto it, modifying the copied fstab to reflect where
> the new "/" is.
>
> I created a new initrd with:
>
> mkinitrd --preload raid0  --with=raid0 initrd_raid.img 2.6.22.5-76-fc7
>
> The LVM modules are getting included in the initrd "for free" because
> I'm currently running on a non-raid LV-managed file system.
>
> I added a stanza to grub.conf for the new initrd.img.
>
> But the thing won't complete the boot process.
> From the boot messages it appears to not
> be starting the array, so when it goes to scan for LVs it doesn't
> find the one that's sitting on top of the array where root lives.

Maybe your lvm.conf filters it out or the devices are missing to
access it?

> Are there instructions for how to make this work?  I've googled for
> a couple of hours, tried a bunch of stuff, but can't get it to
> work.  From what I've read I suspect I must hand-tweak the "init"
> file in the initrd.
>
> Surely there is "a right way" to do this.

Install debian, live happily ever after. :)

By the way, why bother with raid0? lvm can do striping on its own,
saving you one layer altogether, and you already have lvm working
right. Why needlessly add problems to your working system?
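
(A minimal sketch of the striped-LV alternative; the device names, stripe
count and sizes are illustrative only:)

  pvcreate /dev/sdb1 /dev/sdc1
  vgcreate vg_striped /dev/sdb1 /dev/sdc1
  lvcreate -i 2 -I 256 -L 100G -n root vg_striped   # -i = stripes, -I = stripe size in KiB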

> (And, yes, my /boot partition is an ordinary device, not involved with
> the RAID0 or the LVs).
>
> Dean

MfG
Goswin


Re: RAID 5 performance issue.

2007-10-03 Thread Goswin von Brederlow
Andrew Clayton <[EMAIL PROTECTED]> writes:

> Hi,
>
> Hardware:
>
> Dual Opteron 2GHz cpus. 2GB RAM. 4 x 250GB SATA hard drives. 1 (root file 
> system) is connected to the onboard Silicon Image 3114 controller. The other 
> 3 (/home) are in a software RAID 5 connected to a PCI Silicon Image 3124 
> card. I moved the 3 raid disks off the on board controller onto the card the 
> other day to see if that would help, it didn't.

I would think the onboard controller is connected to the north or
south bridge and possibly hooked directly into the HyperTransport
bus. The extra controller is PCI, so you are limited to a
theoretical 128MiB/s. For me the onboard chips do much better (though
at higher CPU cost) than PCI cards.
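
(That 128MiB/s figure is just the classic shared 32-bit/33MHz PCI ceiling:)

  echo 'scale=1; 4 * 33333333 / (1024 * 1024)' | bc   # ~127 MiB/s, shared by every device on the bus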

MfG
Goswin


How to create initrd.img to boot LVM-on-RAID0?

2007-10-03 Thread Dean S. Messing

I'm having the devil of a time trying to boot off
an  "LVM-on-RAID0" device on my Fedora 7 system.

I've created a software RAID-0, defined a Volume Group on it with
(currently) a single logical volume, and copied my entire
installation onto it, modifying the copied fstab to reflect where
the new "/" is.

I created a new initrd with:

mkinitrd --preload raid0  --with=raid0 initrd_raid.img 2.6.22.5-76-fc7

The LVM modules are getting included in the initrd "for free" because
I'm currently running on a non-raid LV-managed file system.

I added a stanza to grub.conf for the new initrd.img.

But the thing won't complete the boot process.
From the boot messages it appears to not
be starting the array, so when it goes to scan for LVs it doesn't
find the one that's sitting on top of the array where root lives.

Are there instructions for how to make this work?  I've googled for
a couple of hours, tried a bunch of stuff, but can't get it to
work.  From what I've read I suspect I must hand-tweak the "init"
file in the initrd.

Surely there is "a right way" to do this.

(And, yes, my /boot partition is an ordinary device, not involved with
the RAID0 or the LVs).

Dean


Re: RAID 5 performance issue.

2007-10-03 Thread Justin Piszcz
Also, if it is software raid, when you make the XFS filesystem on it, it
sets up a proper (and tuned) sunit/swidth, so why would you want to change
that?


Justin.

On Wed, 3 Oct 2007, Justin Piszcz wrote:


Have you checked fragmentation?

xfs_db -c frag -f /dev/md3

What does this report?

Justin.

On Wed, 3 Oct 2007, Andrew Clayton wrote:


Hi,

Hardware:

Dual Opteron 2GHz cpus. 2GB RAM. 4 x 250GB SATA hard drives. 1 (root file 
system) is connected to the onboard Silicon Image 3114 controller. The 
other 3 (/home) are in a software RAID 5 connected to a PCI Silicon Image 
3124 card. I moved the 3 raid disks off the on board controller onto the 
card the other day to see if that would help, it didn't.


Software:

Fedora Core 6, 2.6.23-rc9 kernel.

Array/fs details:

Filesystems are XFS

Filesystem     Type    Size  Used Avail Use% Mounted on
/dev/sda2      xfs      20G  5.6G   14G  29% /
/dev/sda5      xfs     213G  3.6G  209G   2% /data
none           tmpfs  1008M     0 1008M   0% /dev/shm
/dev/md0       xfs     466G  237G  229G  51% /home

/dev/md0 is currently mounted with the following options

noatime,logbufs=8,sunit=512,swidth=1024

sunit and swidth seem to be automatically set.

xfs_info shows

meta-data=/dev/md0               isize=256    agcount=16, agsize=7631168 blks
         =                       sectsz=4096  attr=1
data     =                       bsize=4096   blocks=122097920, imaxpct=25
         =                       sunit=64     swidth=128 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=0
realtime =none                   extsz=524288 blocks=0, rtextents=0

The array has a 256k chunk size using left-symmetric layout.

/sys/block/md0/md/stripe_cache_size is currently at 4096 (upping this from
256 at best alleviates the problem).

I also have currently set /sys/block/sd[bcd]/queue/nr_requests to 512
(doesn't seem to have made any difference).

I also ran blockdev --setra 8192 /dev/sd[bcd]; I tried 16384 and 32768 as well.

IO scheduler is cfq for all devices.


This machine acts as a file server for about 11 workstations. /home (the
software RAID 5) is exported over NFS, whereby the clients mount their home
directories (using autofs).

I set it up about 3 years ago and it has been fine. However, earlier this
year we started noticing application stalls, e.g. Firefox would become
unresponsive and the window would grey out (under Compiz); this typically
lasts 2-4 seconds.

During these stalls, I see the iostat activity below (taken at 2-second
intervals on the file server). High iowait, high awaits. The
stripe_cache_active maxes out and things kind of grind to a halt for a few
seconds until the stripe_cache_active starts shrinking.


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.25    0.00   99.75

Device:    rrqm/s   wrqm/s     r/s      w/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz    await  svctm  %util
sda          0.00     0.00    0.00     5.47     0.00    40.80     14.91      0.05     9.73   7.18   3.93
sdb          0.00     0.00    1.49     1.49     5.97     9.95     10.67      0.06    18.50   9.00   2.69
sdc          0.00     0.00    0.00     2.99     0.00    15.92     10.67      0.01     4.17   4.17   1.24
sdd          0.00     0.00    0.50     2.49     1.99    13.93     10.67      0.02     5.67   5.67   1.69
md0          0.00     0.00    0.00     1.99     0.00     7.96      8.00      0.00     0.00   0.00   0.00
sde          0.00     0.00    0.00     0.00     0.00     0.00      0.00      0.00     0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.25    0.00    5.24    1.50    0.00   93.02

Device:    rrqm/s   wrqm/s     r/s      w/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz    await  svctm  %util
sda          0.00     0.00    0.00    12.50     0.00    85.75     13.72      0.12     9.60   6.28   7.85
sdb        182.50   275.00  114.00    17.50   986.00    82.00     16.24    337.03   660.64   6.06  79.70
sdc        171.00   269.50  117.00    20.00  1012.00    94.00     16.15    315.35   677.73   5.86  80.25
sdd        149.00   278.00  107.00    18.50   940.00    84.00     16.32    311.83   705.33   6.33  79.40
md0          0.00     0.00    0.00  1012.00     0.00  8090.00     15.99      0.00     0.00   0.00   0.00
sde          0.00     0.00    0.00     0.00     0.00     0.00      0.00      0.00     0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.50   44.61    0.00   53.88

Device:    rrqm/s   wrqm/s     r/s      w/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz    await  svctm  %util
sda          0.00     0.00    0.00     1.00     0.00     4.25      8.50      0.00     0.00   0.00   0.00
sdb        168.50    64.00  129.50    58.00  1114.00   508.00     17.30    645.37  1272.90   5.34 100.05
sdc        194.00    76.50  141.50    43.00  1232.00   360.00     17.26    664.01   916.30   5.42 100.05
sd

Re: RAID 5 performance issue.

2007-10-03 Thread Justin Piszcz

Have you checked fragmentation?

xfs_db -c frag -f /dev/md3

What does this report?

Justin.

On Wed, 3 Oct 2007, Andrew Clayton wrote:


Hi,

Hardware:

Dual Opteron 2GHz cpus. 2GB RAM. 4 x 250GB SATA hard drives. 1 (root file 
system) is connected to the onboard Silicon Image 3114 controller. The other 3 
(/home) are in a software RAID 5 connected to a PCI Silicon Image 3124 card. I 
moved the 3 raid disks off the on board controller onto the card the other day 
to see if that would help, it didn't.

Software:

Fedora Core 6, 2.6.23-rc9 kernel.

Array/fs details:

Filesystems are XFS

Filesystem     Type    Size  Used Avail Use% Mounted on
/dev/sda2      xfs      20G  5.6G   14G  29% /
/dev/sda5      xfs     213G  3.6G  209G   2% /data
none           tmpfs  1008M     0 1008M   0% /dev/shm
/dev/md0       xfs     466G  237G  229G  51% /home

/dev/md0 is currently mounted with the following options

noatime,logbufs=8,sunit=512,swidth=1024

sunit and swidth seem to be automatically set.

xfs_info shows

meta-data=/dev/md0               isize=256    agcount=16, agsize=7631168 blks
         =                       sectsz=4096  attr=1
data     =                       bsize=4096   blocks=122097920, imaxpct=25
         =                       sunit=64     swidth=128 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=0
realtime =none                   extsz=524288 blocks=0, rtextents=0

The array has a 256k chunk size using left-symmetric layout.

/sys/block/md0/md/stripe_cache_size is currently at 4096 (upping this from
256 at best alleviates the problem).

I also have currently set /sys/block/sd[bcd]/queue/nr_requests to 512
(doesn't seem to have made any difference).

I also ran blockdev --setra 8192 /dev/sd[bcd]; I tried 16384 and 32768 as well.

IO scheduler is cfq for all devices.


This machine acts as a file server for about 11 workstations. /home (the
software RAID 5) is exported over NFS, whereby the clients mount their home
directories (using autofs).

I set it up about 3 years ago and it has been fine. However, earlier this
year we started noticing application stalls, e.g. Firefox would become
unresponsive and the window would grey out (under Compiz); this typically
lasts 2-4 seconds.

During these stalls, I see the iostat activity below (taken at 2-second
intervals on the file server). High iowait, high awaits. The
stripe_cache_active maxes out and things kind of grind to a halt for a few
seconds until the stripe_cache_active starts shrinking.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.25    0.00   99.75

Device:    rrqm/s   wrqm/s     r/s      w/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz    await  svctm  %util
sda          0.00     0.00    0.00     5.47     0.00    40.80     14.91      0.05     9.73   7.18   3.93
sdb          0.00     0.00    1.49     1.49     5.97     9.95     10.67      0.06    18.50   9.00   2.69
sdc          0.00     0.00    0.00     2.99     0.00    15.92     10.67      0.01     4.17   4.17   1.24
sdd          0.00     0.00    0.50     2.49     1.99    13.93     10.67      0.02     5.67   5.67   1.69
md0          0.00     0.00    0.00     1.99     0.00     7.96      8.00      0.00     0.00   0.00   0.00
sde          0.00     0.00    0.00     0.00     0.00     0.00      0.00      0.00     0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.25    0.00    5.24    1.50    0.00   93.02

Device:    rrqm/s   wrqm/s     r/s      w/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz    await  svctm  %util
sda          0.00     0.00    0.00    12.50     0.00    85.75     13.72      0.12     9.60   6.28   7.85
sdb        182.50   275.00  114.00    17.50   986.00    82.00     16.24    337.03   660.64   6.06  79.70
sdc        171.00   269.50  117.00    20.00  1012.00    94.00     16.15    315.35   677.73   5.86  80.25
sdd        149.00   278.00  107.00    18.50   940.00    84.00     16.32    311.83   705.33   6.33  79.40
md0          0.00     0.00    0.00  1012.00     0.00  8090.00     15.99      0.00     0.00   0.00   0.00
sde          0.00     0.00    0.00     0.00     0.00     0.00      0.00      0.00     0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.50   44.61    0.00   53.88

Device:    rrqm/s   wrqm/s     r/s      w/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz    await  svctm  %util
sda          0.00     0.00    0.00     1.00     0.00     4.25      8.50      0.00     0.00   0.00   0.00
sdb        168.50    64.00  129.50    58.00  1114.00   508.00     17.30    645.37  1272.90   5.34 100.05
sdc        194.00    76.50  141.50    43.00  1232.00   360.00     17.26    664.01   916.30   5.42 100.05
sdd        172.00    90.50  114.50    50.00   996.00   456.00     17.65    662.54   977.28   6.08 100.05
md0          0.00     0.00    0.50     8.00     2.00    32.00      8.00      0.00     0.00   0.00   0.00

RAID 5 performance issue.

2007-10-03 Thread Andrew Clayton
Hi,

Hardware:

Dual Opteron 2GHz cpus. 2GB RAM. 4 x 250GB SATA hard drives. 1 (root file 
system) is connected to the onboard Silicon Image 3114 controller. The other 3 
(/home) are in a software RAID 5 connected to a PCI Silicon Image 3124 card. I 
moved the 3 raid disks off the on board controller onto the card the other day 
to see if that would help, it didn't.

Software:

Fedora Core 6, 2.6.23-rc9 kernel.

Array/fs details:

Filesystems are XFS

Filesystem     Type    Size  Used Avail Use% Mounted on
/dev/sda2      xfs      20G  5.6G   14G  29% /
/dev/sda5      xfs     213G  3.6G  209G   2% /data
none           tmpfs  1008M     0 1008M   0% /dev/shm
/dev/md0       xfs     466G  237G  229G  51% /home

/dev/md0 is currently mounted with the following options

noatime,logbufs=8,sunit=512,swidth=1024

sunit and swidth seem to be automatically set.

xfs_info shows

meta-data=/dev/md0               isize=256    agcount=16, agsize=7631168 blks
         =                       sectsz=4096  attr=1
data     =                       bsize=4096   blocks=122097920, imaxpct=25
         =                       sunit=64     swidth=128 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=0
realtime =none                   extsz=524288 blocks=0, rtextents=0

The array has a 256k chunk size using left-symmetric layout.

/sys/block/md0/md/stripe_cache_size is currently at 4096 (upping this from
256 at best alleviates the problem).

I also have currently set /sys/block/sd[bcd]/queue/nr_requests to 512
(doesn't seem to have made any difference).

I also ran blockdev --setra 8192 /dev/sd[bcd]; I tried 16384 and 32768 as well.

IO scheduler is cfq for all devices.
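
(The tuning above, expressed as commands -- a sketch using the values quoted;
none of these settings persist across a reboot:)

  echo 4096 > /sys/block/md0/md/stripe_cache_size
  for d in sdb sdc sdd; do echo 512 > /sys/block/$d/queue/nr_requests; done
  blockdev --setra 8192 /dev/sd[bcd]
  cat /sys/block/sdb/queue/scheduler   # active scheduler shown in brackets, e.g. [cfq]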


This machine acts as a file server for about 11 workstations. /home (the
software RAID 5) is exported over NFS, whereby the clients mount their home
directories (using autofs).

I set it up about 3 years ago and it has been fine. However, earlier this
year we started noticing application stalls, e.g. Firefox would become
unresponsive and the window would grey out (under Compiz); this typically
lasts 2-4 seconds.

During these stalls, I see the iostat activity below (taken at 2-second
intervals on the file server). High iowait, high awaits. The
stripe_cache_active maxes out and things kind of grind to a halt for a few
seconds until the stripe_cache_active starts shrinking.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.25    0.00   99.75

Device:    rrqm/s   wrqm/s     r/s      w/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz    await  svctm  %util
sda          0.00     0.00    0.00     5.47     0.00    40.80     14.91      0.05     9.73   7.18   3.93
sdb          0.00     0.00    1.49     1.49     5.97     9.95     10.67      0.06    18.50   9.00   2.69
sdc          0.00     0.00    0.00     2.99     0.00    15.92     10.67      0.01     4.17   4.17   1.24
sdd          0.00     0.00    0.50     2.49     1.99    13.93     10.67      0.02     5.67   5.67   1.69
md0          0.00     0.00    0.00     1.99     0.00     7.96      8.00      0.00     0.00   0.00   0.00
sde          0.00     0.00    0.00     0.00     0.00     0.00      0.00      0.00     0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.25    0.00    5.24    1.50    0.00   93.02

Device:    rrqm/s   wrqm/s     r/s      w/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz    await  svctm  %util
sda          0.00     0.00    0.00    12.50     0.00    85.75     13.72      0.12     9.60   6.28   7.85
sdb        182.50   275.00  114.00    17.50   986.00    82.00     16.24    337.03   660.64   6.06  79.70
sdc        171.00   269.50  117.00    20.00  1012.00    94.00     16.15    315.35   677.73   5.86  80.25
sdd        149.00   278.00  107.00    18.50   940.00    84.00     16.32    311.83   705.33   6.33  79.40
md0          0.00     0.00    0.00  1012.00     0.00  8090.00     15.99      0.00     0.00   0.00   0.00
sde          0.00     0.00    0.00     0.00     0.00     0.00      0.00      0.00     0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.50   44.61    0.00   53.88

Device:    rrqm/s   wrqm/s     r/s      w/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz    await  svctm  %util
sda          0.00     0.00    0.00     1.00     0.00     4.25      8.50      0.00     0.00   0.00   0.00
sdb        168.50    64.00  129.50    58.00  1114.00   508.00     17.30    645.37  1272.90   5.34 100.05
sdc        194.00    76.50  141.50    43.00  1232.00   360.00     17.26    664.01   916.30   5.42 100.05
sdd        172.00    90.50  114.50    50.00   996.00   456.00     17.65    662.54   977.28   6.08 100.05
md0          0.00     0.00    0.50     8.00     2.00    32.00      8.00      0.00     0.00   0.00   0.00
sde          0.00     0.00    0.00     0.00     0.00     0.00      0.00      0.00     0.00   0.00   0.00

Re: Journalling filesystem corruption fixed in between?

2007-10-03 Thread Michael Tokarev
Rustedt, Florian wrote:
> Hello list,
> 
> some folks reported severe filesystem-crashes with ext3 and reiserfs on
> mdraid level 1 and 5.

I think much stronger evidence and more details are needed.
Without any additional information I for one can only make
a (not-so-pleasant) guess about those "some folks", nothing
more.  We've been running several dozen systems on raid1s and
raid5s since the 2.4 kernel (and some since 2.2, if memory
serves, with an additional patch for raid functionality) --
nothing but the usual, mostly hardware, problems in all that
time.  And many other people run Linux raid, and the ext3
filesystem in particular, in production on large, heavily
loaded boxes -- such corruption, unless it were specific to
one particular system (due to, for example, bad RAM, a faulty
controller, or whatever), should have caused a lot of messages
here on linux-raid and elsewhere.

/mjt