Re: Few questions

2007-12-07 Thread Corey Hickey
Michael Makuch wrote:
 I realize this is the developers list and though I am a developer I'm 
 not a developer
 of linux raid, but I can find no other source of answers to these questions:

Don't worry; it's a user list too.

 $ cat /proc/mdstat
 Personalities : [raid6] [raid5] [raid4]
 md0 : active raid5 etherd/e0.0[0] etherd/e0.2[9](S) etherd/e0.9[8]
 etherd/e0.8[7] etherd/e0.7[6] etherd/e0.6[5] etherd/e0.5[4]
 etherd/e0.4[3] etherd/e0.3[2] etherd/e0.1[1]
   3907091968 blocks level 5, 64k chunk, algorithm 2 [9/9] [UUUUUUUUU]
   [============>.......]  resync = 64.5% (315458352/488386496)
 finish=2228.0min speed=1292K/sec
 unused devices: <none>
 
 and I have no idea where the raid6 came from.

As far as I understand, the Personalities line just shows which RAID
personalities are compiled into the kernel (or loaded as modules). For
example, even though I'm only using raid0, I have:

-
$ cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid0 sdc[0] sdb[1]
  976772992 blocks 64k chunks

unused devices: <none>
-
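(If you want to confirm which of those personalities were loaded as modules
rather than built in, something like the following should show them; the
exact module names here are an assumption on my part:)

$ lsmod | grep -Ei 'raid|linear|multipath'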

-Corey


Re: RAID 5: weird size results after Grow

2007-10-13 Thread Corey Hickey

Marko Berg wrote:

Bill Davidsen wrote:

Marko Berg wrote:
Any suggestions on how to fix this, or what to investigate next, 
would be appreciated!


I'm not sure what you're trying to fix here, everything you posted 
looks sane.


I'm trying to find the missing 300 GB that, as df reports, are not 
available. I ought to have a 900 GB array, consisting of four 300 GB 
devices, while only 600 GB are available. Adding the fourth device 
didn't increase the capacity of the array (visible, at least). E.g. 
fdisk reports the array size to be 900 G, but df still claims 600 
capacity. Any clues why?


df reports the size of the filesystem, which is still about 600GB--the 
filesystem doesn't resize automatically when the size of the underlying 
device changes.


You'll need to use resize2fs, resize_reiserfs, or whatever other tool is 
appropriate for your type of filesystem.
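For example, on ext2/ext3 the whole sequence would look roughly like this
(untested here; /dev/md0 and /mnt/raid are placeholders for your array and
mount point, so check the man pages for your versions first):

# umount /mnt/raid      # resize offline unless you know online resizing works
# e2fsck -f /dev/md0    # resize2fs wants a clean filesystem first
# resize2fs /dev/md0    # with no size given it grows to fill the device
# mount /dev/md0 /mnt/raid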


-Corey


Re: RAID 5: weird size results after Grow

2007-10-13 Thread Corey Hickey

Justin Piszcz wrote:


On Sat, 13 Oct 2007, Marko Berg wrote:


Corey Hickey wrote:

Marko Berg wrote:

Bill Davidsen wrote:

Marko Berg wrote:
Any suggestions on how to fix this, or what to investigate next, would 
be appreciated!


I'm not sure what you're trying to fix here, everything you posted 
looks sane.
I'm trying to find the missing 300 GB that, as df reports, are not 
available. I ought to have a 900 GB array, consisting of four 300 GB 
devices, while only 600 GB are available. Adding the fourth device didn't 
increase the capacity of the array (visible, at least). E.g. fdisk reports 
the array size to be 900 G, but df still claims 600 capacity. Any clues 
why?
df reports the size of the filesystem, which is still about 600GB--the 
filesystem doesn't resize automatically when the size of the underlying 
device changes.


You'll need to use resize2fs, resize_reiserfs, or whatever other tool is 
appropriate for your type of filesystem.


-Corey 

Right, so this isn't one of my sharpest days... Thanks a bunch, Corey!


No problem.


Ah, already answered.


vger.kernel.org greylisted my smtp server, so it took my message a while 
to get to the list.


-Corey


Re: Moron Destroyed RAID6 Array Superblocks

2007-04-14 Thread Corey Hickey
Aaron C. de Bruyn wrote:
 Ok--I got moved in to my new place and am back and running on the 'net.
 I sat down for a few hours and attempted to write a script to try all
 possible combinations of drives...but I have to admit that I'm lost.
 
 I have 8 drives in the array--and I can output every possible
 combination of those.  But what the heck would be the logic to output
 all combinations of the 8 drives using only 6 at a time?  My head hurts.

You want only 5 at a time, don't you?
8 drives - 1 spare - 2 parity = 5

Anyway, a very quick-n-dirty way to get what you want is to:
1. calculate all permutations
2. strip away the last items of each permutation
3. get rid of duplicate lines

$ wget http://hayne.net/MacDev/Perl/permutations
$ perl permutations a b c d e f g h | cut -d' ' -f-5 | sort -u

The Perl script above is from:
http://hayne.net/MacDev/Perl/
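If you'd rather not rely on an external script, here is a rough, untested
shell sketch of the same logic (the names a..h are placeholders for your
drives): it recursively picks 5 of the 8 names in order and prints one
arrangement per line, equivalent to the permutations/cut/sort pipeline above.

#!/bin/bash
# Print every ordered arrangement of 5 names chosen from the arguments.
perm() {
    local prefix=$1 depth=$2
    shift 2
    if [ "$depth" -eq 0 ]; then
        echo "${prefix# }"              # drop the leading space
        return
    fi
    local i args=("$@")
    for ((i = 0; i < ${#args[@]}; i++)); do
        # recurse with the i-th name appended and the rest as the new pool
        perm "$prefix ${args[i]}" "$((depth - 1))" \
             "${args[@]:0:i}" "${args[@]:i+1}"
    done
}

perm "" 5 a b c d e f g h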

-Corey


Re: 2.6.20: reproducible hard lockup with RAID-5 resync

2007-02-18 Thread Corey Hickey
Neil Brown wrote:
 Ok, so the difference is CONFIG_SYSFS_DEPRECATED. If that is not
 defined, the kernel locks up. There's not a lot of code under
 #ifdef/#ifndef CONFIG_SYSFS_DEPRECATED, but since I'm not familiar with
 any of it I don't expect trying to locate the bug on my own would be
 very productive.

 Neil, do you have CONFIG_SYSFS_DEPRECATED enabled? If so, does disabling
 it reproduce my problem? If you can't reproduce it, should I take the
 problem over to linux-kernel?
 
 # CONFIG_SYSFS_DEPRECATED is not set
 
 No, it is not set, and yet it all still works for me.

Dang, again. :)

 It is very hard to see how this CONFIG option can make a difference.
 Have you double checked that setting it removed the problem and
 clearing it causes the problem?

Yes, it seems odd to me too, but I have double-checked. If I build a
kernel with CONFIG_SYSFS_DEPRECATED enabled, it works; if I disable that
option and rebuild the kernel, it locks up.

I just tried running 'make defconfig' and then enabling only RAID,
RAID-0, RAID-1, and RAID-4/5/6. If I then disable
CONFIG_SYSFS_DEPRECATED, there aren't any problems. ...so, I'll try to
isolate the problem some more later.
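
(For what it's worth, a quick way to compare the two configs side by side is
something along these lines, with placeholder paths:)

$ grep SYSFS_DEPRECATED /path/to/working.config /path/to/broken.config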

Thanks,
Corey


Re: 2.6.20: reproducible hard lockup with RAID-5 resync

2007-02-17 Thread Corey Hickey
Corey Hickey wrote:
 When I get home (late) tonight I'll try running dd and badblocks on the 
 corresponding drives and partitions.

Well, I haven't been able to reproduce the problem that way. I tried the
following:

$ dd if=/dev/hda of=/dev/null
$ badblocks /dev/hda
$ badblocks -n /dev/hda

...and the same for sda6, sdb, sdc6, sdd, and md2. In each case I killed
the test after several seconds, on the assumption that if the problem
was reproducible within less than a second by triggering a resync, it
wouldn't take long any other way.

If anyone has any suggestions for further tests I can do, I'll be happy
to try them out.

Thanks,
Corey


Re: 2.6.20: reproducible hard lockup with RAID-5 resync

2007-02-16 Thread Corey Hickey
Neil Brown wrote:
 On Thursday February 15, [EMAIL PROTECTED] wrote:
 I think I have found an easily-reproducible bug in Linux 2.6.20. I have
 already applied the "Fix various bugs with aligned reads in RAID5"
 patch, and that had no effect. It appears to be related to the resync
 process, and makes the system lock up, hard.
 
 I'm guessing that the problem is at a lower level than raid.
 What IDE/SATA controllers do you have?  Google to see if anyone else
 has had problems with them in 2.6.20.
 During the lock up, nothing is printed to the console, and the magic
 SysRQ key has no effect; I have to poke the reset button.
 
 Sounds like interrupts are disabled, but x86_64 always enables the
 NMI watchdog which should trigger if interrupts are off for too long. 
 
 Do you have CONFIG_DETECT_SOFTLOCKUP=y in your .config (it is in the
 kernel debugging options menu I think).  If not, setting that would be
 worth a try.
 
 A raid5 resync across 5 sata drives on a couple of different
 silicon-image controllers doesn't lock up for me.

Wow, thanks for the quick response. I have to go to bed now, but I'll
try to get you that information tomorrow.

Thanks,
Corey


Re: 2.6.20: reproducible hard lockup with RAID-5 resync

2007-02-16 Thread Corey Hickey

Neil Brown wrote:

On Thursday February 15, [EMAIL PROTECTED] wrote:

I think I have found an easily-reproducible bug in Linux 2.6.20. I have
already applied the "Fix various bugs with aligned reads in RAID5"
patch, and that had no effect. It appears to be related to the resync
process, and makes the system lock up, hard.


I'm guessing that the problem is at a lower level than raid.
What IDE/SATA controllers do you have?  Google to see if anyone else
has had problems with them in 2.6.20.


I have an nForce3 motherboard. lspci calls my IDE:
nVidia Corporation CK8S Parallel ATA Controller (v2.5) (rev a2)
...and my SATA:
nVidia Corporation CK8S Serial ATA Controller (v2.5) (rev a2)

I'm using libata for my SATA drives and the old IDE driver for my IDE 
drive. For reference, I have uploaded my kernel configuration and the 
output of lspci:

http://fatooh.org/files/tmp/config-2.6.20
http://fatooh.org/files/tmp/lspci-v

Anyway, I googled a bit, and I also looked through the recent threads in 
the linux-kernel archives, but I haven't found anything. I don't follow 
kernel development closely, though, so it's quite possible I missed 
something.


When I get home (late) tonight I'll try running dd and badblocks on the 
corresponding drives and partitions.



During the lock up, nothing is printed to the console, and the magic
SysRQ key has no effect; I have to poke the reset button.


Sounds like interrupts are disabled, but x86_64 always enables the
NMI watchdog which should trigger if interrupts are off for too long. 


How long is too long? I waited a few minutes, at least, on the first 
few tries.



Do you have CONFIG_DETECT_SOFTLOCKUP=y in your .config (it is in the
kernel debugging options menu I think).  If not, setting that would be
worth a try.


I do indeed have CONFIG_DETECT_SOFTLOCKUP enabled. The Kconfig
description says it should detect lockups longer than 10 seconds; I've
waited longer than that many times.



A raid5 resync across 5 sata drives on a couple of different
silicon-image controllers doesn't lock up for me.


Heck. ;)  Would it by any chance make a difference that I'm running 
RAID-5 across a mixture of drives and partitions?


Thanks again,
Corey


2.6.20: reproducible hard lockup with RAID-5 resync

2007-02-15 Thread Corey Hickey
I think I have found an easily-reproducible bug in Linux 2.6.20. I have
already applied the "Fix various bugs with aligned reads in RAID5"
patch, and that had no effect. It appears to be related to the resync
process, and makes the system lock up, hard.

The steps to reproduce are:
1. Be running Linux 2.6.20 and do whatever is necessary to prepare for a
crash (close open files, sync, unmount filesystems, or whatever).
Alternatively, just boot with 'init=/bin/bash'.
2. Run 'mdadm -S /dev/md2', where /dev/md2 is a RAID-5.
3. Run 'mdadm -A /dev/md2 -U resync'.
4. Wait about 1 second. The system will lock up.

During the lock up, nothing is printed to the console, and the magic
SysRQ key has no effect; I have to poke the reset button. Normally, I
wouldn't rule out a hardware problem, but I have reasonable faith in my
computer. Neither memtest86+ nor cpuburn nor normal operation have
flushed out any instability.

Upon reboot, 2.6.20 will lock up almost immediately when it tries to
resync the array. This appears to occur regardless of whether the resync
is just starting; if I run 2.6.19 for a while until the resync is, say,
50% done and then reboot to 2.6.20, the lockup still happens.

I have provided what I hope is enough information below.


System information:
Athlon64 3400+
64-bit Linux 2.6.20 compiled with GCC 4.1.2
64-bit Debian Sid
RAID-5 of 5 devices:
   /dev/hda   (IDE hard drive)
   /dev/sda6  (partition on SATA hard drive)
   /dev/sdb   (SATA hard drive)
   /dev/sdc6  (partition on SATA hard drive)
   /dev/sdd   (SATA hard drive)


bugfood:~# mdadm -D /dev/md2
/dev/md2:
        Version : 00.90.03
  Creation Time : Mon May 29 22:13:47 2006
     Raid Level : raid5
     Array Size : 781433344 (745.23 GiB 800.19 GB)
    Device Size : 195358336 (186.31 GiB 200.05 GB)
   Raid Devices : 5
  Total Devices : 5
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Thu Feb 15 22:07:26 2007
          State : active, resyncing
 Active Devices : 5
Working Devices : 5
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

 Rebuild Status : 26% complete

           UUID : d016a205:bd3106ef:b19cb15b:b6d70494
         Events : 0.3971003

    Number   Major   Minor   RaidDevice State
       0       8        6        0      active sync   /dev/sda6
       1       8       38        1      active sync   /dev/sdc6
       2       3        0        2      active sync   /dev/hda
       3       8       16        3      active sync   /dev/sdb
       4       8       48        4      active sync   /dev/sdd



Thank you,
Corey


Re: RAID5, 2 out of 4 disks failed, Events value differs too much

2006-12-07 Thread Corey Hickey

Bodo Thiesen wrote:

Hi, I have a little problem:

Some hours ago the second of four disks was kicked out of my RAID5, thus
rendering it unusable. As far as I can tell, the disks themselves are still
working correctly (I assume a cable connection problem), but that's not the
problem. The real problem is that the first failed disk has an event value of
9102893, the second failed disk has a value of 9324862, and the other two
disks have a value of 9324869. In this case, what is the best thing to do to
recover the RAID? Just recreating the array with the 9324862-disk and the two
9324869-disks and later hot-adding the 9102893-disk is unclean and, as I
understand it, would trigger some silent data failures. Is there a way to
prevent these data failures from happening at all, or is it at least possible
to tell where the error(s) are (so I can manually check the data and take
appropriate steps)? Remember that I still have the data from the first failed
disk, parts of which may still be relatively up to date.

Has anyone had this problem already and found a nice solution for this?


If nobody gives you any better advice, I would follow this approach. 
These commands are examples and may need to be fixed; I haven't had this 
exact problem before (only similar ones) and I can't test anything right 
now.


First, force the reassembly of the array using the three freshest disks.
# mdadm --assemble --force --run /dev/md0 /dev/sdb /dev/sdc /dev/sdd

Next, use whatever fsck program corresponds to your filesystem and do a 
read-only check. Something like:

# reiserfsck --check /dev/md0

If fsck finds only a few problems, then it's probably safe to go ahead 
and tell fsck to fix them; data loss will be minimal or nonexistent.

# reiserfsck --fix-fixable /dev/md0

Now you ought to be able to mount the filesystem and look around.
# mount /dev/md0

If all looks good, then hot-add the stale disk and let it resync.
# mdadm /dev/md0 -a /dev/sda
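
You can keep an eye on the resync afterwards with something like:

$ cat /proc/mdstat
# mdadm --detail /dev/md0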

Good luck,
Corey


Re: Odd (slow) RAID performance

2006-12-07 Thread Corey Hickey

Bill Davidsen wrote:

Dan Williams wrote:

On 12/1/06, Bill Davidsen [EMAIL PROTECTED] wrote:

Thank you so much for verifying this. I do keep enough room on my drives
to run tests by creating any kind of whatever I need, but the point is
clear: with N drives striped the transfer rate is N x base rate of one
drive; with RAID-5 it is about the speed of one drive, suggesting that
the md code serializes writes.

If true, BOO, HISS!

Can you explain and educate us, Neil? This looks like terrible
performance.



Just curious what is your stripe_cache_size setting in sysfs?

Neil, please include me in the education if what follows is incorrect:

Read performance in kernels up to and including 2.6.19 is hindered by
needing to go through the stripe cache.  This situation should improve
with the stripe-cache-bypass patches currently in -mm.  As Raz
reported in some cases the performance increase of this approach is
30% which is roughly equivalent to the performance difference I see of
a 4-disk raid5 versus a 3-disk raid0.

For the write case I can say that MD does not serialize writes.  If by
serialize you mean that there is 1:1 correlation between writes to the
parity disk and writes to a data disk.  To illustrate I instrumented
MD to count how many times it issued a write to the parity disk and
compared that to how many writes it performed to the member disks for
the workload dd if=/dev/zero of=/dev/md0 bs=1024k count=100.  I
recorded 8544 parity writes and 25600 member disk writes which is
about 3 member disk writes per parity write, or pretty close to
optimal for a 4-disk array.  So, serialization is not the cause,
performing sub-stripe width writes is not the cause as 98% of the
writes happened without needing to read old data from the disks.
However, I see the same performance on my system, about equal to a
single disk.


But the number of writes isn't an indication of serialization. If I 
write disk A, then B, then C, then D, you can't tell if I waited for 
each write to finish before starting the next, or did them in parallel. 
And since the write speed is equal to the speed of a single drive, 
effectively that's what happens, even though I can't see it in the code.


For what it's worth, my read and write speeds on a 5-disk RAID-5 are 
somewhat faster than the speed of any single drive. The array is a 
mixture of two different SATA drives and one IDE drive.


Sustained individual read performances range from 56 MB/sec for the IDE 
drive to 68 MB/sec for the faster SATA drives. I can read from the 
RAID-5 at about 100MB/sec.


I can't give precise numbers for write speeds, except to say that I can 
write to a file on the filesystem (which is mostly full and probably 
somewhat fragmented) at about 83 MB/sec.
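
(Numbers like these can be measured with something along the lines of the
following; the device and file names are placeholders, and for the write test
you'd want a file much larger than RAM so the page cache doesn't flatter the
result:)

$ dd if=/dev/md0 of=/dev/null bs=1M count=4096
$ dd if=/dev/zero of=/mnt/raid/testfile bs=1M count=4096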


None of those numbers are equal to the theoretical maximum performance, 
so I see your point, but they're still faster than one individual disk.


I also suspect that writes are not being combined, since writing the 2GB 
test runs at one-drive speed writing 1MB blocks, but floppy speed 
writing 2k blocks. And no, I'm not running out of CPU to do the 
overhead, it jumps from 2-4% to 30% of one CPU, but on an unloaded SMP 
system it's not CPU bound.


Here is where I step into supposition territory.  Perhaps the
discrepancy is related to the size of the requests going to the block
layer.  raid5 always makes page sized requests with the expectation
that they will coalesce into larger requests in the block layer.
Maybe we are missing coalescing opportunities in raid5 compared to
what happens in the raid0 case?  Are there any io scheduler knobs to
turn along these lines?


Good thought. I had already tried that but not reported it: changing 
schedulers makes no significant difference, just something in the range of 2-3%, which 
is close to the measurement jitter due to head position or whatever.


I changed my swap to RAID-10, but RAID-5 just can't keep up with 
70-100MB/s data bursts which I need. I'm probably going to scrap 
software RAID and go back to a controller, the write speeds are simply 
not even close to what they should be. I have one more thing to try, a 
tool I wrote to chase another problem a few years ago. I'll report if I 
find something.


I have read that using RAID to stripe swap space is ill-advised, or at 
least unnecessary. The kernel will stripe multiple swap devices if you 
assign them the same priority.

http://tldp.org/HOWTO/Software-RAID-HOWTO-2.html

If you've been using RAID-10 for swap, then I think you could just 
assign multiple RAID-1 devices the same swap priority for the same 
effect with (perhaps) less overhead.
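
Something like this in /etc/fstab would do it (the md device names here are
made up):

/dev/md3   none   swap   sw,pri=1   0   0
/dev/md4   none   swap   sw,pri=1   0   0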


-Corey