Hi everyone;
Thanks for the information so far!
Greatly appreciated.
I've just found this:
http://home-tj.org/wiki/index.php/Sil_m15w#Message:_Re:_SiI_3112_.26_Seagate_drivers
Which in particular mentions that Silicon Image controllers and
Seagate drives don't work too well together, and
Jeff Garzik wrote:
Molle Bestefich wrote:
I've just found this:
http://home-tj.org/wiki/index.php/Sil_m15w#Message:_Re:_SiI_3112_.26_Seagate_drivers
Which in particular mentions that Silicon Image controllers and
Seagate drives don't work too well together, and neither Silicon Image
nor
Martin Kihlgren writes:
And no, nothing hangs except the disk access to the device
in question when a disk fails.
Sounds good! +1 for USB...
My Seagate disks DO generate too much heat if I stack them on top
of each other, which their form factor suggests they would accept.
Starts to take
Patrick Hoover wrote:
Is anyone else having issues with USB interfaced disks to implement
RAID? Any thoughts on Pros / Cons for doing this?
Sounds like a very good stress test for MD.
I often find servers completely hung when a disk fails, this usually
happens in the IDE layer.
If using USB
I'm looking for new harddrives.
This is my experience so far.
SATA cables:
============
I have zero good experiences with any SATA cables.
They've all been crap so far.
3.5 ATA harddrives buyable where I live:
========================================
(All drives are 7200rpm, for some
Peter T. Breuer wrote:
1) I would like raid request retries to be done with exponential
delays, so that we get a chance to overcome network brownouts.
I presume the former will either not be objectionable
You want to hurt performance for every single MD user out there, just
because things
Peter T. Breuer wrote:
You want to hurt performance for every single MD user out there, just
There's no performance drop! Exponentially staged retries on failure
are standard in all network protocols ... it is the appropriate
reaction in general, since stuffing the pipe full of immediate
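The staged-retry scheme being argued for can be sketched in a few lines (a toy illustration only, nothing from MD itself; the function name and parameters are made up):

```python
# Toy sketch of exponentially staged retries, as used by many
# network protocols: the delay doubles after every failure, up
# to a cap, instead of stuffing the pipe with immediate retries.

def retry_delays(base=0.1, cap=30.0, attempts=8):
    """Return the list of back-off delays for the given attempt count."""
    delays = []
    delay = base
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, cap)
    return delays

print(retry_delays(attempts=5))
# delays grow 0.1, 0.2, 0.4, 0.8, 1.6, ...
```

The point of the cap is that after a long brownout the retry interval stays bounded rather than growing forever.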
Peter T. Breuer wrote:
We can't do a HOT_REMOVE while requests are outstanding,
as far as I know.
Actually, I'm not quite sure which kind of requests you are
talking about.
Only one kind. Kernel requests :). They come in read and write
flavours (let's forget about the third race for the
Sevrin Robstad wrote:
I created the RAID when I installed Fedora Core 3 some time ago,
didn't do anything special so the chunks should be 64kbyte and
parity should be left-symmetric ?
I have no idea what's default on FC3, sorry.
Any Idea ?
I missed that you were trying to fdisk -l
Sevrin Robstad wrote:
I got a friend of mine to make a list of all the 6^6 combinations of dev
1 2 3 4 5 missing,
shouldn't this work ???
Only if you get the layout and chunk size right.
And make sure that you know whether you were using partitions (eg.
sda1) or whole drives (eg. sda - bad
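As an aside, brute-forcing the device order is a matter of permutations rather than 6^6 combinations: 5 named devices plus "missing" spread over 6 slots gives 6! = 720 orderings. A quick sanity check (device names are just placeholders):

```python
# Enumerate every possible device ordering for a 6-slot RAID5
# where one slot is 'missing': 6! = 720 orderings, not 6^6.
from itertools import permutations

devices = ['e0.0', 'e0.1', 'e0.2', 'e0.3', 'e0.4', 'missing']
orderings = list(permutations(devices))
print(len(orderings))  # 720
```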
Karl Voit wrote:
if (super == NULL) {
fprintf(stderr, Name ": No suitable drives found for %s\n", mddev);
[...]
Well I guess, the message will be shown, if the superblock is not found.
Yes. No clue why; my best guess is that you've already zeroed the superblock.
What does mdadm --query /
Karl Voit wrote:
443: root at ned ~ # mdadm --examine /dev/sd[abcd]
Shows that all 4 devices are ACTIVE SYNC
Please note that there is no 1 behind sda up to sdd!
Yes, you're right.
Seems you've created an array/superblocks on both sd[abcd] (line 443
onwards), and on sd[abcd]1 (line
Henrik Holst wrote:
Is sda1 occupying the entire disk? since the superblock is the /last/
128Kb (I'm assuming 128*1024 bytes) the superblocks should be one and
the same.
Ack, never considered that.
Ugly!!!
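Henrik's observation can be checked with a little arithmetic. Assuming the v0.90 superblock sits in the last 64 KiB-aligned 64 KiB block of a device, two devices whose sizes differ by less than the alignment granularity compute the same superblock location (the sizes below are invented):

```python
# Where a v0.90 MD superblock lives: the last 64 KiB-aligned
# 64 KiB block of the device.  If a partition covers essentially
# the whole disk, both superblocks can land on the same block.

RESERVED = 64 * 1024  # 64 KiB

def sb_offset(dev_size_bytes):
    """Byte offset of the v0.90 superblock on a device of this size."""
    return (dev_size_bytes & ~(RESERVED - 1)) - RESERVED

disk = 80_026_361_856   # hypothetical whole-disk size
part = disk - 16_384    # partition ending 16 KiB short of the disk end
print(sb_offset(disk) == sb_offset(part))  # True: same 64 KiB block
```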
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a
Karl Voit wrote:
OK, I upgraded my kernel and mdadm:
uname -a:
Linux ned 2.6.13-grml #1 Tue Oct 4 18:24:46 CEST 2005 i686 GNU/Linux
That release is 10 months old.
Newest release is 2.6.17.
You can see changes to MD since 2.6.13 here:
Karl Voit wrote:
I published the whole story (as much as I could log during my reboots
and so on) on the web:
http://paste.debian.net/8779
From the paste bin:
443: [EMAIL PROTECTED] ~ # mdadm --examine /dev/sd[abcd]
Shows that all 4 devices are ACTIVE SYNC
Next
Tim wrote:
That would probably be ideal, issue the power off command with
something like a 30 second timeout, which would give the system time to
power off cleanly first.
I don't think that's ideal.
Many systems restore power to the last known state, thus powering off
cleanly would result in
Christian Pernegger wrote:
Intel SE7230NH1-E mainboard
Pentium D 930
HPA recently said that x86_64 CPUs have better RAID5 performance.
Promise Ultra133 TX2 (2ch PATA)
- 2x Maxtor 6B300R0 (300GB, DiamondMax 10) in RAID1
Onboard Intel ICH7R (4ch SATA)
- 4x Western Digital WD5000YS
Nix wrote:
Adam Talbot wrote:
Can any one give me more info on this error?
Pulled from /var/log/messages.
raid6: read error corrected!!
The message is pretty easy to figure out and the code (in
drivers/md/raid6main.c) is clear enough.
But the message could be clearer, for instance it would
David M. Strang wrote:
Well today, during this illustrious rebuild... it appears I actually DID
have a disk fail. So, I have 26 disks... 1 partially rebuilt, and 1 failed.
Common scenario it seems.
Hoping and praying that a rebuild didn't actually wipe the disk and maybe
just synced things
Ric Wheeler wrote:
You are absolutely right - if you do not have a validated, working
barrier for your low level devices (or a high end, battery backed array
or JBOD), you should disable the write cache on your RAIDed partitions
and on your normal file systems ;-)
There is working support for
NeilBrown wrote:
Change ENOTSUPP to EOPNOTSUPP
Because that is what you get if a BIO_RW_BARRIER isn't supported !
Dumb question, hope someone can answer it :).
Does this mean that any version of MD up till now won't know that SATA
disks do not support barriers, and therefore won't flush
gelma wrote:
first run: lots of strange error reports about impossible i_size
values, duplicated blocks, and so on
You mention only filesystem errors, no block device related errors.
In this case, I'd say that it's more likely that dm-crypt is to blame
rather than MD.
I think you should try the
Sam Hopkins wrote:
mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023]
While it should work, a bit drastic perhaps?
I'd start with mdadm --assemble --force.
With --force, mdadm will pull the event counter of the most-recently
failed drive up to current status which should give you a
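Conceptually, the event-counter bump works something like this (a deliberately simplified toy model, not mdadm's actual logic, which is more involved; all names and numbers are made up):

```python
# Toy model of what 'mdadm --assemble --force' does with event
# counters: members whose count is only slightly behind the newest
# get pulled up to current so the array can be started.

def force_assemble(events, slack=2):
    """Return per-member event counts after a forced assembly."""
    newest = max(events.values())
    return {dev: (newest if newest - cnt <= slack else cnt)
            for dev, cnt in events.items()}

counts = {'e0.0': 163368, 'e0.2': 163368, 'e0.3': 163362}
print(force_assemble(counts, slack=10))
# e0.3 is pulled up from 163362 to 163368
```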
Tim Bostrom wrote:
It appears that /dev/hdf1 failed this past week and /dev/hdh1 failed back in
February.
An obvious question would be, how much have you been altering the
contents of the array since February?
I tried a mdadm --assemble --force and was able to get the following:
Jonathan wrote:
# mdadm -C /dev/md0 -n 4 -l 5 missing /dev/etherd/e0.[023]
I think you should have tried mdadm --assemble --force first, as I
proposed earlier.
By doing the above, you have effectively replaced your version 0.9.0
superblocks with version 0.9.2. I don't know if version 0.9.2
Jonathan wrote:
I was already terrified of screwing things up
now I'm afraid of making things worse
Adrenalin... makes life worth living there for a sec, doesn't it ;o)
based on what was posted before is this a sensible thing to try?
mdadm -C /dev/md0 -c 32 -n 4 -l 5 missing
Jonathan wrote:
Well, the block sizes are back to 32k now, but I still had no luck
mounting /dev/md0 once I created the array.
Ahem, I missed something.
Sorry, the 'a' was hard to spot.
Your array used layout : left-asymmetric, while the superblock you've
just created has layout:
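For anyone puzzled by the distinction, here is a small sketch of how the two "left" layouts place chunks, based on the commonly documented definitions (the disk count is arbitrary; this is an illustration, not MD source code):

```python
# Chunk placement for one stripe in the two 'left' RAID5 layouts.
# Parity rotates the same way in both; what differs is where the
# data chunks start within the stripe.

def stripe_map(stripe, disks, layout):
    """Return per-disk contents ('P' or data chunk index) for one stripe."""
    parity = (disks - 1) - (stripe % disks)   # 'left' parity rotation
    row = [None] * disks
    row[parity] = 'P'
    if layout == 'left-asymmetric':
        # data fills disks in order, skipping the parity disk
        data = [d for d in range(disks) if d != parity]
    else:  # left-symmetric
        # data starts just after the parity disk and wraps around
        data = [(parity + 1 + i) % disks for i in range(disks - 1)]
    for i, d in enumerate(data):
        row[d] = i
    return row

print(stripe_map(1, 4, 'left-asymmetric'))  # [0, 1, 'P', 2]
print(stripe_map(1, 4, 'left-symmetric'))   # [1, 2, 'P', 0]
```

Same parity positions, different data ordering, so an array created with the wrong layout reads back garbage even though --examine looks plausible.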
Jonathan wrote:
how safe should the following be?
mdadm --assemble /dev/md0 --uuid=8fe1fe85:eeb90460:c525faab:cdaab792
/dev/etherd/e0.[01234]
You can hardly do --assemble anymore.
After you have recreated superblocks on some of the devices, those are
conceptually part of a different raid
Does anyone know of a way to disable libata's 5-time retry when a read fails?
It has the effect of causing every failed sector read to take 6
seconds before it fails, causing raid5 rebuilds to go awfully slow.
It's generally undesirable too, when you've got RAID on top that can
write replacement
Hi Neil, list
You wrote:
mdadm -C /dev/md1 --assume-clean /dev/sd{a,b,c,d,e,f}1
Will the above destroy data by overwriting the on-disk v0.9 superblock
with a larger v1 superblock?
--assume-clean is not documented in 'mdadm --create --help', by the way
- what does it do?
Molle Bestefich wrote:
Neil Brown wrote:
How do I force MD to raise the event counter on sdb1 and accept it
into the array as-is, so I can avoid bad-block induced data
corruption?
For that, you have to recreate the array.
Scary. And hairy.
How much do I have to bribe you to make
A system with 6 disks, it was UU a moment ago, after read errors
on a file now looks like:
/proc/mdstat:
md1 : active raid5 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[6](F) sda1[7](F)
level 5, 64k chunk, algorithm 2 [6/4] [__]
uname:
linux 2.6.11-gentoo-r4
What's the recommended
Neil Brown wrote:
You shouldn't need to upgrade kernel
Ok.
I had a crazy idea that 2 devices down in a RAID5 was an MD bug.
I didn't expect MD to kick that last disk - I would have thought that
it would just pass on the read error in that situation. If you've got
the time to explain I'd like
Neil Brown wrote:
It is arguable that for a read error on a degraded raid5, that may not
be the best thing to do, but I'm not completely convinced. A read
error will mean that a write to the same stripe will have to fail, so
at the very least we would want to switch the array read-only.
That
Neil Brown wrote:
use --assemble --force
# mdadm --assemble --force /dev/md1
mdadm: forcing event count in /dev/sda1(0) from 163362 upto 163368
mdadm: /dev/md1 has been started with 5 drives (out of 6).
Oops, only 5 drives, but I know data is OK on all 6 drives.
I also know that there are bad
Neil Brown wrote:
How do I force MD to raise the event counter on sdb1 and accept it
into the array as-is, so I can avoid bad-block induced data
corruption?
For that, you have to recreate the array.
Scary. And hairy.
How much do I have to bribe you to make this work:
# mdadm --assemble
May I offer the point of view that this is a bug:
MD apparently tries to keep a raid5 array up by using 4 out of 6 disks.
Here's the event chain, from start to now:
==========================================
1.) Array assembled automatically with 6/6 devices.
2.) Read error, MD kicks sdb1.
Todd [EMAIL PROTECTED] wrote:
The strangest thing happened the other day. I booted my machine
and the permissions were all messed up. I couldn't access many
files as root which were owned by root. I couldn't run common
programs as root or a standard user.
Odd, have you found out why?
What was
Bill Davidsen wrote:
Molle Bestefich wrote:
it wrote:
Ouch.
How does hardware raid deal with this? Does it?
Hardware RAID controllers deal with this by rounding the size of
participant devices down to nearest GB, on the assumption that no
drive manufacturers would have the guts
it wrote:
Ouch.
How does hardware raid deal with this? Does it?
Hardware RAID controllers deal with this by rounding the size of
participant devices down to nearest GB, on the assumption that no
drive manufacturers would have the guts to actually sell eg. a 250 GB
drive with less than exactly
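The rounding described above amounts to the following (a sketch; the sizes are invented, and drive makers count decimal gigabytes):

```python
# The hardware-RAID rounding trick: treat each member as only
# floor(size / 1 GB) * 1 GB, so a replacement "250 GB" drive that
# is a few MB smaller than the original still fits the array.

GB = 10**9  # decimal gigabytes, as drive vendors count them

def usable_size(dev_size_bytes):
    """Round a member device's size down to a whole GB."""
    return (dev_size_bytes // GB) * GB

a = 250_059_350_016   # one vendor's "250 GB"
b = 250_000_000_000   # another vendor's "250 GB"
print(usable_size(a) == usable_size(b))  # True: both count as 250 GB
```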
[EMAIL PROTECTED] wrote:
Not only that, the raid developers themselves
consider autoassembly deprecated.
http://article.gmane.org/gmane.linux.kernel/373620
Hmm. My knee-jerk, didn't-stop-to-think-about-it reaction is that
this is one of the finest features of linux raid, so why remove
Michael Barnwell wrote:
I'm experiencing silent data corruption
on my RAID 5 set of four 400GB SATA disks.
I have circa the same hardware:
* AMD Opteron 250
* Silicon Image 3114
* 300 GB Maxtor SATA
Just to add a data point, I've run your test on my RAID 1 (not RAID 5
!) without problems.
NeilBrown wrote:
We allow the superblock to record an 'old' and a 'new'
geometry, and a position where any conversion is up to.
When starting an array we check for an incomplete reshape
and restart the reshape process if needed.
*Super* cool!
Rik Herrin wrote:
Wouldn't connecting a UPS + using a stable kernel
version remove 90% or so of the RAID-5 write hole
problem?
There are some RAID systems that you'd rather not have redundant power on.
Think encryption. As long as a system is online, it's normal for it
to have encryption
Rik Herrin wrote:
I was interested in Linux's RAID capabilities and
read that mdadm was the tool of choice. We are
currently comparing software RAID with hardware RAID
MD is far superior to most of the hardware RAID solutions I've touched.
In short, it seems MD is developed with the goal of
Lajber Zoltan wrote:
I have done some simple tests with bonnie++; the sw raid is superior
to hw raid, except for big-name storage systems.
http://zeus.gau.hu/~lajbi/diskbenchmarks.txt
Cool.
But what does gep, tip, diskvez, iras, olvasas and atlag mean?
I found myself typing IMHO after writing up just about each comment.
I've dropped that and you'll just have to know that all this is IMHO
and not an attack on your ways if they happen to be different ^_^.
Neil Brown wrote:
I like the suggestion of adding one-line descriptions to this.
How
mdadm's command line arguments seem arcane and cryptic and unintuitive.
It's difficult to grasp what combinations will actually do something
worthwhile and what combinations will just yield a 'you cannot do
that' output.
I find myself spending 20 minutes with mdadm --help and
Neil Brown wrote:
I would like it to take an argument in contexts where --bitmap was
meaningful (Create, Assemble, Grow) and not where --brief is
meaningful (Examine, Detail). but I don't know if getopt_long will
allow the 'short_opt' string to be changed half way through
processing...
Spencer Tuttle wrote:
Is it possible to have /boot on /dev/md_d0p1 in a RAID5 configuration
and boot with GRUB?
Only if you get yourself a PCI card with a RAID BIOS on it and attach
the disks to that.
The RAID BIOS hooks interrupt 13 and allows GRUB (or DOS or LILO for
that matter) to see the
On Saturday November 12, Neil Brown wrote:
On Saturday November 12, Kyle Wong wrote:
I understand that if I store a 224KB file into the RAID5, the
file will be divided into 7 parts x 32KB, plus 32KB parity.
(Am I correct in this?)
Sort of ... if the filesystem happens to lay it out like
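The arithmetic in the question works out like this (assuming 32 KiB chunks on 8 disks and a contiguous, stripe-aligned write, which, as noted, the filesystem need not provide):

```python
# A 224 KiB file written as one full stripe on an 8-disk RAID5
# with 32 KiB chunks: 7 data chunks plus 1 parity chunk.

CHUNK = 32 * 1024
DISKS = 8                        # 7 data + 1 parity per stripe

file_size = 224 * 1024
data_chunks = file_size // CHUNK
stripes = data_chunks // (DISKS - 1)
parity_chunks = stripes          # one parity chunk per full stripe

print(data_chunks, parity_chunks)  # 7 1
```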
If
- a disk is part of a MD RAID 1 array, and
- the disk is 'flapping', eg. going online and offline repeatedly in a
hotswap system, and
- a *write* occurs to the MD array at a time when the disk happens to
be offline,
will MD handle this correctly?
Eg. will it increase the event counters on the
Mario 'BitKoenig' Holbe wrote:
Molle Bestefich wrote:
Eg. will it increase the event counters on the other disks /even/ when
no reboot or stop-start has been performed, so that when the flappy
Event counters are increased immediately when an event occurs.
A device failure is an event
Mr. James W. Laferriere wrote:
Is there a documented procedure to follow during
creation or after that will get a raid6 array to self
rebuild ?
MD will rebuild your array automatically, given that it has a spare disk to use.
raid5: Disk failure on sde, disabling device. Operation continuing
Dan Williams wrote:
The first question is whether a solution along these lines would be
valued by the community? The effort is non-trivial.
I don't represent the community, but I think the idea is great.
When will it be finished and where can I buy the hardware? :-)
And if you don't mind
Pallai Roland wrote:
Molle Bestefich wrote:
Claas Hilbrecht wrote:
Pallai Roland schrieb:
this is a feature patch that implements 'proactive raid5 disk
replacement' (http://www.arctic.org/~dean/raid-wishlist.html),
After my experience with a broken raid5 (read the list) I
Claas Hilbrecht wrote:
Pallai Roland schrieb:
this is a feature patch that implements 'proactive raid5 disk
replacement' (http://www.arctic.org/~dean/raid-wishlist.html),
After my experience with a broken raid5 (read the list) I think the
partially failed disks feature you describe is
Ewan Grantham wrote:
I know, this is borderline, but figure this is the group of folks who
will know. I do a lot of audio and video stuff for myself and my
family. I also have a rather unusual networking setup. Long story
short, when I try to run Linux as my primary OS, I usually end up
On 7/24/05, Ewan Grantham [EMAIL PROTECTED] wrote:
On 7/24/05, Molle Bestefich [EMAIL PROTECTED] wrote:
Ewan Grantham wrote:
I know, this is borderline, but figure this is the group of folks who
will know. I do a lot of audio and video stuff for myself and my
family. I also have
Neil Brown wrote:
On Friday July 8, [EMAIL PROTECTED] wrote:
So a clean RAID1 with a disk missing should start without --run, just
like a clean RAID5 with a disk missing?
Note that with /dev/loop3 not functioning,
mdadm --assemble --scan
will still work.
Super!
That was exactly the point
On Friday July 8, [EMAIL PROTECTED] wrote:
On 8 Jul 2005, Molle Bestefich wrote:
On 8 Jul 2005, Melinda Taylor wrote:
We have a computer based at the South Pole which has a degraded raid 5
array across 4 disks. One of the 4 HDD's mechanically failed but we have
bought the majority
Mitchell Laks wrote:
However I think that raids should boot as long as they are intact, as a matter
of policy. Otherwise we lose our ability to rely upon them for remote
servers...
It does seem wrong that a RAID 5 starts OK with a disk missing, but a
RAID 1 fails.
Perhaps MD is unable to
Hmm, I think the information in /var/log/messages is actually
interesting for MD debugging.
Seems there was a bad sector somewhere in the middle of all this,
which might have triggered something?
Attached (gzipped - sorry for the inconvenience, but it's 5 kB vs. 250 kB!)
I've cut out a lot of
Phantazm wrote:
And the kernel log is filled up with this.
Feb 20 08:43:13 [kernel] md: md0: sync done.
Feb 20 08:43:13 [kernel] md: syncing RAID array md0
Feb 20 08:43:13 [kernel] md: minimum _guaranteed_ reconstruction speed: 5000
KB/sec/disc.
Feb 20 08:43:13 [kernel] md: md0: sync done.
David Greaves wrote:
Guy wrote:
Well, I agree with KISS, but from the operator's point of view!
I want...
[snip]
Fair enough.
[snip]
should the LED control code be built into mdadm?
Obviously not.
But currently, a LED control app would have to pull information from
/proc/mdstat,
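Such a hypothetical LED app ends up scraping free-form text, e.g. (the sample mdstat content below is made up; real files vary by kernel version):

```python
# What a LED-control app reading /proc/mdstat has to do today:
# parse free-form text to find failed members marked '(F)'.
import re

SAMPLE = """\
md1 : active raid5 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[6](F) sda1[0]
      1953519872 blocks level 5, 64k chunk, algorithm 2 [6/5] [U_UUUU]
"""

def failed_devices(mdstat_text):
    """Return device names marked (F) in an mdstat-style listing."""
    return re.findall(r'(\w+)\[\d+\]\(F\)', mdstat_text)

print(failed_devices(SAMPLE))  # ['sdb1']
```

Which is exactly why a proper event interface would be nicer than every app reinventing this parser.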
Hervé Eychenne wrote:
Molle Bestefich wrote:
There seems to be an obvious lack of a properly thought out interface
to notify userspace applications of MD events (disk failed -- go
light a LED, etc).
I'm not sure how a proper interface could be done (so I'm basically
just blabbering
David Greaves wrote:
Does everyone really type cat /proc/mdstat from time to time??
How clumsy...
And yes, I do :)
You're not alone..
*gah...*
Michael Tokarev wrote:
I just come across an interesting situation, here's the
scenario.
[snip]
Now we have an interesting situation. Both superblocks in d1
and d2 are identical, event counts are the same, both are clean.
Things which are different:
utime - on d1 it is more recent
Does this sound reasonable?
Does to me. Great example!
Thanks for painting the pretty picture :-).
Seeing as you're clearly the superior thinker, I'll address your brain
instead of wasting wattage on my own.
Let's say that MD had the feature to read from both disks in a mirror
and perform a
The following is (I think) appropriate for 2.4.30. The bug it fixes
can result in data corruption in a fairly unusual circumstance (having
a 3 drive raid1 array running in degraded mode, and suffering a system
crash).
What's unusual? Having a 3 drive raid1 array?
It's not unusual for a
Neil Brown wrote:
Is there any way to tell MD to do verify-on-write and
read-from-all-disks on a RAID1 array?
No.
I would have thought that modern disk drives did some sort of
verify-on-write, else how would they detect write errors, and they are
certainly in the best place to do
Just wondering;
Is there any way to tell MD to do verify-on-write and
read-from-all-disks on a RAID1 array?
I was thinking of setting up a couple of RAID1s with maximum data safety.
I'd like to verify after each write to a disk plus I'd like to read
from all disks and perform data comparison
Tobias wrote:
[...]
I just found your mail on this list, where I have been lurking for
some weeks now to get acquainted with RAID, but I fear my mail would
be almost OT there:
Think so? It's about RAID on Linux, isn't it?
I'm gonna CC the list anyway, hope it's okay :-).
I was just curious
Max Waterman wrote:
Can I just make it a slave device? How will that affect performance?
AFAIR (CMIIW):
- The standard does not allow a slave without a master.
- The master has a role to play in that it does coordination of some
sort (commands perhaps?) between the slave drive and the
John McMonagle wrote:
All panics seem to be associated with accessing bad spot on sdb
It seems really strange that one can get panic from a drive problem.
<sarcasm>Wow, yeah, never seen that happen with Linux before!</sarcasm>
Just for the fun of it, try digging up a disk which has a bad spot
Molle Bestefich wrote:
<sarcasm>Wow, yeah, never seen that happen with Linux before!</sarcasm>
Wait a minute, that wasn't a very productive comment.
Nevermind, I'm probably just ridden with faulty hardware.
Neil Brown wrote:
It is writes, but don't be scared. It is just super-block updates.
In 2.6, the superblock is marked 'clean' whenever there is a period of
about 20ms of no write activity. This increases the chance that a
resync won't be needed after a crash.
(unfortunately) the superblocks
Neil Brown wrote:
Is my perception of the situation correct?
No. Writing the superblock does not cause the array to be marked
active.
If the array is idle, the individual drives will be idle.
Ok, thank you for the clarification.
Seems like a design flaw to me, but then again, I'm biased
Guy [EMAIL PROTECTED] wrote:
I generally agree with you, so I'm just gonna cite / reply to the
points where we don't :-).
This sounded like Neil's current plan. But if I understand the plan, the
drive would be kicked out of the array.
Yeah, sounds bad.
Although it should be marked as
Robin Bowes wrote:
I envisage something like:
md attempts read
one disk/partition fails with a bad block
md re-calculates correct data from other disks
md writes correct data to bad disk
- disk will re-locate the bad block
Probably not that simple, since sometimes multiple blocks will
No email [EMAIL PROTECTED] wrote:
Forgive me as this is probably a silly question and one that has been
answered many times, I have tried to search for the answers but have
ended up more confused than when I started. So thought maybe I could
ask the community to put me out of my misery
Is
Carlos Knowlton wrote:
Molle Bestefich wrote:
Linux filesystems seem to stink real bad when they span multiple
terabytes, at least that's my personal experience. I've tried both
ext3 and reiserfs. Even simple operations such as deleting files
suddenly take on the order of 10-20 minutes.
I'm