Re: detecting/correcting _slightly_ flaky disks

2007-03-06 Thread Michael Stumpf
Bill Davidsen wrote: Michael Stumpf wrote: This is the drive I think is most suspect. What isn't obvious, because it isn't listed in the self test log, is between #1 and #2 there was an aborted, hung test. The #4 short test that was aborted was also a hung test that I eventually, manually a

Re: Help with chunksize on raid10 -p o3 array

2007-03-06 Thread Bill Davidsen
Peter Rabbitson wrote: Hi, I have been trying to figure out the best chunk size for raid10 before migrating my server to it (currently raid1). I am looking at 3 offset stripes, as I want to have two drive failure redundancy, and offset striping is said to have the best write performance, with

Re: mismatch_cnt questions

2007-03-06 Thread Bill Davidsen
Neil Brown wrote: On Monday March 5, [EMAIL PROTECTED] wrote: Neil Brown wrote: [trim Q re how resync fixes data] For raid1 we 'fix' an inconsistency by arbitrarily choosing one copy and writing it over all other copies. For raid5 we assume the data is correct and update the parity.
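
The two repair policies described in the snippet above can be illustrated with a toy sketch. This is not md's kernel code: the function names `repair_raid1` and `repair_raid5_parity` are hypothetical, blocks are modelled as Python `bytes`, and picking the first mirror stands in for the "arbitrary" choice. The only assumption beyond the quoted text is that RAID5 parity is the bytewise XOR of the data blocks in a stripe.

```python
from functools import reduce


def repair_raid1(copies):
    """Pick one mirror copy (here the first, as an arbitrary choice)
    and write it over all the other copies."""
    chosen = copies[0]
    return [bytes(chosen) for _ in copies]


def repair_raid5_parity(data_blocks):
    """Assume the data blocks are correct and recompute the parity
    block as their bytewise XOR."""
    return bytes(
        reduce(lambda a, b: a ^ b, column) for column in zip(*data_blocks)
    )


# Two mirrors that disagree in their second byte.
mirrors = [b"\x01\x02", b"\x01\xff"]
fixed = repair_raid1(mirrors)

# Three data blocks of one stripe; the parity is rebuilt from them.
parity = repair_raid5_parity([b"\x0f\x00", b"\xf0\x01", b"\x00\x10"])
```

Neither policy can tell which copy or block was actually correct. Both only make the array self-consistent again, which is why a non-zero mismatch_cnt is worth investigating.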

Re: detecting/correcting _slightly_ flaky disks

2007-03-06 Thread Bill Davidsen
Michael Stumpf wrote: This is the drive I think is most suspect. What isn't obvious, because it isn't listed in the self test log, is between #1 and #2 there was an aborted, hung test. The #4 short test that was aborted was also a hung test that I eventually, manually aborted--heard clickin

Re: RAID1, hot-swap and boot integrity

2007-03-06 Thread Bill Davidsen
H. Peter Anvin wrote: Mike Accetta wrote: I wonder if having the MBR typically outside of the array and the relative newness of partitioned arrays are related? When I was considering how to architect the RAID1 layout it seemed like a partitioned array on the entire disk worked most naturally.

Re: RAID1, hot-swap and boot integrity

2007-03-06 Thread Bill Davidsen
Mike Accetta wrote: Bill Davidsen wrote: Gabor Gombas wrote: On Fri, Mar 02, 2007 at 09:04:40AM -0500, Mike Accetta wrote: Thoughts or other suggestions anyone? This is a case where a very small /boot partition is still a very good idea... 50-100MB is a good choice (some initramfs ge

Re: high mismatch count after scrub

2007-03-06 Thread Justin Piszcz
On Tue, 6 Mar 2007, Dexter Filmore wrote: xerxes:/sys/block/md0/md# cat mismatch_cnt 147248 Need to worry? If you have a swap file on this array, then that could explain it, so don't worry. Nope, swap is not on the array. Couple of loops tho. If not... maybe worry? I assume you did a '

Re: high mismatch count after scrub

2007-03-06 Thread Dexter Filmore
> > xerxes:/sys/block/md0/md# cat mismatch_cnt > > 147248 > > Need to worry? > > If you have a swap file on this array, then that could explain it, so > don't worry. Nope, swap is not on the array. Couple of loops tho. > > If not... maybe worry? > > I assume you did a 'check' or 'repair' before l

Re: Replace drive in RAID5 without losing redundancy?

2007-03-06 Thread Ralf Müller
On 06.03.2007 at 08:37, dean gaudet wrote: On Tue, 6 Mar 2007, Neil Brown wrote: On Monday March 5, [EMAIL PROTECTED] wrote: Is it possible to mark a disk as "to be replaced by an existing spare", then migrate to the spare disk and kick the old disk _after_ migration has been done? Or

Re: mismatch_cnt questions - how about raid10?

2007-03-06 Thread Justin Piszcz
On Tue, 6 Mar 2007, Peter Rabbitson wrote: Neil Brown wrote: On Tuesday March 6, [EMAIL PROTECTED] wrote: Neil Brown wrote: When we write to a raid1, the data is DMAed from memory out to each device independently, so if the memory changes between the two (or more) DMA operations, you will g

Re: mismatch_cnt questions - how about raid10?

2007-03-06 Thread Peter Rabbitson
Neil Brown wrote: On Tuesday March 6, [EMAIL PROTECTED] wrote: Neil Brown wrote: When we write to a raid1, the data is DMAed from memory out to each device independently, so if the memory changes between the two (or more) DMA operations, you will get inconsistency between the devices. Does this

Re: mismatch_cnt questions - how about raid10?

2007-03-06 Thread Neil Brown
On Tuesday March 6, [EMAIL PROTECTED] wrote: > Neil Brown wrote: > > When we write to a raid1, the data is DMAed from memory out to each > > device independently, so if the memory changes between the two (or > > more) DMA operations, you will get inconsistency between the devices. > > Does this ap

Re: mismatch_cnt questions - how about raid10?

2007-03-06 Thread Peter Rabbitson
Neil Brown wrote: When we write to a raid1, the data is DMAed from memory out to each device independently, so if the memory changes between the two (or more) DMA operations, you will get inconsistency between the devices. Does this apply to raid 10 devices too? And in case of LVM if swap is on

Re: RAID1, hot-swap and boot integrity

2007-03-06 Thread Gabor Gombas
On Mon, Mar 05, 2007 at 06:32:32PM -0500, Mike Accetta wrote: > Yes, we actually have a separate (smallish) boot partition at the front of > the array. This does reduce the at-risk window substantially. I'll have to > ponder whether it reduces it close enough to negligible to then ignore, but >

Re: Replace drive in RAID5 without losing redundancy?

2007-03-06 Thread Ralf Müller
On 05.03.2007 at 23:29, Neil Brown wrote: On Monday March 5, [EMAIL PROTECTED] wrote: Is it possible to mark a disk as "to be replaced by an existing spare", then migrate to the spare disk and kick the old disk _after_ migration has been done? Or not even kick - but mark as new spare.