On May 23, 2012, at 2:56 PM, Jim Klimov wrote:

> Thanks again,
> 2012-05-24 1:01, Richard Elling wrote:
>>> At least the textual error message implies that if a hotspare
>>> were available for the pool, it would kick in and invalidate
>>> the device I am scrubbing to update into the pool after the
>>> DD-phase (well, it was not DD but a hung-up resilver in this
>>> case, but that is not substantial).
>> The man page is clear on this topic, IMHO
> Indeed, even in snv_117 the zpool man page says that. But the
> console/dmesg message was also quite clear, so go figure whom
> to trust (or fear) more ;)

The FMA message is consistent with the man page.

> fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-GH, TYPE: Fault, VER: 1, 
> EVENT-TIME: Wed May 16 03:27:31 MSK 2012
> PLATFORM: Sun Fire X4500, CSN: 0804AMT023            , HOSTNAME: thumper
> SOURCE: zfs-diagnosis, REV: 1.0
> EVENT-ID: cc25a316-4018-4f13-c675-d1d84c6325c3
> DESC: The number of checksum errors associated with a ZFS device
> exceeded acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-GH for more 
> information.
> AUTO-RESPONSE: The device has been marked as degraded.  An attempt
> will be made to activate a hot spare if available.
> IMPACT: Fault tolerance of the pool may be compromised.
> REC-ACTION: Run 'zpool status -x' and replace the bad device.
>>> > dd, or similar dumb block copiers, should work fine.
>>> > However, they are inefficient...
>>> Define efficient? In terms of transferring the 900GB payload
>>> of a 1TB HDD used for ZFS for a year - DD would beat resilver
>>> anytime, in terms of getting most or (less likely) all of the
>>> valid bits with data onto the new device. It is the next phase
>>> (getting the rest of the bits into valid state) that needs
>>> some attention, manual or automated.
>> speed != efficiency
> Ummm... this is likely to start a flame war with other posters,
> and you did not say what efficiency is to you? How can we compare
> apples to meat, not even knowing whether the latter is a steak or
> a pork knee?

Efficiency allows use of denominators other than time. Speed is restricted
to a denominator of time. There is no flame war here; look elsewhere.
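
As a rough illustration of a denominator other than time, here is a minimal sketch using the sizes quoted above (about 900 GB of data on a 1 TB disk). It measures useful bytes per byte written. It assumes an idealized resilver that writes only allocated blocks and ignores traversal overhead; the function name is made up for the example.

```python
# Sizes taken from the thread: a 1 TB disk holding about 900 GB of pool data.
DISK_BYTES = 1000 * 10**9
ALLOCATED_BYTES = 900 * 10**9


def copy_efficiency(useful_bytes, bytes_written):
    """Fraction of written bytes that carry allocated pool data."""
    return useful_bytes / bytes_written


# dd copies every block on the device, allocated or not.
dd_eff = copy_efficiency(ALLOCATED_BYTES, DISK_BYTES)

# Assumption: an idealized resilver writes only the allocated blocks.
resilver_eff = copy_efficiency(ALLOCATED_BYTES, ALLOCATED_BYTES)

print(f"dd: {dd_eff:.2f}, resilver: {resilver_eff:.2f}")
```

By this measure the block copier is the less efficient of the two regardless of how fast it runs, and the gap widens as the pool gets emptier.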

> I, for now, choose to stand by a statement that reduction of the
> timeframe that the old disk needs to be in the system is a good
> thing, as well as that changing the IO pattern from random writes
> into (mostly) sequential writes and after that random reads may
> be also somewhat more efficient, especially under other loads
> (interfering less with them). Even though the whole replacement
> process may take more wallclock time, there are cases when I'd
> likely trust it to do a better job than original resilvering.
> I think, someone with equipment could stage an experiment and
> compare the two procedures (existing and proposed) on a nearly
> full and somewhat fragmented pool.

Operationally, your method loses every time.

> Maybe you can disenchant me (not with vague phrases but either
> theory or practice) and I would then see that my trust is blind,
> misdirected and without basis. =)

>> IMHO, this is too operationally complex for most folks. KISS wins.
> That's why I proposed to tuck this scenario under the zfs hood
> (DD + selective scrub + ditto writes during the process,
> as an optional alternative to current resilver), or explain
> coherently why this should not be done - not for any situation.
> Implementing it as a standard supported command would be KISS ;)
> Especially if it is known that with some quirks this procedure
> works, and may be beneficial to some cases, i.e. by reducing
> the timeframe that a pool with a flaky disk in place is exposed
> to potential loss of redundancy and large amounts of data, and
> in the worst case the loss is constrained to those sectors
> which couldn't be (correctly) read by DD from the source disk
> and couldn't be reconstructed by raidz/mirror redundancies due
> to whatever overlaying problems (i.e. a sector from same block
> died on another disk too).

You have not made a case for why this hybrid and failure-prone 
procedure is required. What problem are you trying to solve?

>> What is it about error counters that frightens you enough to want to clear
>> them often?
> In this case, mostly, the fright of having the device kicked
> out of the pool automatically instead of getting it "synced"
> ("resilvered" is an improper term here, I guess) to proper state.

Why not follow the well-designed existing procedure?

> In general - since this is a part of some migration procedure
> which is, again, expected to have errors, we don't really care
> for signalling them. Why doesn't the original resilver signal
> several million CKSUM errors per new empty disk when it does
> reconstruction of sectors onto it? I'd say this is functionally
> identical. (At least, would be - if it were part of a supported
> procedure as I suggest).
> Thanks,
> //Jim Klimov
> PS: I pondered for a while if I should make up an argument that
> on a dying disk mechanics, lots of random IO (resilver) instead
> of sequential IO (DD) would cause it to die faster, but that's
> just a FUD not backed by any scientific data or statistics -
> which you likely have, and perhaps opposing this argument indeed.

The failure data does not support your hypothesis.
 -- richard


ZFS and performance consulting
SCALE 10x, Los Angeles, Jan 20-22, 2012

zfs-discuss mailing list