Let me try to formulate my idea again... You called a similar process "pushing the rope" some time ago, I think.
I feel like I'm taking an exam, trying to pick answers in a discipline like philosophy with no idea of the examiner's preferences - is he a former communism lecturer or an eager new religious fanatic? The same answer can lead to an A or to an F on a state exam. Ah, that was a fun experience :) Well, what we know is what remains after we forget everything we were taught, while the exams are our last chance to learn anything at all =)

2012-05-24 10:28, Richard Elling wrote:
> You have not made a case for why this hybrid and failure-prone procedure
> is required. What problem are you trying to solve?
Bigger-better-faster? ;)

The original proposal in this thread was about understanding how resilvers and scrubs work, why they are so dog-slow on HDDs in comparison to sequential reads, and thinking aloud about what could be improved in this area.

One of the later posts was about improving disk replacement (where the original disk is still responsive, but may be imperfect) for filled-up, fragmented pools, by including a stage of fast bulk data transfer followed by a different IO pattern for verifying and updating the new disk's image, in contrast with the current resilver's IO patterns. This may or may not have benefits in certain (corner?) cases which are of practical interest to some users on this list, and if this discussion leads to a POC made by a competent ZFS programmer, which can be tested on a variety of ZFS pools (without risking one's only pool on a home NAS) - so much the better. Then we would see whether this scenario is viable or utterly useless and bad in every tested case.

The practical numbers I have from the same box and disks are:

* Copy from a 250Gb raidz1 (9*(4+1)) pool to a single-disk 3Tb test pool took 24 hours to fill the new disk - including the ZFS overheads.
* Copying one raw 250(232)Gb partition takes under 2 hours (if it can sustain about 70Mb/s reads from the source without distractions like other pool IO - then 1 hour).
* A proper resilver (reading the whole BP tree from the original pool, reading all blocks from the TLVDEV, writing reconstructed(?) sectors to the target disk) from one partition to another took 17 hours.
* A full scrub (reading all blocks from the pool, fixing checksum mismatches) takes 25-27 hours.
* Selective scrubbing - unimplemented, timeframe unknown (reading the whole BP tree from the original pool, reading all blocks from the TLVDEV including both the target disk and the original disk, fixing checksum mismatches without panicky messages and/or hotspares kicking in).
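As a sanity check of the "about 1 hour" figure for the raw partition copy, a quick back-of-the-envelope calculation (shell arithmetic; the 70Mb/s sustained rate is the assumption from the list above, and I treat MB and MiB as interchangeable at this precision):

```shell
# Estimate the time to bulk-copy a 232 GiB partition at ~70 MB/s sustained.
size_mib=$((232 * 1024))          # partition size in MiB (237568)
rate_mib_s=70                     # assumed sustained sequential read rate
secs=$((size_mib / rate_mib_s))   # ~3393 seconds
echo "~$((secs / 60)) minutes"    # ~56 minutes, i.e. about an hour
```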
I *guess* such a selective scrub would have a speed similar to a resilver's, but be less bound to random-write IO patterns, which may be better for the latencies of other tasks on the system.

So, with the current resilver, I replace the not-yet-dead disk with a hotspare, and after 17 hours of waiting I see whether it was successfully resilvered or not. During this time the source disk can die, for example, leaving my pool with lowered protection (or none at all in the case of raidz1 or two-way mirrors).

With the new method proposed for a POC implementation, after 1 hour I'd already have a somewhat reliable copy of that vdev: a few blocks may have mismatches, but if the source disk dies or is taken away now, the whole TLVDEV or pool is not left degraded with compromised protection. Then, after the same +17 hours of scrubbing, I'd be certain that this copy is good. And if the new writes arriving at this TLVDEV between the start of the DD and the end of the scrub are directed to both the source disk and its copy, the scrub phase would find fewer (down to zero) checksum discrepancies.
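To make the two-stage idea concrete, here is a minimal sketch of the copy-then-verify flow. It is an illustration only: ordinary temp files stand in for the raw partitions (real use would target device paths such as /dev/dsk/...), and cmp stands in for the unimplemented checksum-aware selective scrub that would actually repair mismatched blocks:

```shell
# Two-stage replacement sketch; files play the role of the raw partitions.
src=$(mktemp) ; dst=$(mktemp)
dd if=/dev/urandom of="$src" bs=1M count=4 2>/dev/null   # fake "old disk"

# Stage 1: fast sequential bulk copy (the ~1-hour dd pass in my numbers)
dd if="$src" of="$dst" bs=1M 2>/dev/null

# Stage 2: verification pass; cmp is only a stand-in for a ZFS
# checksum-aware selective scrub of the copied TLVDEV
cmp "$src" "$dst" && status=verified
echo "copy $status"
rm -f "$src" "$dst"
```

The point of the split is that after Stage 1 the copy is already usable as a fallback, while Stage 2 can run with a friendlier IO pattern than a block-pointer-ordered resilver.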
> Why not follow the well-designed existing procedure?
At first it was theoretical speculation, but a couple of days later an incomplete resilver turned it into a practical experiment with the idea.
> The failure data does not support your hypothesis.
Ok, then my made-up and dismissed argument does not stand ;)

Thanks for the discussion,
//Jim Klimov

_______________________________________________
zfs-discuss mailing list
firstname.lastname@example.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss