Let me try to formulate my idea again... You called a similar
process "pushing the rope" some time ago, I think.

I feel like I'm taking some exam and trying to pick answers
for a discipline like philosophy with no idea about the
examiner's preferences - is he an ex-Communism teacher or an
eager new religion fanatic? The same answer can lead to an A
or to an F on a state exam. Ah, that was a fun experience :)

Well, what we know is what remains after we forget everything
that we were taught, while the exams are our last chance to
learn something at all =)

2012-05-24 10:28, Richard Elling wrote:
> You have not made a case for why this hybrid and failure-prone
> procedure is required. What problem are you trying to solve?

Bigger-better-faster? ;)

The original proposal in this thread was about understanding
how resilvers and scrubs work, why they are so dog slow on
HDDs in comparison to sequential reads, and thinking aloud
about what can be improved in this area.

One of the later posts was about improving the disk replacement
(where the original is still responsive, but may be imperfect)
for filled-up fragmented pools by including a stage of fast
data transfer and a different IO pattern for verification and
updating of the new disk image, in comparison with current
resilver's IO patterns.

This may or may not have some benefits in certain (corner?)
cases which are of practical interest to some users on this
list, and if this discussion leads to a POC made by a competent
ZFS programmer, which can be tested on a variety of ZFS pools
(without risking one's only pool on a homeNAS) - so much the
better. Then we would see if this scenario is viable or utterly
useless and bad in every tested case.

The practical numbers I have from the same box and disks are:
* Copy from a 250GB raidz1 (9*(4+1)) pool to a single-disk 3TB
  test pool took 24 hours to fill the new disk - including the
  ZFS overheads.
* Copying one raw 250GB (232GiB) partition takes under 2 hours
  (if it can sustain about 70MB/s reads from the source without
  distractions like other pool IO - then 1 hour).
* Proper resilvering (reading all BP-tree from the original pool,
  reading all blocks from the TLVDEV, writing reconstructed(?)
  sectors to the target disk) from one partition to another
  took 17 hours.
* Full scrubbing (reading all blocks from the pool, fixing
  checksum mismatches) takes 25-27 hours.
* Selective scrubbing - unimplemented, timeframe unknown
  (reading all BP-tree from the original pool, reading all
  blocks from the TLVDEV including the target disk and the
  original disk, fixing checksum mismatches without panicky
  messages and/or hotspares kicking in).
  I *guess* it would have similar speed to a resilver, but
  less bound to random write IO patterns, which may be better
  for latencies of other tasks on the system.
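
As a rough sanity check of the numbers above, here is a small Python
sketch of the arithmetic. It assumes decimal units (1GB = 10^9 bytes)
and treats the 250GB partition as completely full, which is my
simplification; the function names are made up for illustration.

```python
# Back-of-the-envelope check of the timings listed above.
# Assumptions (not measurements): decimal units, partition fully allocated.
GB = 10**9
MB = 10**6

def hours_to_copy(size_bytes, rate_bytes_per_s):
    """Time for a purely sequential copy at a sustained rate."""
    return size_bytes / rate_bytes_per_s / 3600

def effective_rate_mb_s(size_bytes, hours):
    """Average throughput implied by moving size_bytes in the given time."""
    return size_bytes / (hours * 3600) / MB

partition = 250 * GB
raw_copy_h = hours_to_copy(partition, 70 * MB)      # sequential dd-like copy
resilver_rate = effective_rate_mb_s(partition, 17)  # observed 17-hour resilver
slowdown = 17 / raw_copy_h

print(f"raw copy at 70MB/s: {raw_copy_h:.2f} h")             # 0.99 h
print(f"resilver effective rate: {resilver_rate:.1f} MB/s")  # 4.1 MB/s
print(f"resilver vs raw copy: {slowdown:.1f}x slower")       # 17.1x
```

So the "1 hour" figure matches 70MB/s sustained, and the 17-hour
resilver works out to roughly 4MB/s on the same hardware, if the
partition really is full.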

So, in the case of the original resilver, I replace the not-yet-dead
disk with a hotspare, and after 17 hours of waiting I see whether
it was successfully resilvered or not. During this time the
disk can die, for example, leaving my pool with lowered
protection (or lack thereof in case of raidz1 or two-way
mirrors).

In case of the new method proposed for a POC implementation,
after 1 hour I'd already have a somewhat reliable copy of
that vdev (a few blocks may have mismatches, but if the
source disk dies or is taken away now, the TLVDEV or pool
as a whole is not degraded and its protection is not
compromised). Then after the same +17 hours for scrubs I'd be
certain that this copy is good.
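
To put the two timelines side by side, a minimal sketch in Python.
The 1-hour and 17-hour figures are the ones quoted in this message;
the 17 hours for the selective scrub is only my guess, since that
stage is unimplemented. The Plan class and its field names are
hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    hours_until_copy: float      # when a (possibly imperfect) second copy exists
    hours_until_verified: float  # when that copy is known to be good

    def exposed_hours(self):
        """Time during which losing the source disk leaves no copy at all."""
        return self.hours_until_copy

# Existing procedure: the copy is only usable once the resilver completes.
resilver = Plan("resilver onto hotspare", 17, 17)
# Proposed procedure: fast raw copy first, then a (guessed) 17-hour scrub.
dd_then_scrub = Plan("raw copy, then selective scrub", 1, 1 + 17)

for plan in (resilver, dd_then_scrub):
    print(f"{plan.name}: copy after {plan.hours_until_copy} h, "
          f"verified after {plan.hours_until_verified} h")
```

The proposed method finishes an hour later overall, but shrinks the
window with no second copy from 17 hours to about 1.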

If the new writes incoming to this TLVDEV between the start of
DD and the end of scrub are directed to be written to both the
source disk and its copy, then there would be fewer (down to
zero) checksum discrepancies for the scrub phase to find.

> Why not follow the well-designed existing procedure?

At first it was theoretical speculation, but a couple of days
later an incomplete resilver gave me a practical experiment
with the idea.

> The failure data does not support your hypothesis.

Ok, then my made-up and dismissed argument does not stand ;)

Thanks for the discussion,
//Jim Klimov
zfs-discuss mailing list
