On 2012-12-11 16:44, Jim Klimov wrote:
> For single-break-per-row tests based on hypotheses from P parities,
> D data disks and R broken rows, we need to checksum P*(D^R) userdata
> recombinations in order to determine that we can't recover the block.
A small maths correction: the formula above assumes that in every row
we change one item from its on-disk value to a reconstructed hypothesis
on some one data disk (column) - or on up to P disks, if we try to
recover from more than one failed item in a row.
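To make the P*(D^R) figure concrete, here is a minimal sketch of the hypothesis space, assuming the P factor counts which parity drives the reconstruction and each broken row contributes one suspect data column (the parameter values are illustrative, not from the original thread):

```python
# Sketch: why the single-error-per-row hypothesis count is P*(D^R).
# For each of the R broken rows we pick one of D data columns to replace
# with a reconstructed value, using any one of the P parities.
from itertools import product

P, D, R = 2, 4, 3  # e.g. raidz2, 4 data disks, 3 broken rows (illustrative)

hypotheses = [
    (parity, columns)
    for parity in range(P)                       # which parity reconstructs
    for columns in product(range(D), repeat=R)   # one suspect column per row
]

print(len(hypotheses))  # P * D^R = 2 * 4^3 = 128
```

Each tuple names one full recombination to checksum; the list length is exactly P*(D^R).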
Reality is worse :)
Our original info (parity errors and a checksum mismatch) guarantees
only that we have at least one error in the userdata. It is possible
that the other (R-1) errors are on the parity disks, so the
recombination should also check all variants where 0..(R-1) rows are
left unchanged, with their on-disk contents intact.
This gives us something like P*(D + D^2 + ... + D^R) variants to
test, which is roughly a 25% increase in recombinations within the
range of computationally feasible error-matching.
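A quick sketch of that correction, assuming the counts above (the function names are mine, and the "roughly 25%" figure corresponds to something like D = 5 data disks, since the relative increase is 1/D + 1/D^2 + ... ~= 1/(D-1)):

```python
# Naive count: every row gets a reconstructed hypothesis.
def naive_count(P, D, R):
    return P * D**R

# Corrected count: the term D^k covers the case where only k of the R
# rows have their error in userdata and the other R-k rows are left
# with their on-disk contents intact.
def corrected_count(P, D, R):
    return P * sum(D**k for k in range(1, R + 1))

P, D, R = 2, 5, 5  # illustrative values
naive = naive_count(P, D, R)
full = corrected_count(P, D, R)
print(naive, full, full / naive - 1)  # relative increase ~= 1/(D-1)
```

With these example values the corrected count is about 25% larger than the naive one, matching the estimate above; for smaller D the overhead is bigger (up to ~1/(D-1)).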
Heck, just counting from 1 to 2^64 in an "i++" loop takes a lot
of CPU time.
By my estimate, even that would take until the next Big Bang,
at least on my one computer ;)
Just for fun: a count to 2^32 took 42 seconds, so my computer
can do about 10^8 trivial loops per second - but that's just a
data point. What really matters is that 4^64 == 2^128 == (2^32)^4,
which is a lot: counting that far takes 2^96 ~= 8*10^28 times
longer than counting to 2^32. So the plain count from 1 to 4^64
would take about 3*10^30 seconds, or roughly 10^23 years.
If the astronomers' estimates are correct, this amounts to
10^13 lifetimes of our universe, or so ;)
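The arithmetic above is easy to re-check; a minimal sketch, assuming the measured rate of 2^32 counts in 42 seconds and a universe age of ~1.4*10^10 years:

```python
# Back-of-the-envelope check of the timing estimate above.
rate = 2**32 / 42              # ~10^8 trivial loop iterations per second
seconds = 4**64 / rate         # time to count from 1 to 4^64 == 2^128
years = seconds / (3600 * 24 * 365.25)
lifetimes = years / 1.4e10     # assumed universe age: ~1.4*10^10 years

print(f"{seconds:.1e} s ~ {years:.1e} years ~ {lifetimes:.1e} lifetimes")
```

This lands at roughly 3*10^30 seconds, i.e. on the order of 10^23 years, or about 10^13 lifetimes of the universe.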
zfs-discuss mailing list