On 2014-11-19 at 06:52, Eric Blake wrote:
On 11/18/2014 01:26 PM, Eric Blake wrote:

Now, in response to your question about some other 3-pass inducing
pattern, let's think back to v1, where you questioned what would happen
if a hole in the reftable gets turned into data due to a later
allocation.  Let's see if I can come up with a scenario for that...

Let's stick with a cluster size of 512, and use 32-bit and 64-bit widths
as our two sizes.  If we downsize from 64 to 32 bits, then every two
refblock clusters in the old table results in one call to operation()
for the new table; conversely, if we upsize, then every refblock cluster
in the old table gives two calls to operation() in the new table.  The
trick at hand is to come up with some image where we punch a hole so
that on the first pass, we call operation() with refblock_empty true for
one iteration (necessarily a call later than the first, since the image
header guarantees the first refblock is not empty), but where we have
data after the hole, where it is the later data that triggers the
allocation that will finally start to fill the hole.

How about starting with an image that occupies between 1.5 and 2
refblocks worth of 32-width clusters (so an image anywhere between 193
and 256 clusters, or between 98816 and 131072 bytes).  You should be
able to figure out how many clusters this consumes for L1, L2, plus 1
for header, reftable, and 2 for refblocks, in order to figure out how
many remaining clusters are dedicated to data; ideally, the data
clusters are contiguous, and occupy a swath that covers at least
clusters 126 through 192.  Widening to 64-bit width will require 4
refblocks instead of 2, if all refblocks are needed.  But the whole idea
of punching a hole is that we don't need a refblock if it will be
all-zero entries.  So take this original image, and discard the data
clusters from physical index 126 through 192, (this is NOT the data
visible at guest offset 31744, but whatever actual offset of guest data
that maps to physical offset 31744).  The old reftable now looks like {
refblock_o1 [0-125 occupied, 126 and 127 empty], refblock_o2 [128-191
empty, 192-whatever occupied, tail empty] }.  With no allocations
required, this would in turn would map to the following new refblocks: {
refblock_n1 [0-64 occupied], refblock_n2 [65-125 occupied, 126-127
empty], NULL, refblock_n4 [192-whatever occupied] }.  Note that we do
not need to allocate refblock_n3 because of the hole in the old
refblock; we DO end up allocating three refblocks, but in the sequence
of things, refblock_n1 and refblock_n2 are allocated while we are
visiting refblock_o1 and still fit in refblock_o1, while refblock_n4 is
not allocated until after we have already passed over the first half of
refblock_o2.

Thus, the second walk over the image will see that we need to allocate
refblock_n3 because it now contains entries (in particular, the entry
for refblock_n4, but also the 1-cluster entry for the proposed reftable
that is allocated between the walks).  So, while your test used the
allocation of the reftable as the spillover point, my scenario here uses
the allocation of later refblocks as the spillover point that got missed
during the first iteration.

Oops,...

which means the reftable now looks like { refblock1, NULL, refblock3,
NULL... }; and where refblock1 now has at least two free entries
(possibly three, if the just-freed refblock2 happened to live before
cluster 62).  is we can also free refblock2

...forgot to delete these random thoughts that I typed up but no longer
needed after reworking the above text.

At any rate, I'm not certain we can come up with a four-pass scenario;
if it is even possible, it would be quite complex.

[snip] (But rest assured, I read it all ;-))

At this point, I've spent far too long writing this email.  I haven't
completely ruled out the possibility of a corner case needing four
passes through the do loop, but the image sizes required to get there
are starting to be quite large compared to your simpler test of needing
three passes through the do loop.

Right, see test 026. Without an SSD, it takes more than ten minutes, not least because it tests resizing the reftable which means writing a lot of data to an image with 512 byte clusters.

I won't be bothered if we call it
good, and quit trying to come up with any other "interesting" allocation
sequencing.

The problem is, in my opinion, that we won't gain a whole lot from proving that there are cases where you need a fourth pass and test these cases. Fundamentally, they are not different from cases with three passes (technically, not even different from two pass cases). You scan through the refcounts, you detect that you need refblocks which you have not yet allocated, you allocate them, then you respin until all allocations are done. The only problem would be whether it'd be possible to run into an infinite loop: Can allocating new refblocks lead to a case where we have to allocate even more refblocks? Well, just judging from how complicated it is to even find a case where the number of new allocations hasn't gone down to zero in the fourth pass, we can safely rule that out.

Some people may ask why the walks are performed in a loop without a fixed limit (because they can't find cases where allocations haven't settled at the third pass). But I doubt that'll be a serious problem. It's much easier to have such a basically unlimited loop with the reasoning "We don't know exactly how many loops it'll take, but it will definitely settle at some point in time" than limiting the loop and then having to explain why we know exactly that it won't take more than X passes. The only problem with not limiting is that we need one walk to verify that all allocations have settled. But we need that for the common case (two passes) anyway, so that's not an issue.

The code from this version does not care whether it takes one, two, three, four or 42 passes. It's all the same. It will never take one and it will probably never take 42 passes; but if it does, well, it will work. Therefore, I think testing one non-standard number of passes (three) is enough. I'd like to test more, but the effort's just not worth it. I think.

Max

Reply via email to