Hans Reiser wrote:
> David Masover wrote:
> 
> 
>>
>>I realize that this may not be quite the industrial-strength repacker
>>that you wanted, but it should be immediately useful, which is a lot
>>better than "We might do it if you pay us."
> 
> 
> Just wait a little, and shortly after we go into the kernel we will work
> on the repacker.
> 
> Hans

Disclaimer:  I've hardly read any of the Reiser4 code, and I'm not
really an authority on this subject.  I just like to pretend that I am.
I would take this off-list, but I'm curious about whether I'm wrong.

The repacker (and the resizer) doesn't seem like a hugely complicated
concept, unless you're trying to streamline the user experience during
the process.  "On-line" means that I don't have to use a bootdisk and
stop all my servers.  It doesn't mean that I would do it at any time
other than 2 AM, when I do backups, when I generally expect almost 0
traffic.

Basically, I'm saying that an off-line or a slow on-line shrinker should
have been done by now.  In fact, it should have been done before the
meta-files, because meta-files benefit from a repacker, but not the
other way around.

Since you've told me to wait, I'm going to write this, because it's
easier for me to write documentation than to read code.  This is
probably the fault of school, and will likely disappear this summer.

Anyway, this is how I think the resizer should be done:

If we are growing the FS, we should lock everything necessary, then
change the size value for the FS and make the new blocks available.
Unless we're actually storing something in unused nodes, this should be
an instantaneous operation which requires very little hacking to add.  I
seem to remember that there was even an offline resizer (growing only)
a while ago.
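To make the claim concrete, here's a toy in-memory model (all names
invented for illustration; Reiser4's real structures are different):
growing is just publishing a larger size field, and no existing data
moves.

```python
# Toy "filesystem": a block count plus a set of used block numbers.
# This is a sketch of the idea, not Reiser4 code.

class ToyFS:
    def __init__(self, block_count):
        self.block_count = block_count      # "superblock" size field
        self.used = set()                   # allocated block numbers

    def grow(self, new_count):
        """Growing is (nearly) instantaneous: we only publish a larger
        size so the allocator may start handing out the new blocks."""
        assert new_count >= self.block_count
        self.block_count = new_count

fs = ToyFS(100)
fs.used = {5, 42, 99}
fs.grow(150)
assert fs.block_count == 150
assert fs.used == {5, 42, 99}   # no existing block moved
```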

If we are shrinking the FS, we first set the new size of the FS in RAM,
so that nothing will try to write to the "chopped-off" portion until
we're done.

Next, we turn off the "write-in-the-middle" feature for large
database-like files (where a block in the middle of a huge file may be
written twice to avoid fragmentation), so that absolutely no new writes
will go to the chopped-off portion.
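These two steps (the in-RAM size cap and disabling write-in-the-middle)
boil down to one allocator invariant: never hand out a block number at
or past the new size.  A sketch, with invented names (this is not
Reiser4's allocator):

```python
# Shrink precondition: once the new (smaller) size is set in RAM, the
# allocator must never hand out a block in the chopped-off tail.

class ShrinkingAllocator:
    def __init__(self, old_count, new_count, used):
        self.old_count = old_count          # still what's on disk
        self.new_count = new_count          # in-RAM target size
        self.used = set(used)

    def alloc(self):
        """All new writes (including rewrites that would have gone
        'in the middle', now disabled) land below new_count."""
        for blk in range(self.new_count):
            if blk not in self.used:
                self.used.add(blk)
                return blk
        raise OSError("no space below the new size")

a = ShrinkingAllocator(old_count=100, new_count=60, used={0, 1, 70})
blk = a.alloc()
assert blk < 60     # never in the chopped region [60, 100)
```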

Basically, the filesystem should already think it's shrunk by now; we
just need to make sure it doesn't freak out when it _reads_ blocks past
the new end of the FS.  We should capture those reads and dirty the
nodes on the spot (nodes which are being read and which are in the
chopped section) -- they are already in RAM, so it'll be faster that way.

Next, we start walking the tree (as you described), dirtying all the
blocks we find which are in the chopped portion and leaving the rest
alone.  We need to be careful about locking here, but that should just
mean "Lock the block we're dealing with, or, if locks aren't available
at that granularity, lock the whole file."  Locking should block, and
userland shouldn't have to know about it except to notice that the FS
seems a little slow right then.
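The walk itself is simple in outline.  A toy version (invented
structures, not Reiser4's tree code): dirty every node whose block
number falls in the chopped region, so the next flush reallocates it
below the new size.

```python
# Toy tree walk for the shrink: mark nodes in the chopped-off region
# dirty; the flush path is assumed to relocate dirty nodes below the
# new size.

class Node:
    def __init__(self, blk, children=()):
        self.blk = blk
        self.children = list(children)
        self.dirty = False

def dirty_chopped(node, new_count):
    if node.blk >= new_count:
        node.dirty = True       # flush will relocate it
    for child in node.children:
        dirty_chopped(child, new_count)

root = Node(2, [Node(75, [Node(10)]), Node(61)])
dirty_chopped(root, new_count=60)
assert root.dirty is False                      # block 2 stays put
assert root.children[0].dirty                   # block 75 must move
assert root.children[1].dirty                   # block 61 must move
assert root.children[0].children[0].dirty is False
```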

This isn't as dangerous as it seems.  If there is a crash, we just go
back to the old size -- automatically, since the new size hasn't been
written to disk anywhere yet -- with the only difference being that most
of the files will be already moved to where we want them.

Locking isn't as hard as it seems.  If this were a VFS-level operation,
we'd have to worry about a new directory being created, a file being
moved, or our current path being deleted out from under us, but we
aren't working on the semantic layer, we're working on the key/object
layer.  If I'm right, that means that all the things that we'd have to
worry about are merely seen as new writes, and would thus go to the new
places.

Metadata blocks may need a tiny bit of special treatment, since it may
be some small amount of data changing in-place.  All we do here is, when
we notice any attempted write outside the new FS size, but inside the
old FS size, we relocate before we flush it out to disk.  If this means
there's some parent metadata block we need to move, we do it afterwards,
as part of the same transaction.  When we finally get to a parent block
that does not need to be moved, we close the transaction.  This isn't as
elegant as the method for moving data blocks, but it works.  I think.
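The parent-propagation step above can be sketched like this (a toy
model with invented names): moving a block changes its parent's
pointer, so if the parent is itself in the chopped region it moves too,
and the transaction closes at the first ancestor that stays put.

```python
# Sketch of relocating a metadata block and propagating up the tree.
# path is [leaf, parent, ..., root] as block numbers; alloc_below hands
# out free blocks under the new size.

def relocate_upward(path, new_count, alloc_below):
    """Return the (old, new) block pairs moved in one transaction."""
    moved = []
    for node_blk in path:
        if node_blk >= new_count:
            moved.append((node_blk, alloc_below()))   # old -> new
        else:
            # This ancestor only needs its pointer updated, not a
            # move; the transaction closes here.
            break
    return moved

free = iter(range(50, 60))       # pretend free blocks below the new size
txn = relocate_upward([80, 72, 3], new_count=60,
                      alloc_below=lambda: next(free))
assert txn == [(80, 50), (72, 51)]   # leaf and parent moved, root stayed
```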

The nice thing about this is that for the most part, the net impact on
normal FS operation is about the same as that of doing a large "cp -a".

Thoughts?  How close to right is this?  Do you already have another
document on the same thing that I should be reading?
