The current repacker code uses the allocate-on-flush code and the
transaction code, and walks the tree in both directions, sorting it as
it goes.

Hans

David Masover wrote:

> Hans Reiser wrote:
>
> >David Masover wrote:
>
>
> >>I realize that this may not be quite the industrial-strength repacker
> >>that you wanted, but it should be immediately useful, which is a lot
> >>better than "We might do it if you pay us."
>
>
> >Just wait a little, and shortly after we go into the kernel we will work
> >on the repacker.
>
> >Hans
>
>
> Disclaimer:  I've hardly read any of the Reiser4 code, and I'm not
> really an authority on this subject.  I just like to pretend that I am.
>  I would take this off-list, but I'm curious about whether I'm wrong.
>
> The repacker (and the resizer) doesn't seem like a hugely complicated
> concept, unless you're trying to streamline the user experience during
> the process.  "On-line" means that I don't have to use a bootdisk and
> stop all my servers.  It doesn't mean that I would do it at any time
> other than 2 AM, when I do backups, when I generally expect almost 0
> traffic.
>
> Basically, I'm saying that an off-line or a slow on-line shrinker should
> have been done by now.  In fact, it should have been done before the
> meta-files, because meta-files benefit from a repacker, but not the
> other way around.
>
> Since you've told me to wait, I'm going to write this, because it's
> easier for me to write documentation than to read code.  This is
> probably the fault of school, and will likely disappear this summer.
>
> Anyway, this is how I think the resizer should be done:
>
> If we are growing the FS, we should lock everything necessary, then
> change the size value for the FS and make the new blocks available.
> Unless we're actually storing something in unused nodes, this should be
> an instantaneous operation which requires very little hacking to add.  I
> seem to remember that there was even an offline resizer (growing only)
> awhile ago.
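[Editorial note: the grow path described above can be sketched as a toy
model.  This is not Reiser4 code; every name here is hypothetical, and a
real implementation would update an on-disk superblock and bitmap blocks
under the filesystem's own locking.]

```python
# Toy model of the "grow" path: under a lock, bump the recorded
# filesystem size and mark the new blocks free in the allocation bitmap.
# No data moves, so the operation is near-instantaneous.
from threading import Lock

class ToyFS:
    def __init__(self, nblocks):
        self.lock = Lock()
        self.nblocks = nblocks
        self.bitmap = [False] * nblocks   # False = free, True = in use

    def grow(self, new_nblocks):
        with self.lock:
            if new_nblocks < self.nblocks:
                raise ValueError("use a shrink pass to reduce the size")
            # Extend the free-block bitmap and the size field; done.
            self.bitmap.extend([False] * (new_nblocks - self.nblocks))
            self.nblocks = new_nblocks

fs = ToyFS(100)
fs.grow(150)
print(fs.nblocks)              # 150
print(fs.bitmap.count(False))  # 150 -- the new region is all free
```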
>
> If we are shrinking the FS, we first set the new size of the FS in RAM,
> so that nothing will try to write to the "chopped-off" portion until
> we're done.
>
> Next, we turn off the "write-in-the-middle" feature for large
> database-like files (where a block in the middle of a huge file may be
> written twice to avoid fragmentation), so that absolutely no new writes
> will go to the chopped-off portion.
>
> Basically, the filesystem should already think it's shrunken by now, we
> just need to make sure it doesn't freak out when it _reads_ blocks past
> the end of the FS.  We should capture warnings about this and dirty
> those nodes on the spot (nodes which are being read and which are in the
> chopped section) -- they are already in RAM, so it'll be faster that way.
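[Editorial note: the shrink steps so far (reduce the size in RAM only,
then capture reads of chopped-off blocks and dirty them so the next
flush relocates them) might look roughly like this toy model.  All names
are hypothetical; the real allocate-on-flush machinery is far more
involved.]

```python
# Toy model of the shrink read-capture.  The on-disk size stays put, so
# a crash simply reverts to the old size; only the in-RAM size shrinks.

class Node:
    def __init__(self, block):
        self.block = block
        self.dirty = False

class ShrinkingFS:
    def __init__(self, nblocks):
        self.nblocks = nblocks       # on-disk size (unchanged until done)
        self.new_nblocks = nblocks   # in-RAM size during a shrink
        self.next_free = 0

    def begin_shrink(self, new_nblocks):
        # New writes will now only be placed below new_nblocks.
        self.new_nblocks = new_nblocks

    def read_node(self, block):
        node = Node(block)
        if node.block >= self.new_nblocks:
            node.dirty = True        # capture: relocate on the next flush
        return node

    def flush(self, node):
        if node.dirty:
            node.block = self.allocate_below_boundary()
            node.dirty = False
        return node.block

    def allocate_below_boundary(self):
        # Stand-in for allocate-on-flush: pick a block inside the new FS.
        b, self.next_free = self.next_free, self.next_free + 1
        assert b < self.new_nblocks
        return b
```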
>
> Next, we start walking the tree (as you described), dirtying all the
> blocks we find which are in the chopped portion and leaving the rest
> alone.  We need to be careful about locking here, but that should just
> mean "Lock the block we're dealing with, or, if locks aren't available
> at that granularity, lock the whole file."  Locking should block, and userland
> shouldn't have to know about it except to notice that the FS seems a
> little slow right then.
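[Editorial note: the tree walk above, as a toy sketch -- visit every
node, dirty those in the chopped-off region, leave the rest alone.  The
per-node locking mentioned in the paragraph is only represented by a
comment here.]

```python
# Toy depth-first walk for the shrink pass.

class TreeNode:
    def __init__(self, block, children=()):
        self.block = block
        self.dirty = False
        self.children = list(children)

def dirty_chopped(node, new_nblocks):
    """Dirty every node at or past the new boundary; return the count."""
    # In the real thing the node would be locked while it is examined.
    count = 0
    if node.block >= new_nblocks:
        node.dirty = True
        count += 1
    for child in node.children:
        count += dirty_chopped(child, new_nblocks)
    return count
```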
>
> This isn't as dangerous as it seems.  If there is a crash, we just go
> back to the old size -- automatically, since the new size hasn't been
> written to disk anywhere yet -- with the only difference being that most
> of the files will be already moved to where we want them.
>
> Locking isn't as hard as it seems.  If this were a VFS-level operation,
> we'd have to worry about a new directory being created, a file being
> moved, or our current path being deleted out from under us, but we
> aren't working on the semantic layer, we're working on the key/object
> layer.  If I'm right, that means that all the things that we'd have to
> worry about are merely seen as new writes, and would thus go to the new
> places.
>
> Metadata blocks may need a tiny bit of special treatment, since it may
> be some small amount of data changing in-place.  All we do here is, when
> we notice any attempted write outside the new FS size, but inside the
> old FS size, we relocate before we flush it out to disk.  If this means
> there's some parent metadata block we need to move, we do it afterwards,
> as part of the same transaction.  When we finally get to a parent block
> that does not need to be moved, we close the transaction.  This isn't as
> elegant as the method for moving data blocks, but it works.  I think.
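[Editorial note: the metadata case described above -- relocate the
block, then walk up through parents, moving each one that also needs to
move inside the same transaction, and close the transaction at the
first parent that can stay put -- can be sketched like so.  Names and
the flat "transaction" list are hypothetical.]

```python
# Toy model of relocating a chain of metadata blocks in one transaction.

class MetaBlock:
    def __init__(self, block, parent=None):
        self.block = block
        self.parent = parent   # parent metadata block, or None for root

def relocate_chain(block, new_nblocks, alloc):
    """alloc is an iterator yielding fresh block numbers below the new
    boundary.  Returns the blocks moved in this 'transaction'."""
    txn = []
    node = block
    while node is not None and node.block >= new_nblocks:
        node.block = next(alloc)   # move it within the transaction
        txn.append(node)
        node = node.parent
    # First parent already inside the new size: close the transaction.
    return txn
```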
>
> The nice thing about this is that for the most part, the net impact on
> normal FS operation is about the same as that of doing a large "cp -a".
>
> Thoughts?  How close to right is this?  Do you already have another
> document on the same thing that I should be reading?
>