> > It seems to me that once you copy meta data, you can indeed
> > copy all live data sequentially.
> 
> I don't see this, given the top down strategy. For instance, if I 
> understand the transactional update process, you can't commit the 
> metadata until the data is in place.
> 
> Can you explain in more detail your reasoning?

First, note that this is just a thought experiment -- I
haven't read much of the source code in any detail yet, so it
is entirely likely that what I am suggesting cannot work, or
that there is no need for it!  With that caveat....

http://blogs.sun.com/bonwick/entry/smokin_mirrors talks about
top-down resilvering.   That is, copy the root blocks first
(for example the uberblock), then the blocks they point to
and so on [1].  A major goal here is to minimize data loss in
case of a second failure.  Completely losing a metadata block
means you can't access anything it points to so metadata
blocks are far more precious than data blocks.  This is
different from a normal update transaction, where the
copy-on-write proceeds bottom up -- which is what you are
talking about.  A major goal for a normal update is to ensure
that at all times a consistent filesystem structure is seen
by all.
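
To make the top-down ordering concrete, here is a minimal
sketch.  It is not taken from the ZFS source; the Block class,
the block names and top_down_order() are all made up for
illustration.  It only shows that a breadth-first walk from the
root visits every block after the block that points to it, so
metadata is copied before the data it describes.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical block: a name plus the blocks it points to."""
    name: str
    children: list = field(default_factory=list)


def top_down_order(root):
    """Return block names in breadth-first (top-down) order."""
    order, queue, seen = [], deque([root]), set()
    while queue:
        blk = queue.popleft()
        if id(blk) in seen:  # a block may be reachable more than once
            continue
        seen.add(id(blk))
        order.append(blk.name)
        queue.extend(blk.children)
    return order


# Toy tree: the root points to two metadata blocks, which point to data.
file1 = Block("file1-meta", [Block("file1-data0"), Block("file1-data1")])
file2 = Block("file2-meta", [Block("file2-data0")])
uber = Block("uberblock", [file1, file2])

order = top_down_order(uber)
print(order)
```

In this toy tree both metadata blocks are copied before any
data block, which is the property that matters if a second
disk fails partway through.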

All I was suggesting is that once all the metadata is copied
(or "resilvered"), you switch to sequential copying to
maximize performance.  So instead of copying the data of file1,
then file2, and so on, just copy blocks in the most efficient
order, save their checksums, and periodically validate a whole
batch.  This does make checking the validity of a data block
more complicated, but since the metadata is read first, you
can roughly work out which metadata blocks will be needed, and
when, to check data block validity (because you know where the
data blocks are stored).
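
A rough sketch of what I mean by the sequential phase, again
purely hypothetical: disks are modelled as dicts from offset to
bytes, "expected" stands in for the checksums held in the
already-copied metadata, and SHA-256 and the batch size are
arbitrary choices for the example.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Stand-in checksum; SHA-256 is an assumption for this sketch."""
    return hashlib.sha256(data).hexdigest()


def copy_sequentially(source, target, expected, batch_size=2):
    """Copy data blocks in offset order, validating checksums in batches.

    source, target: dict of offset -> bytes (toy model of two disks).
    expected: dict of offset -> checksum taken from the metadata.
    Returns (offsets in copy order, offsets that failed validation).
    """
    copied_order, saved, bad = [], {}, []
    for offset in sorted(expected):  # physical order, not per-file order
        data = source[offset]
        target[offset] = data
        saved[offset] = checksum(data)  # save now, validate later
        copied_order.append(offset)
        if len(saved) >= batch_size:
            bad.extend(o for o, c in saved.items() if c != expected[o])
            saved.clear()
    # Validate whatever is left in the final partial batch.
    bad.extend(o for o, c in saved.items() if c != expected[o])
    return copied_order, bad


source = {30: b"gamma", 10: b"alpha", 20: b"beta"}
expected = {o: checksum(d) for o, d in source.items()}
expected[20] = checksum(b"stale")  # simulate a block that fails validation

copied_order, bad = copy_sequentially(source, {}, expected)
print(copied_order, bad)
```

The point is only that the copy order follows disk offsets
while validation is deferred and done a batch at a time against
checksums that are already known from the metadata pass.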

> >   Given that a vast majority
> > of disk blocks in use will typically contain data, this is a
> > winning strategy from a performance point of view and still
> > allows you to retrieve a fair bit of data in case of a second
> > disk failure (checksumming will catch a case where good
> > metadata points to as yet uncopied data block).  If amount of
> > live data is > 50% of disk space you may as well do a disk
> > copy, perhaps skipping over already copied meta data.
> >
> > Not only that, you can even start using the disk being
> > resilvered right away for writes.  The new write will be
> > either to a) an already copied block
> 
> How can that be, under a COW regime?

I was talking about resilvering, not a normal update.  Copy
on write happens only for a normal update.  I was speculating
that you can do normal updates during resilvering.

Not sure if this is clear to anyone!

-- bakul

[1] Top-down resilvering seems very much like a copying
garbage collector.  That similarity makes me wonder if the
physical layout can be rearranged in some way for more
efficient access to data -- the idea is to resilver and
compact at the same time on one of the mirrors, then make it
the master and resilver the other mirrors.  Nah... probably
not worth the hassle.  [Again, I suspect no one else
understands what I am talking about :-)]
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
