> > It seems to me that once you copy meta data, you can indeed
> > copy all live data sequentially.
>
> I don't see this, given the top down strategy. For instance, if I
> understand the transactional update process, you can't commit the
> metadata until the data is in place.
>
> Can you explain in more detail your reasoning?

First realize that this is just a thought experiment -- I haven't
read much of the source code in any detail yet, and it is entirely
likely that what I am suggesting cannot work or that there is no
need for it! With that caveat....

http://blogs.sun.com/bonwick/entry/smokin_mirrors talks about
top-down resilvering. That is, copy the root blocks first (for
example the uberblock), then the blocks they point to, and so on [1].
A major goal here is to minimize data loss in case of a second
failure. Completely losing a metadata block means you can't access
anything it points to, so metadata blocks are far more precious than
data blocks.

This is different from a normal update transaction, where the
copy-on-write proceeds bottom up -- which is what you are talking
about. A major goal for a normal update is to ensure that everyone
sees a consistent filesystem structure at all times.

All I was suggesting is that once all the metadata is copied (or
"resilvered"), you switch to sequential copying to maximize
performance. This does make checking the validity of a data block
more complicated. So instead of copying the data of file1, then
file2, and so on, just copy blocks in the most efficient order, save
their checksums, and periodically validate a whole bunch. In fact,
since metadata is read first, you can roughly figure out which
metadata blocks will be needed, and when, to check data block
validity (because you know where the data blocks are stored).

> > Given that a vast majority
> > of disk blocks in use will typically contain data, this is a
> > winning strategy from a performance point of view and still
> > allows you to retrieve a fair bit of data in case of a second
> > disk failure (checksumming will catch a case where good
> > metadata points to as yet uncopied data block). If amount of
> > live data is > 50% of disk space you may as well do a disk
> > copy, perhaps skipping over already copied meta data.
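
To make the two-phase idea concrete, here is a rough Python sketch.
Everything in it is invented for illustration: Block, the
(offset, checksum, is_meta) pointer tuple, and resilver() do not
correspond to anything in the ZFS source, and the "disk" is just a
dict keyed by offset. It copies metadata top down, then copies data
blocks in offset order and validates their checksums in batches.
Verifying the metadata blocks themselves is left out for brevity.

```python
import hashlib
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Block:
    offset: int      # hypothetical flat disk address
    payload: bytes
    # Pointers to children as (offset, expected checksum, is_meta);
    # empty for data blocks.
    pointers: list = field(default_factory=list)


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def resilver(src: dict, root: int, batch: int = 2):
    """Copy src (offset -> Block) to a new mirror.

    Returns (dst, order, bad): the copied payloads, the order in which
    offsets were copied, and the data offsets that failed validation.
    """
    dst, order, bad = {}, [], []
    expected = {}  # data offset -> checksum stored in its parent

    # Phase 1: walk the metadata top down, starting at the root.
    queue = deque([root])
    while queue:
        off = queue.popleft()
        dst[off] = src[off].payload
        order.append(off)
        for child_off, child_sum, is_meta in src[off].pointers:
            if is_meta:
                queue.append(child_off)
            else:
                expected[child_off] = child_sum  # remember for phase 2

    def validate(offsets):
        return [o for o in offsets if checksum(dst[o]) != expected[o]]

    # Phase 2: copy data blocks in disk order, validating in batches.
    pending = []
    for off in sorted(expected):
        dst[off] = src[off].payload
        order.append(off)
        pending.append(off)
        if len(pending) == batch:
            bad += validate(pending)
            pending.clear()
    bad += validate(pending)  # leftover partial batch
    return dst, order, bad
```

The point of keeping the expected dict around is that phase 2 never
has to go back and re-read metadata: all the checksums needed to
validate a data block were already collected during phase 1.
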
> >
> > Not only that, you can even start using the disk being
> > resilvered right away for writes, The new write will be
> > either to a) an already copied block
>
> How can that be, under a COW regime?

I was talking about resilvering, not a normal update. Copy on write
happens only for a normal update. I was speculating that you can do
normal updates during resilvering.

Not sure if this is clear to anyone!

-- bakul

[1] Top-down resilvering seems very much like a copying garbage
collector. That similarity makes me wonder if the physical layout
can be rearranged in some way for more efficient access to data --
the idea is to resilver and compact at the same time on one of the
mirrors, then make it the master and resilver the other mirrors.
Nah... probably not worth the hassle. [Again, I suspect no one else
understands what I am talking about :-)]
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss