Well, I read this email having just written a mammoth one in the other thread. My thoughts:
The main difficulty in this, as far as I see it, is that you're intentionally moving data on a checksummed copy-on-write filesystem ;). At the very least this creates a lot of work before we even start to address the problem (and given that the ZFS guys are undoubtedly working on device removal, that effort would be wasted).

I think this is probably more difficult than it's worth -- rewriting data should be a separate feature that is not specific to RAID-Z (once you're changing the block pointers, you need to update the checksums, maintain consistency, preserve snapshots, etc.).

Surely it would be much easier to leave the data as is and version the array's disk layout?

James

On 8/7/07, Pawel Jakub Dawidek <pjd at freebsd.org> wrote:
> Yeah:)
>
> I'd like to work on this. Here are my first observations:
> - We need to call vdev_op_asize method with additional 'offset' argument,
> - We need to move data to new disk starting from the very beginning, so
>   we can't reuse scrub/resilver code which does tree-walk through the
>   data.
>
> Below you can see how I imagine to extend RAIDZ.
> Here is the legend:
>
> << >> - block boundaries
> D<x>  - data block
> P<x>  - parity block
> N<x>  - new parity block
> U     - unused
> *     - if offset in I/O request is less than this marker we use four
>         disks only, if greater - we use five disks
>
> After adding 'NewDisk' to RAIDZ vdev, we have something like this:
>
>   Disk0    Disk1    Disk2    Disk3    NewDisk
>
> <<P00      D00      D01      D02      U
>   P01      D03      D04      D05      U
>   P02      D06>>  <<P03      D07>>    U
> <<P04      D08>>  <<P05      D09      U
>   P06      D10      D11      D12>>    U
> <<P07      D13      D14      D15>>    U
>
> Then we start moving data, but we need to begin from the start:
>
>   Disk0    Disk1    Disk2    Disk3    NewDisk
>
> <<N00      D00      D01      D02      D03
>   N01      D04      D05      D06>>  <<P03
>   D07>> *  U        U        U        U
> <<P04      D08>>  <<P05      D09      U
>   P06      D10      D11      D12>>    U
> <<P07      D13      D14      D15>>    U
>
> At the end we have something like this (free space at the end):
>
>   Disk0    Disk1    Disk2    Disk3    NewDisk
>
> <<N00      D00      D01      D02      D03
>   N01      D04      D05      D06>>  <<P03
>   D07>>  <<P04      D08>>  <<N03      D09
>   D10      D11      D12>>  <<N04      D13
>   D14      D15>>    U        U        U
>   U        U        U        U        U
>
> The biggest problem for me is a method to traverse allocated blocks
> sorted by offset. Any hints how to do it?
>
> --
> Pawel Jakub Dawidek                       http://www.wheel.pl
> pjd at FreeBSD.org                        http://www.FreeBSD.org
> FreeBSD committer                         Am I Evil? Yes, I Am!
>
> _______________________________________________
> zfs-code mailing list
> zfs-code at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-code
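
To make the quoted diagrams easier to check, here is a rough sketch of the packing they appear to use: blocks are laid out back to back, with one parity sector in front of each group of up to (ndisks - 1) data sectors. It is only a toy model of the pictures, not the real vdev_op_asize or RAID-Z allocator. All function names (asize, layout, cell, relocation_is_safe) are made up for illustration, and single parity is assumed. It also says nothing about block pointers, checksums or snapshots, which is where the real difficulty lies.

```python
from math import ceil

def asize(data_sectors, ndisks, nparity=1):
    """Sectors one block occupies in the toy model of the diagrams:
    nparity parity sectors per group of up to (ndisks - nparity) data sectors."""
    groups = ceil(data_sectors / (ndisks - nparity))
    return data_sectors + groups * nparity

def layout(blocks, ndisks):
    """Pack blocks back to back; return (start, end) linear sector extents."""
    extents, cursor = [], 0
    for data_sectors in blocks:
        size = asize(data_sectors, ndisks)
        extents.append((cursor, cursor + size))
        cursor += size
    return extents

def cell(index, ndisks):
    """(row, disk) of a linear sector index on an ndisks-wide stripe."""
    return divmod(index, ndisks)

def relocation_is_safe(blocks, old_ndisks, new_ndisks):
    """True if rewriting the blocks in offset order never writes over a
    sector that still holds data of a block that has not been read yet."""
    old = layout(blocks, old_ndisks)
    new = layout(blocks, new_ndisks)
    old_total = old[-1][1] if old else 0
    for (_, old_end), (new_start, new_end) in zip(old, new):
        for i in range(new_start, new_end):
            row, disk = cell(i, new_ndisks)
            if disk >= old_ndisks:
                continue  # lands on a newly added disk, nothing to clobber
            old_index = row * old_ndisks + disk
            if old_end <= old_index < old_total:
                return False  # would overwrite a block not yet moved
    return True

# Data sectors per block in the diagrams: D00-D06, D07, D08, D09-D12, D13-D15.
blocks = [7, 1, 1, 4, 3]
print(layout(blocks, 4))   # extents on the original four disks
print(layout(blocks, 5))   # extents after the move to five disks
print(cell(layout(blocks, 5)[1][1], 5))  # where '*' sits after two blocks
print(relocation_is_safe(blocks, 4, 5))
```

In this model the four-disk layout fills 24 sectors (the six full rows of the first diagram) and the five-disk layout fills 22, leaving the three unused sectors shown at the end of the last diagram. The safety check illustrates why moving strictly from the lowest offset works in this model: the new layout never grows past the part of the old layout that has already been read.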