On Sat, Apr 16, 2011 at 09:29:26PM +0200, Manuel Bouyer wrote:
> Hello,
> attached is a work in progress on ffs snapshot (as it's work in progress,
> some debug and instrumentation code is still present in the
> patch, no need to comment on this part :).
> The start of this work is that when working on quota, I noticed that
> taking a snapshot on a 500GB filesystem needs several minutes, and is
> O(n) with the number of persistent snapshots.
> Here are some timings on an otherwise idle 500GB filesystem (it's some
> brand of SATA2 3.5" drive attached to an AHCI controller, so it's a
> reasonable test bed for today):
> java# /usr/bin/time fssconfig fss0 /home /home/snaps/snap0
> 260.53 real 0.00 user 1.15 sys
> /home: suspended 77.873 sec, redo 1184 of 2556
> java# /usr/bin/time fssconfig fss1 /home /home/snaps/snap1
> 377.87 real 0.00 user 2.53 sys
> /home: suspended 206.078 sec, redo 1184 of 2556
> java# /usr/bin/time fssconfig fss2 /home /home/snaps/snap2
> 508.23 real 0.00 user 4.28 sys
> /home: suspended 338.534 sec, redo 1184 of 2556
> java# /usr/bin/time fssconfig fss3 /home /home/snaps/snap3
> 621.40 real 0.00 user 5.50 sys
> /home: suspended 431.154 sec, redo 1183 of 2556
>
> Suspending a filesystem for more than 7 minutes to take a snapshot makes
> persistent snapshots quite useless to me. I wonder how it would behave
> on a multi-terabyte filesystem.
>
> I looked at where the time is spent and found 2 major issues:
> 1 cgaccount() works in 2 passes: first it copies the cgs before suspending
> the filesystem; then it is called again to copy only the cgs that have
> been modified between the copy and the filesystem suspend.
> The problem is that to copy a cg we need to allocate blocks for the snapshot
> file, which may be in a cg we just copied. This is the cause of the high
> number of cg copies (almost half of them) with the filesystem suspended.
>
> 2 while the filesystem is suspended, we want to expunge the snapshot files
> from the snapshot view (make them appear as 0-length files).
> With ~500GB sparse files this is a lot of work.
>
> I fixed 1) by preallocating the needed blocks in snapshot_setup().

Good catch. Committed.

> Fixing 2) is trickier. To avoid the heavy writes to the snapshot file
> with the fs suspended, the snapshot appears with its real length and
> blocks at the time of creation, but is marked invalid (only the
> inode block needs to be copied, and this can be done before suspending
> the fs). Now BLK_SNAP should never be seen as a block number, and we skip
> ffs_copyonwrite() if the write is to a snapshot inode.

I strongly object here. There are good reasons to expunge old snapshots.

Even if it were done right, without deadlocks and locking-against-self,
the resulting snapshot loses at least two properties:

- A snapshot is considered stable. Whenever you read a block you get
  the same contents. Allowing old snapshots to exist but not running
  copy-on-write means these blocks will change their contents.

- A snapshot will fsck clean. It is impossible to change fsck_ffs
  to check a snapshot, as these old snapshots' indirect blocks will now
  contain garbage.

You cannot copy blocks before suspension without rewriting them once the
file system is suspended.

The check in ffs_copyonwrite() will only work as long as the old snapshot
exists. As soon as it gets removed we will run COW on the blocks used by
the old snapshot.
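
To illustrate the stability property, here is a minimal toy model in C.
It is only a sketch under loud assumptions: it is not the ffs code, each
"block" is a single int, and every name in it (toy_fs, toy_snap,
toy_write, toy_snap_read, toy_snapshot_stable, toy_demo) is made up for
this example. It shows that a block whose write skips copy-on-write is
seen to change through the snapshot, while a block written with
copy-on-write keeps its contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 4

/* Toy file system: each "block" holds one int (assumption for brevity). */
struct toy_fs {
    int live[NBLOCKS];
};

/*
 * Toy snapshot: a block is either saved here (copied before its first
 * overwrite) or still shared with the live file system.
 */
struct toy_snap {
    const struct toy_fs *fs;
    bool copied[NBLOCKS];
    int saved[NBLOCKS];
};

void toy_snap_init(struct toy_snap *s, const struct toy_fs *fs)
{
    memset(s, 0, sizeof(*s));
    s->fs = fs;
}

/* Write a block; when run_cow is set, save the old contents first. */
void toy_write(struct toy_fs *fs, struct toy_snap *s, int blk, int val,
    bool run_cow)
{
    if (run_cow && !s->copied[blk]) {
        s->saved[blk] = fs->live[blk];
        s->copied[blk] = true;
    }
    fs->live[blk] = val;
}

/* Read a block through the snapshot. */
int toy_snap_read(const struct toy_snap *s, int blk)
{
    return s->copied[blk] ? s->saved[blk] : s->fs->live[blk];
}

/* True if the snapshot's view of block 1 survives a write to that block. */
bool toy_snapshot_stable(bool run_cow)
{
    struct toy_fs fs = { .live = { 10, 20, 30, 40 } };
    struct toy_snap snap;

    toy_snap_init(&snap, &fs);
    int before = toy_snap_read(&snap, 1);
    toy_write(&fs, &snap, 1, 99, run_cow);
    return toy_snap_read(&snap, 1) == before;
}

/* With COW the snapshot is stable; with COW skipped it is not. */
void toy_demo(void)
{
    assert(toy_snapshot_stable(true));
    assert(!toy_snapshot_stable(false));
}
```

In this model, skipping the copy for writes to an old snapshot's blocks
corresponds to calling toy_write() with run_cow set to false: the newer
snapshot then reads whatever was written last.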

> With these changes the times are much more reasonable:
> /usr/bin/time fssconfig fss0 /home /home/snaps/snap0
> 299.68 real 0.00 user 1.10 sys
> /home: suspended 0.310 sec, redo 0 of 2556
> /usr/bin/time fssconfig fss1 /home /home/snaps/snap1
> 188.10 real 0.00 user 0.86 sys
> /home: suspended 0.270 sec, redo 0 of 2556
> /usr/bin/time fssconfig fss2 /home /home/snaps/snap2
> 169.78 real 0.00 user 0.95 sys
> /home: suspended 0.450 sec, redo 0 of 2556
> /usr/bin/time fssconfig fss3 /home /home/snaps/snap3
> 172.39 real 0.00 user 0.99 sys
> /home: suspended 0.300 sec, redo 0 of 2556
>
> This seems to work; one issue with this patch is that the block
> count for the snapshot inode, and the block summary information (the
> second probably being a consequence of the first) appear wrong when
> running fsck against a snapshot. I believe this is fixable, but
> I've not yet found where the information mismatch comes from.
>
> Comments?
>
> PS: I'm away from computers for one week, so don't expect replies to
> your comments before next Sunday.
>
> --
> Manuel Bouyer <[email protected]>
> NetBSD: 26 years of experience will always make the difference
> --

-- 
Juergen Hannken-Illjes - [email protected] - TU Braunschweig (Germany)
