Re: Honest timeline for btrfsck
On 10/06/2011 10:50 PM, Chris Mason wrote:
> On Thu, Oct 06, 2011 at 10:31:41AM -0500, Jeff Putney wrote:
>>> No, in this case it means we're confident it will get rolled out.
>>
>> On Aug 18th confidence was high enough to declare a possible release
>> that very day. This confidence turned into 7 weeks of silence
>> followed by another 2 week estimate.
>>
>> These confident declarations are why things like mniederle's
>> btrfs_rescue are considered 'interim' and not worth building on.
>> Had this confidence of imminent release not been the prevalent
>> message for the last year, others would have stepped in to fill
>> the void.
>>
>>> I've given a number of hard dates recently and I'd prefer to
>>> show up with the code instead. I don't think it makes sense to
>>> put a partial implementation out there, we'll just have a bunch
>>> of people reporting problems that I know exist.
>>>
>>> -chris
>>
>> This strategy of 'Lone Wolfing it' has delayed the release by a
>> year. Either you are flying solo because you think that you can
>> make more meaningful progress without the involvement of the
>> btrfs community, or you are willing to forfeit the contributions
>> of the community in order to not have to listen to any complaints.
>>
>> The other problem of this flying solo plan is that you are making
>> the assumption that the problems you know about are more
>> significant than the problems you are unaware of and could be
>> flushed out with more eyes on the code. The longer you delay the
>> release of the source, the longer it will be until confidence can
>> be generated that major issues have been resolved.
>>
>> http://en.wikipedia.org/wiki/Release_early,_release_often
>
> [ Thanks for everyone's comments! ]
>
> Keep in mind that btrfs was released and ran for a long time while
> intentionally crashing when we ran out of space. This was a really
> important part of our development because we attracted a huge number
> of contributors, and some very brave users.
>
> For fsck, even the stuff I have here does have a way to go before it
> is at the level of an e2fsck or xfs_repair. But I do want to make
> sure that I'm surprised by any bugs before I send it out, and that's
> just not the case today. The release has been delayed because I've
> alternated between a few different ways of repairing, and because I
> got distracted by some important features in the kernel.

Yes. The single biggest rule of file system recovery tools is that you
never leave the file system more broken than you found it. Beta
testing an fsck, when the author him/herself isn't comfortable
releasing the code, is insane when you have data you care about. If
you disagree, I'd hit the pause button before you learn some very hard
lessons.

-Jeff

-- 
Jeff Mahoney
SUSE Labs

-- 
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Honest timeline for btrfsck
On Thu, 6 Oct 2011 23:20:45 +0000 (UTC), Yalonda Gishtaka wrote:
> and tarnishing Oracle's name.

Thank you, sir, you just made my day.

-- 
With respect,
Roman
Re: Honest timeline for btrfsck
On Thu, Oct 06, 2011 at 10:31:41AM -0500, Jeff Putney wrote:
> > No, in this case it means we're confident it will get rolled out.
>
> On Aug 18th confidence was high enough to declare a possible release
> that very day. This confidence turned into 7 weeks of silence
> followed by another 2 week estimate.
>
> These confident declarations are why things like mniederle's
> btrfs_rescue are considered 'interim' and not worth building on. Had
> this confidence of imminent release not been the prevalent message
> for the last year, others would have stepped in to fill the void.
>
> > I've given a number of hard dates recently and I'd prefer to show up
> > with the code instead. I don't think it makes sense to put a partial
> > implementation out there, we'll just have a bunch of people
> > reporting problems that I know exist.
> >
> > -chris
>
> This strategy of 'Lone Wolfing it' has delayed the release by a year.
> Either you are flying solo because you think that you can make more
> meaningful progress without the involvement of the btrfs community,
> or you are willing to forfeit the contributions of the community in
> order to not have to listen to any complaints.
>
> The other problem of this flying solo plan, is that you are making
> the assumption that the problems you know about are more significant
> than the problems you are unaware of and could be flushed out with
> more eyes on the code. The longer you delay the release of the
> source, the longer it will be until confidence can be generated that
> major issues have been resolved.
>
> http://en.wikipedia.org/wiki/Release_early,_release_often

[ Thanks for everyone's comments! ]

Keep in mind that btrfs was released and ran for a long time while
intentionally crashing when we ran out of space. This was a really
important part of our development because we attracted a huge number
of contributors, and some very brave users.

For fsck, even the stuff I have here does have a way to go before it
is at the level of an e2fsck or xfs_repair. But I do want to make sure
that I'm surprised by any bugs before I send it out, and that's just
not the case today. The release has been delayed because I've
alternated between a few different ways of repairing, and because I
got distracted by some important features in the kernel.

That's how software goes sometimes, and I'll take the criticism
because it hasn't gone as well as it should have. But I can't stress
enough how much I appreciate everyone's contributions and interest in
btrfs.

-chris
Re: Honest timeline for btrfsck
On Thu, Oct 6, 2011 at 10:31 AM, Jeff Putney wrote:
> > No, in this case it means we're confident it will get rolled out.
>
> On Aug 18th confidence was high enough to declare a possible release
> that very day. This confidence turned into 7 weeks of silence
> followed by another 2 week estimate.
>
> These confident declarations are why things like mniederle's
> btrfs_rescue are considered 'interim' and not worth building on. Had
> this confidence of imminent release not been the prevalent message
> for the last year, others would have stepped in to fill the void.
>
> > I've given a number of hard dates recently and I'd prefer to show up
> > with the code instead. I don't think it makes sense to put a partial
> > implementation out there, we'll just have a bunch of people
> > reporting problems that I know exist.
> >
> > -chris
>
> This strategy of 'Lone Wolfing it' has delayed the release by a year.
> Either you are flying solo because you think that you can make more
> meaningful progress without the involvement of the btrfs community,
> or you are willing to forfeit the contributions of the community in
> order to not have to listen to any complaints.
>
> The other problem of this flying solo plan, is that you are making
> the assumption that the problems you know about are more significant
> than the problems you are unaware of and could be flushed out with
> more eyes on the code. The longer you delay the release of the
> source, the longer it will be until confidence can be generated that
> major issues have been resolved.
>
> http://en.wikipedia.org/wiki/Release_early,_release_often

The problem with this is that people naturally look for an fsck tool
when something goes badly wrong. Something as important as an fsck
utility shouldn't be released (unofficially or officially) half-baked.
It can irreparably destroy a filesystem which could otherwise have
been repaired with a fully functional fsck. I think Chris is trying to
prevent that from happening.

Perhaps Chris can set up a private developer repo and ask for help
from Red Hat, Fujitsu, etc.?
Re: Honest timeline for btrfsck
On 07/10/11 10:20, Yalonda Gishtaka wrote:
> Couldn't have put it better. It's really time for Chris Mason to
> stop disgracing the open source community and tarnishing Oracle's
> name.

Oh come on - he's working *for* Oracle to do this and we are getting
the benefits for free. We can hardly complain when he's trying to deal
with LKML, doing btrfs devel for Oracle and having a life as well
(i.e. his recent vacation).

I've known too many people burn out in IT due to overcommitment and I
don't want to see that happen to Chris.

If you wish to direct his priorities then I suggest you pay Oracle to
do so, or else attempt to employ him yourself.

All the best,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
Re: Honest timeline for btrfsck
Jeff Putney writes:
> This strategy of 'Lone Wolfing it' has delayed the release by a year.
> Either you are flying solo because you think that you can make more
> meaningful progress without the involvement of the btrfs community,
> or you are willing to forfeit the contributions of the community in
> order to not have to listen to any complaints.

Couldn't have put it better. It's really time for Chris Mason to stop
disgracing the open source community and tarnishing Oracle's name.
Re: [PATCH v0 07/18] btrfs: generic data structure to build unique lists
On Thu, Oct 06, 2011 at 01:33:00PM -0700, Andi Kleen wrote:
> Arne Jansen writes:
> > ulist is a generic data structure to hold a collection of unique
> > u64 values. The only operations it supports are adding to the list
> > and enumerating it. It is possible to store an auxiliary value
> > along with the key. The implementation is preliminary and can
> > probably be sped up significantly. It is used by subvolume quota
> > to translate recursions into iterative loops.
>
> Hmm, sounds like a job for lib/idr.c
>
> What do your ulists do that idr doesn't?

Arne's ulists keep full u64 values; IDR is int-based.

david
Re: Honest timeline for btrfsck
On 10/06/2011 11:31 AM, Jeff Putney wrote:
> http://en.wikipedia.org/wiki/Release_early,_release_often

I can appreciate both Jeff's and Andi's positions on this issue. I do
wonder why the fsck isn't publicly available as-is, as a non-release
version, just so people can begin getting their eyes on it and make
contributions. I think that would really help with getting a
higher-quality product in less time, which is a good goal to attempt
to achieve.

I've only played with btrfs at this point, and I'm mostly waiting for
an fsck tool to exist (in a mature form) before using this fine
filesystem on any of my systems, so I am interested in seeing the fsck
reach maturity, as I am very excited about all the features that btrfs
offers.

That said, I also think that we ought not to complain to Chris when he
is doing work that will benefit us all, without any cost to us. We may
prefer that he take a different approach in developing this tool, but
in the end he is serving us and we ought not to look a gift horse in
the mouth, as they say.

Chris, I respectfully request that the code you have be placed into a
public repository. It is your choice of course, but I believe it would
be a good thing for btrfs. However and whenever it is delivered to the
community, I am confident that btrfs will be ready for production use
very soon. Thanks to you and all the devs for working so hard to bring
Linux into the future of filesystems!

-- R
Re: Honest timeline for btrfsck
2011/10/6 Andi Kleen:
> Jeff Putney writes:
>> http://en.wikipedia.org/wiki/Release_early,_release_often
>
> Well the other principle in free software you're forgetting is:
>
> "It will be released when it's ready"
>
> If you don't like Chris' ways to do releases you're free to write
> something on your own or pay someone to do so. Otherwise you just
> have to deal with his time frames, as shifty as they may be.

I did a different thing: I offered Chris money to help rescue a hosed
btrfs, or to point to someone who could. We ended up doing some tests
(for free) but nothing else materialized. While the time passed has
diminished the value of the data to be rescued, I'm more in the "show
us some code we can start from" camp than the "it will be released
when ready" one.

Francesco R.
Re: [PATCH v0 03/18] btrfs: add nested locking mode for paths
On Fri, Oct 07, 2011 at 12:44:30AM +0400, Andrey Kuzmin wrote:
> Perhaps you could just elaborate on "needs this feature"? In
> general, a write lock gives one exclusive access, so the need for an
> additional read (non-exclusive) lock does not appear easily
> understandable.

Usually it's because the low-level code can be called both with and
without the lock held, and it doesn't know which. But that can usually
be avoided with some restructuring.

-Andi
Re: [PATCH v0 17/18] btrfs: add qgroup ioctls
Arne Jansen writes:
> +
> +	if (copy_to_user(arg, sa, sizeof(*sa)))
> +		ret = -EFAULT;
> +
> +	if (trans) {
> +		err = btrfs_commit_transaction(trans, root);
> +		if (err && !ret)
> +			ret = err;
> +	}

It would seem safer to put the copy_to_user outside the transaction. A
copy_to_user can in principle cause new writes (e.g. if it causes
COW), so you may end up with nested transactions. Even if that works
somehow (not sure) it seems to be a thing better avoided.

> +
> +	sa = memdup_user(arg, sizeof(*sa));
> +	if (IS_ERR(sa))
> +		return PTR_ERR(sa);
> +
> +	trans = btrfs_join_transaction(root);
> +	if (IS_ERR(trans)) {
> +		ret = PTR_ERR(trans);
> +		goto out;
> +	}

This code seems to be duplicated a lot. Can it be consolidated?

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only
Re: [PATCH v0 03/18] btrfs: add nested locking mode for paths
Arne Jansen writes:
> This patch adds the possibility to read-lock an extent even if it is
> already write-locked from the same thread. Subvolume quota needs
> this capability.

Recursive locking is generally strongly discouraged; it causes all
kinds of problems and tends to eventually lead to locking hierarchies
nobody can understand anymore. If you can find any other way to solve
this problem, I would encourage you to do so.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only
Re: Honest timeline for btrfsck
On 10/06/2011 04:30 PM, Andi Kleen wrote:
> Jeff Putney writes:
>> http://en.wikipedia.org/wiki/Release_early,_release_often
>
> Well the other principle in free software you're forgetting is:
>
> "It will be released when it's ready"
>
> If you don't like Chris' ways to do releases you're free to write
> something on your own or pay someone to do so. Otherwise you just
> have to deal with his time frames, as shifty as they may be.

Thanks, I was about to say the same thing.

-Jeff

-- 
Jeff Mahoney
SUSE Labs
Re: [PATCH v0 07/18] btrfs: generic data structure to build unique lists
Arne Jansen writes:
> ulist is a generic data structure to hold a collection of unique u64
> values. The only operations it supports are adding to the list and
> enumerating it. It is possible to store an auxiliary value along
> with the key. The implementation is preliminary and can probably be
> sped up significantly. It is used by subvolume quota to translate
> recursions into iterative loops.

Hmm, sounds like a job for lib/idr.c

What do your ulists do that idr doesn't? OK, idr doesn't have merge,
but that should be simple enough to add.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only
Re: Honest timeline for btrfsck
Jeff Putney writes:
> http://en.wikipedia.org/wiki/Release_early,_release_often

Well the other principle in free software you're forgetting is:

"It will be released when it's ready"

If you don't like Chris' ways to do releases you're free to write
something on your own or pay someone to do so. Otherwise you just have
to deal with his time frames, as shifty as they may be.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only
[PATCH v0 14/18] btrfs: quota tree support and startup
Init the quota tree along with the others on open_ctree and
close_ctree. Add the quota tree to the list of well known trees in
btrfs_read_fs_root_no_name.

Signed-off-by: Arne Jansen
---
 fs/btrfs/disk-io.c |   47 +++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index cb25017..06576ed 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1391,6 +1391,9 @@ struct btrfs_root *btrfs_read_fs_root_no_name(struct btrfs_fs_info *fs_info,
 		return fs_info->dev_root;
 	if (location->objectid == BTRFS_CSUM_TREE_OBJECTID)
 		return fs_info->csum_root;
+	if (location->objectid == BTRFS_QUOTA_TREE_OBJECTID)
+		return fs_info->quota_root ? fs_info->quota_root :
+				ERR_PTR(-ENOENT);
 again:
 	spin_lock(&fs_info->fs_roots_radix_lock);
 	root = radix_tree_lookup(&fs_info->fs_roots_radix,
@@ -1676,6 +1679,8 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 						 GFP_NOFS);
 	struct btrfs_root *dev_root = kzalloc(sizeof(struct btrfs_root),
 					      GFP_NOFS);
+	struct btrfs_root *quota_root = kzalloc(sizeof(struct btrfs_root),
+						GFP_NOFS);
 	struct btrfs_root *log_tree_root;
 	int ret;
@@ -1684,7 +1689,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	struct btrfs_super_block *disk_super;
 	if (!extent_root || !tree_root || !tree_root->fs_info ||
-	    !chunk_root || !dev_root || !csum_root) {
+	    !chunk_root || !dev_root || !csum_root || !quota_root) {
 		err = -ENOMEM;
 		goto fail;
 	}
@@ -2078,6 +2083,18 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	csum_root->track_dirty = 1;
+	ret = find_and_setup_root(tree_root, fs_info,
+				  BTRFS_QUOTA_TREE_OBJECTID, quota_root);
+	if (ret) {
+		kfree(quota_root);
+		quota_root = NULL;
+	} else {
+		quota_root->track_dirty = 1;
+		fs_info->quota_enabled = 1;
+		fs_info->pending_quota_state = 1;
+	}
+	fs_info->quota_root = quota_root;
+
 	fs_info->generation = generation;
 	fs_info->last_trans_committed = generation;
 	fs_info->data_alloc_profile = (u64)-1;
@@ -2115,6 +2132,10 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 		btrfs_set_opt(fs_info->mount_opt, SSD);
 	}
+	ret = btrfs_read_qgroup_config(fs_info);
+	if (ret)
+		goto fail_trans_kthread;
+
 	/* do not make disk changes in broken FS */
 	if (btrfs_super_log_root(disk_super) != 0 &&
 	    !(fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)) {
@@ -2124,7 +2145,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 			printk(KERN_WARNING "Btrfs log replay required "
 			       "on RO media\n");
 			err = -EIO;
-			goto fail_trans_kthread;
+			goto fail_qgroup;
 		}
 		blocksize = btrfs_level_size(tree_root,
@@ -2133,7 +2154,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 		log_tree_root = kzalloc(sizeof(struct btrfs_root), GFP_NOFS);
 		if (!log_tree_root) {
 			err = -ENOMEM;
-			goto fail_trans_kthread;
+			goto fail_qgroup;
 		}
 		__setup_root(nodesize, leafsize, sectorsize, stripesize,
@@ -2163,7 +2184,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 			printk(KERN_WARNING
 			       "btrfs: failed to recover relocation\n");
 			err = -EINVAL;
-			goto fail_trans_kthread;
+			goto fail_qgroup;
 		}
 	}
@@ -2173,10 +2194,10 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	fs_info->fs_root = btrfs_read_fs_root_no_name(fs_info, &location);
 	if (!fs_info->fs_root)
-		goto fail_trans_kthread;
+		goto fail_qgroup;
 	if (IS_ERR(fs_info->fs_root)) {
 		err = PTR_ERR(fs_info->fs_root);
-		goto fail_trans_kthread;
+		goto fail_qgroup;
 	}
 	if (!(sb->s_flags & MS_RDONLY)) {
@@ -2193,6 +2214,8 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 	return tree_root;
+fail_qgroup:
+	btrfs_free_qgroup_config(fs_info);
 fail_trans_kthread:
 	kthread_stop(fs_info->transaction_kthread);
 fail_cleaner:
@@ -2209,6 +2232,10 @@ fail_block_groups:
 	btrfs_free_block_groups(fs_info);
 	free_extent_buffer(csum_root->node);
[PATCH v0 16/18] btrfs: hooks to reserve qgroup space
Like block reserves, reserve a small piece of space on each
transaction start and for delalloc. These are the hooks that can
actually return EDQUOT to the user. The amount of space reserved is
tracked in the transaction handle.

Signed-off-by: Arne Jansen
---
 fs/btrfs/extent-tree.c |   13 +++++++++++++
 fs/btrfs/transaction.c |   16 ++++++++++++++++
 fs/btrfs/transaction.h |    1 +
 3 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 7fb9650..a2400c4 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4093,6 +4093,14 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
 	spin_unlock(&BTRFS_I(inode)->lock);
 	to_reserve += calc_csum_metadata_size(inode, num_bytes);
+
+	if (root->fs_info->quota_enabled) {
+		ret = btrfs_qgroup_reserve(root, num_bytes +
+					   nr_extents * root->leafsize);
+		if (ret)
+			return ret;
+	}
+
 	ret = reserve_metadata_bytes(NULL, root, block_rsv, to_reserve, 1);
 	if (ret) {
 		unsigned dropped;
@@ -4123,6 +4131,11 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
 	if (dropped > 0)
 		to_free += btrfs_calc_trans_metadata_size(root, dropped);
+	if (root->fs_info->quota_enabled) {
+		btrfs_qgroup_free(root, num_bytes +
+				  dropped * root->leafsize);
+	}
+
 	btrfs_block_rsv_release(root, &root->fs_info->delalloc_block_rsv,
 				to_free);
 }
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 1ae856e..a8b7668 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -260,6 +260,7 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root,
 	struct btrfs_transaction *cur_trans;
 	u64 num_bytes = 0;
 	int ret;
+	u64 qgroup_reserved = 0;
 	if (root->fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)
 		return ERR_PTR(-EROFS);
@@ -278,6 +279,14 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root,
 	 * the appropriate flushing if need be.
 	 */
 	if (num_items > 0 && root != root->fs_info->chunk_root) {
+		if (root->fs_info->quota_enabled &&
+		    is_fstree(root->root_key.objectid)) {
+			qgroup_reserved = num_items * root->leafsize;
+			ret = btrfs_qgroup_reserve(root, qgroup_reserved);
+			if (ret)
+				return ERR_PTR(ret);
+		}
+
 		num_bytes = btrfs_calc_trans_metadata_size(root, num_items);
 		ret = btrfs_block_rsv_add(NULL, root,
 					  &root->fs_info->trans_block_rsv,
@@ -315,6 +324,7 @@ again:
 	h->use_count = 1;
 	h->block_rsv = NULL;
 	h->orig_rsv = NULL;
+	h->qgroup_reserved = qgroup_reserved;
 	smp_mb();
 	if (cur_trans->blocked && may_wait_transaction(root, type)) {
@@ -463,6 +473,12 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 	 * end_transaction. Subvolume quota depends on this.
 	 */
 	WARN_ON(trans->root != root);
+
+	if (trans->qgroup_reserved) {
+		btrfs_qgroup_free(root, trans->qgroup_reserved);
+		trans->qgroup_reserved = 0;
+	}
+
 	while (count < 4) {
 		unsigned long cur = trans->delayed_ref_updates;
 		trans->delayed_ref_updates = 0;
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index b120126..5f5d216 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -48,6 +48,7 @@ struct btrfs_transaction {
 struct btrfs_trans_handle {
 	u64 transid;
 	u64 bytes_reserved;
+	u64 qgroup_reserved;
 	unsigned long use_count;
 	unsigned long blocks_reserved;
 	unsigned long blocks_used;
-- 
1.7.3.4
[PATCH v0 17/18] btrfs: add qgroup ioctls
Ioctls to control the qgroup feature like adding and removing qgroups
and assigning qgroups.

Signed-off-by: Arne Jansen
---
 fs/btrfs/ioctl.c |  185 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/ioctl.h |   27 ++++++++
 2 files changed, 212 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index fade500..f46bc35 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2834,6 +2834,183 @@ static long btrfs_ioctl_scrub_progress(struct btrfs_root *root,
 	return ret;
 }
+static long btrfs_ioctl_quota_ctl(struct btrfs_root *root, void __user *arg)
+{
+	struct btrfs_ioctl_quota_ctl_args *sa;
+	struct btrfs_trans_handle *trans = NULL;
+	int ret;
+	int err;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (root->fs_info->sb->s_flags & MS_RDONLY)
+		return -EROFS;
+
+	sa = memdup_user(arg, sizeof(*sa));
+	if (IS_ERR(sa))
+		return PTR_ERR(sa);
+
+	if (sa->cmd != BTRFS_QUOTA_CTL_RESCAN) {
+		trans = btrfs_start_transaction(root, 2);
+		if (IS_ERR(trans)) {
+			ret = PTR_ERR(trans);
+			goto out;
+		}
+	}
+
+	switch (sa->cmd) {
+	case BTRFS_QUOTA_CTL_ENABLE:
+		ret = btrfs_quota_enable(trans, root->fs_info);
+		break;
+	case BTRFS_QUOTA_CTL_DISABLE:
+		ret = btrfs_quota_disable(trans, root->fs_info);
+		break;
+	case BTRFS_QUOTA_CTL_RESCAN:
+		ret = btrfs_quota_rescan(root->fs_info);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	if (copy_to_user(arg, sa, sizeof(*sa)))
+		ret = -EFAULT;
+
+	if (trans) {
+		err = btrfs_commit_transaction(trans, root);
+		if (err && !ret)
+			ret = err;
+	}
+
+out:
+	kfree(sa);
+	return ret;
+}
+
+static long btrfs_ioctl_qgroup_assign(struct btrfs_root *root, void __user *arg)
+{
+	struct btrfs_ioctl_qgroup_assign_args *sa;
+	struct btrfs_trans_handle *trans;
+	int ret;
+	int err;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (root->fs_info->sb->s_flags & MS_RDONLY)
+		return -EROFS;
+
+	sa = memdup_user(arg, sizeof(*sa));
+	if (IS_ERR(sa))
+		return PTR_ERR(sa);
+
+	trans = btrfs_join_transaction(root);
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		goto out;
+	}
+
+	/* FIXME: check if the IDs really exist */
+	if (sa->assign) {
+		ret = btrfs_add_qgroup_relation(trans, root->fs_info,
+						sa->src, sa->dst);
+	} else {
+		ret = btrfs_del_qgroup_relation(trans, root->fs_info,
+						sa->src, sa->dst);
+	}
+
+	err = btrfs_end_transaction(trans, root);
+	if (err && !ret)
+		ret = err;
+
+out:
+	kfree(sa);
+	return ret;
+}
+
+static long btrfs_ioctl_qgroup_create(struct btrfs_root *root, void __user *arg)
+{
+	struct btrfs_ioctl_qgroup_create_args *sa;
+	struct btrfs_trans_handle *trans;
+	int ret;
+	int err;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (root->fs_info->sb->s_flags & MS_RDONLY)
+		return -EROFS;
+
+	sa = memdup_user(arg, sizeof(*sa));
+	if (IS_ERR(sa))
+		return PTR_ERR(sa);
+
+	trans = btrfs_join_transaction(root);
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		goto out;
+	}
+
+	/* FIXME: check if the IDs really exist */
+	if (sa->create) {
+		ret = btrfs_create_qgroup(trans, root->fs_info, sa->qgroupid,
+					  NULL);
+	} else {
+		ret = btrfs_remove_qgroup(trans, root->fs_info, sa->qgroupid);
+	}
+
+	err = btrfs_end_transaction(trans, root);
+	if (err && !ret)
+		ret = err;
+
+out:
+	kfree(sa);
+	return ret;
+}
+
+static long btrfs_ioctl_qgroup_limit(struct btrfs_root *root, void __user *arg)
+{
+	struct btrfs_ioctl_qgroup_limit_args *sa;
+	struct btrfs_trans_handle *trans;
+	int ret;
+	int err;
+	u64 qgroupid;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (root->fs_info->sb->s_flags & MS_RDONLY)
+		return -EROFS;
+
+	sa = memdup_user(arg, sizeof(*sa));
+	if (IS_ERR(sa))
+		return PTR_ERR(sa);
+
+	trans = btrfs_join_transaction(root);
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		goto out;
+	}
+
+	qgroupid = sa->qgroupid;
+	if (!qgroupid) {
[PATCH v0 15/18] btrfs: hooks for qgroup to record delayed refs
Hooks into qgroup code to record refs and into transaction commit.
This is the main entry point for qgroup. Basically every change in
extent backrefs gets accounted to the appropriate qgroups.

Signed-off-by: Arne Jansen
---
 fs/btrfs/delayed-ref.c |   24 ++++++++++++++++--------
 fs/btrfs/transaction.c |    7 +++++++
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index d6f934f..bd74b7a 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -442,11 +442,12 @@ update_existing_head_ref(struct btrfs_delayed_ref_node *existing,
  */
 static noinline int add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 					 struct btrfs_trans_handle *trans,
-					 struct btrfs_delayed_ref_node *ref,
+					 struct btrfs_delayed_ref_node **pref,
 					 u64 bytenr, u64 num_bytes,
 					 int action, int is_data)
 {
 	struct btrfs_delayed_ref_node *existing;
+	struct btrfs_delayed_ref_node *ref = *pref;
 	struct btrfs_delayed_ref_head *head_ref = NULL;
 	struct btrfs_delayed_ref_root *delayed_refs;
 	int count_mod = 1;
@@ -503,6 +504,7 @@ static noinline int add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 	if (existing) {
 		update_existing_head_ref(existing, ref);
+		*pref = existing;
 		/*
 		 * we've updated the existing ref, free the newly
 		 * allocated ref
@@ -654,6 +656,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
 {
 	struct btrfs_delayed_tree_ref *ref;
 	struct btrfs_delayed_ref_head *head_ref;
+	struct btrfs_delayed_ref_node *node;
 	struct btrfs_delayed_ref_root *delayed_refs;
 	int ret;
 	struct seq_list seq_elem;
@@ -678,7 +681,8 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
 	 * insert both the head node and the new ref without dropping
 	 * the spin lock
 	 */
-	ret = add_delayed_ref_head(fs_info, trans, &head_ref->node, bytenr,
+	node = &head_ref->node;
+	ret = add_delayed_ref_head(fs_info, trans, &node, bytenr,
 				   num_bytes, action, 0);
 	BUG_ON(ret);
@@ -687,8 +691,10 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
 					for_cow, &seq_elem);
 	BUG_ON(ret);
 	spin_unlock(&delayed_refs->lock);
-	if (fs_info->quota_enabled && !for_cow && is_fstree(ref_root))
+	if (fs_info->quota_enabled && !for_cow && is_fstree(ref_root)) {
+		btrfs_qgroup_record_ref(trans, fs_info, &ref->node, extent_op);
 		put_delayed_seq(delayed_refs, &seq_elem);
+	}
 	return 0;
 }
@@ -706,6 +712,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 {
 	struct btrfs_delayed_data_ref *ref;
 	struct btrfs_delayed_ref_head *head_ref;
+	struct btrfs_delayed_ref_node *node;
 	struct btrfs_delayed_ref_root *delayed_refs;
 	int ret;
 	struct seq_list seq_elem;
@@ -730,7 +737,8 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 	 * insert both the head node and the new ref without dropping
 	 * the spin lock
 	 */
-	ret = add_delayed_ref_head(fs_info, trans, &head_ref->node, bytenr,
+	node = &head_ref->node;
+	ret = add_delayed_ref_head(fs_info, trans, &node, bytenr,
 				   num_bytes, action, 1);
 	BUG_ON(ret);
@@ -739,8 +747,10 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 					action, for_cow, &seq_elem);
 	BUG_ON(ret);
 	spin_unlock(&delayed_refs->lock);
-	if (fs_info->quota_enabled && !for_cow && is_fstree(ref_root))
+	if (fs_info->quota_enabled && !for_cow && is_fstree(ref_root)) {
+		btrfs_qgroup_record_ref(trans, fs_info, &ref->node, extent_op);
 		put_delayed_seq(delayed_refs, &seq_elem);
+	}
 	return 0;
 }
@@ -751,6 +761,7 @@ int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
 				struct btrfs_delayed_extent_op *extent_op)
 {
 	struct btrfs_delayed_ref_head *head_ref;
+	struct btrfs_delayed_ref_node *node;
 	struct btrfs_delayed_ref_root *delayed_refs;
 	int ret;
@@ -763,7 +774,8 @@ int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
 	delayed_refs = &trans->transaction->delayed_refs;
 	spin_lock(&delayed_refs->lock);
-	ret = add_delayed_ref_head(fs_info, trans, &head_ref->node, bytenr,
+	node = &head_ref->node;
+	ret = add_delayed_ref_head(fs_info, trans, &node, bytenr,
 				   num_bytes, BTRFS_UPDATE_DELAYED_HEAD,
[PATCH v0 02/18] btrfs: always save ref_root in delayed refs
For qgroup calculation the information to which root a delayed ref belongs is useful even for shared refs. Signed-off-by: Arne Jansen --- fs/btrfs/delayed-ref.c | 18 -- fs/btrfs/delayed-ref.h | 12 2 files changed, 12 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index 3a0f0ab..babd37b 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -495,13 +495,12 @@ static noinline int add_delayed_tree_ref(struct btrfs_fs_info *fs_info, ref->in_tree = 1; full_ref = btrfs_delayed_node_to_tree_ref(ref); - if (parent) { - full_ref->parent = parent; + full_ref->parent = parent; + full_ref->root = ref_root; + if (parent) ref->type = BTRFS_SHARED_BLOCK_REF_KEY; - } else { - full_ref->root = ref_root; + else ref->type = BTRFS_TREE_BLOCK_REF_KEY; - } full_ref->level = level; trace_btrfs_delayed_tree_ref(ref, full_ref, action); @@ -551,13 +550,12 @@ static noinline int add_delayed_data_ref(struct btrfs_fs_info *fs_info, ref->in_tree = 1; full_ref = btrfs_delayed_node_to_data_ref(ref); - if (parent) { - full_ref->parent = parent; + full_ref->parent = parent; + full_ref->root = ref_root; + if (parent) ref->type = BTRFS_SHARED_DATA_REF_KEY; - } else { - full_ref->root = ref_root; + else ref->type = BTRFS_EXTENT_DATA_REF_KEY; - } full_ref->objectid = owner; full_ref->offset = offset; diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h index 8316bff..a5fb2bc 100644 --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -98,19 +98,15 @@ struct btrfs_delayed_ref_head { struct btrfs_delayed_tree_ref { struct btrfs_delayed_ref_node node; - union { - u64 root; - u64 parent; - }; + u64 root; + u64 parent; int level; }; struct btrfs_delayed_data_ref { struct btrfs_delayed_ref_node node; - union { - u64 root; - u64 parent; - }; + u64 root; + u64 parent; u64 objectid; u64 offset; }; -- 1.7.3.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More 
majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v0 03/18] btrfs: add nested locking mode for paths
This patch adds the possibility to read-lock an extent even if it is already write-locked from the same thread. Subvolume quota needs this capability. Signed-off-by: Arne Jansen --- fs/btrfs/ctree.c | 22 fs/btrfs/ctree.h |1 + fs/btrfs/extent_io.c |1 + fs/btrfs/extent_io.h |2 + fs/btrfs/locking.c | 51 +++-- fs/btrfs/locking.h |2 +- 6 files changed, 66 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 51b387b..964ac9a 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -186,13 +186,14 @@ struct extent_buffer *btrfs_lock_root_node(struct btrfs_root *root) * tree until you end up with a lock on the root. A locked buffer * is returned, with a reference held. */ -struct extent_buffer *btrfs_read_lock_root_node(struct btrfs_root *root) +struct extent_buffer *btrfs_read_lock_root_node(struct btrfs_root *root, + int nested) { struct extent_buffer *eb; while (1) { eb = btrfs_root_node(root); - btrfs_tree_read_lock(eb); + btrfs_tree_read_lock(eb, nested); if (eb == root->node) break; btrfs_tree_read_unlock(eb); @@ -1620,6 +1621,7 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root /* everything at write_lock_level or lower must be write locked */ int write_lock_level = 0; u8 lowest_level = 0; + int nested = p->nested; lowest_level = p->lowest_level; WARN_ON(lowest_level && ins_len > 0); @@ -1661,8 +1663,9 @@ again: b = root->commit_root; extent_buffer_get(b); level = btrfs_header_level(b); + BUG_ON(p->skip_locking && nested); if (!p->skip_locking) - btrfs_tree_read_lock(b); + btrfs_tree_read_lock(b, 0); } else { if (p->skip_locking) { b = btrfs_root_node(root); @@ -1671,7 +1674,7 @@ again: /* we don't know the level of the root node * until we actually have it read locked */ - b = btrfs_read_lock_root_node(root); + b = btrfs_read_lock_root_node(root, nested); level = btrfs_header_level(b); if (level <= write_lock_level) { /* whoops, must trade for write lock */ @@ -1810,7 +1813,8 @@ cow_done: err = 
btrfs_try_tree_read_lock(b); if (!err) { btrfs_set_path_blocking(p); - btrfs_tree_read_lock(b); + btrfs_tree_read_lock(b, +nested); btrfs_clear_path_blocking(p, b, BTRFS_READ_LOCK); } @@ -3955,7 +3959,7 @@ int btrfs_search_forward(struct btrfs_root *root, struct btrfs_key *min_key, WARN_ON(!path->keep_locks); again: - cur = btrfs_read_lock_root_node(root); + cur = btrfs_read_lock_root_node(root, 0); level = btrfs_header_level(cur); WARN_ON(path->nodes[level]); path->nodes[level] = cur; @@ -4049,7 +4053,7 @@ find_next_key: cur = read_node_slot(root, cur, slot); BUG_ON(!cur); - btrfs_tree_read_lock(cur); + btrfs_tree_read_lock(cur, 0); path->locks[level - 1] = BTRFS_READ_LOCK; path->nodes[level - 1] = cur; @@ -4243,7 +4247,7 @@ again: ret = btrfs_try_tree_read_lock(next); if (!ret) { btrfs_set_path_blocking(path); - btrfs_tree_read_lock(next); + btrfs_tree_read_lock(next, 0); btrfs_clear_path_blocking(path, next, BTRFS_READ_LOCK); } @@ -4280,7 +4284,7 @@ again: ret = btrfs_try_tree_read_lock(next); if (!ret) { btrfs_set_path_blocking(path); - btrfs_tree_read_lock(next); + btrfs_tree_read_lock(next, 0); btrfs_clear_path_blocking(path, next, BTRFS_READ_LOCK); } diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree
[PATCH v0 09/18] btrfs: qgroup state and initialization
Add state to fs_info. Signed-off-by: Arne Jansen --- fs/btrfs/ctree.h | 32 fs/btrfs/disk-io.c |7 +++ 2 files changed, 39 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 09c58e5..49f97d8 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -958,6 +958,7 @@ struct btrfs_fs_info { struct btrfs_root *dev_root; struct btrfs_root *fs_root; struct btrfs_root *csum_root; + struct btrfs_root *quota_root; /* the log root tree is a directory of all the other log roots */ struct btrfs_root *log_root_tree; @@ -1185,6 +1186,30 @@ struct btrfs_fs_info { int scrub_workers_refcnt; struct btrfs_workers scrub_workers; + /* +* quota information +*/ + unsigned int quota_enabled:1; + + /* +* quota_enabled only changes state after a commit. This holds the +* next state. +*/ + unsigned int pending_quota_state:1; + + /* is qgroup tracking in a consistent state? */ + u64 qgroup_flags; + + /* holds configuration and tracking. Protected by qgroup_lock */ + struct rb_root qgroup_tree; + spinlock_t qgroup_lock; + + /* list of dirty qgroups to be written at next commit */ + struct list_head dirty_qgroups; + + /* used by btrfs_qgroup_record_ref for an efficient tree traversal */ + u64 qgroup_seq; + /* filesystem state */ u64 fs_state; @@ -2845,4 +2870,11 @@ int btrfs_scrub_cancel_devid(struct btrfs_root *root, u64 devid); int btrfs_scrub_progress(struct btrfs_root *root, u64 devid, struct btrfs_scrub_progress *progress); +static inline int is_fstree(u64 rootid) +{ + if (rootid == BTRFS_FS_TREE_OBJECTID || + (s64)rootid >= (s64)BTRFS_FIRST_FREE_OBJECTID) + return 1; + return 0; +} #endif diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 672747d..cb25017 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1825,6 +1825,13 @@ struct btrfs_root *open_ctree(struct super_block *sb, init_rwsem(&fs_info->cleanup_work_sem); init_rwsem(&fs_info->subvol_sem); + spin_lock_init(&fs_info->qgroup_lock); + fs_info->qgroup_tree = RB_ROOT; + 
INIT_LIST_HEAD(&fs_info->dirty_qgroups); + fs_info->qgroup_seq = 1; + fs_info->quota_enabled = 0; + fs_info->pending_quota_state = 0; + btrfs_init_free_cluster(&fs_info->meta_alloc_cluster); btrfs_init_free_cluster(&fs_info->data_alloc_cluster); -- 1.7.3.4
[PATCH v0 01/18] btrfs: mark delayed refs as for cow
Add a for_cow parameter to add_delayed_*_ref and pass the appropriate value from every call site. The for_cow parameter will later on be used to determine if a ref will change anything with respect to qgroups. Delayed refs coming from relocation are always counted as for_cow, as they don't change subvol quota. Also pass in the fs_info for later use. Signed-off-by: Arne Jansen --- fs/btrfs/ctree.c | 20 +- fs/btrfs/ctree.h | 13 --- fs/btrfs/delayed-ref.c | 50 -- fs/btrfs/delayed-ref.h | 15 +--- fs/btrfs/extent-tree.c | 95 +--- fs/btrfs/file.c| 10 +++--- fs/btrfs/inode.c |2 +- fs/btrfs/ioctl.c |3 +- fs/btrfs/relocation.c | 18 + fs/btrfs/transaction.c |4 +- fs/btrfs/tree-log.c|2 +- 11 files changed, 136 insertions(+), 96 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 011cab3..51b387b 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -261,9 +261,9 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans, WARN_ON(btrfs_header_generation(buf) > trans->transid); if (new_root_objectid == BTRFS_TREE_RELOC_OBJECTID) - ret = btrfs_inc_ref(trans, root, cow, 1); + ret = btrfs_inc_ref(trans, root, cow, 1, 1); else - ret = btrfs_inc_ref(trans, root, cow, 0); + ret = btrfs_inc_ref(trans, root, cow, 0, 1); if (ret) return ret; @@ -350,14 +350,14 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans, if ((owner == root->root_key.objectid || root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) && !(flags & BTRFS_BLOCK_FLAG_FULL_BACKREF)) { - ret = btrfs_inc_ref(trans, root, buf, 1); + ret = btrfs_inc_ref(trans, root, buf, 1, 1); BUG_ON(ret); if (root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) { - ret = btrfs_dec_ref(trans, root, buf, 0); + ret = btrfs_dec_ref(trans, root, buf, 0, 1); BUG_ON(ret); - ret = btrfs_inc_ref(trans, root, cow, 1); + ret = btrfs_inc_ref(trans, root, cow, 1, 1); BUG_ON(ret); } new_flags |= BTRFS_BLOCK_FLAG_FULL_BACKREF; @@ -365,9 +365,9 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle 
*trans, if (root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) - ret = btrfs_inc_ref(trans, root, cow, 1); + ret = btrfs_inc_ref(trans, root, cow, 1, 1); else - ret = btrfs_inc_ref(trans, root, cow, 0); + ret = btrfs_inc_ref(trans, root, cow, 0, 1); BUG_ON(ret); } if (new_flags != 0) { @@ -381,11 +381,11 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans, if (flags & BTRFS_BLOCK_FLAG_FULL_BACKREF) { if (root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) - ret = btrfs_inc_ref(trans, root, cow, 1); + ret = btrfs_inc_ref(trans, root, cow, 1, 1); else - ret = btrfs_inc_ref(trans, root, cow, 0); + ret = btrfs_inc_ref(trans, root, cow, 0, 1); BUG_ON(ret); - ret = btrfs_dec_ref(trans, root, buf, 1); + ret = btrfs_dec_ref(trans, root, buf, 1, 1); BUG_ON(ret); } clean_tree_block(trans, root, buf); diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 03912c5..68f2315 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2183,17 +2183,17 @@ int btrfs_reserve_extent(struct btrfs_trans_handle *trans, u64 search_end, struct btrfs_key *ins, u64 data); int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root, - struct extent_buffer *buf, int full_backref); + struct extent_buffer *buf, int full_backref, int for_cow); int btrfs_dec_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root, - struct extent_buffer *buf, int full_backref); + struct extent_buffer *buf, int full_backref, int for_cow); int btrfs_set_disk_extent_flags(struct btrfs_trans_handle *trans, struct btrfs_root *root,
[PATCH v0 11/18] btrfs: add sequence numbers to delayed refs
Sequence numbers are needed to reconstruct the backrefs of a given extent to a certain point in time. The total set of backrefs consists of the set of backrefs recorded on disk plus the enqueued delayed refs for it that existed at that moment. This patch also adds a list that records all delayed refs that are currently in the process of being added. With qgroups enabled, adding a delayed ref involves walking the backrefs of the extent. During this time, no newer delayed ref may be processed. Signed-off-by: Arne Jansen --- fs/btrfs/delayed-ref.c | 71 +--- fs/btrfs/delayed-ref.h | 21 ++ fs/btrfs/transaction.c |4 +++ 3 files changed, 92 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index babd37b..2c8544b 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -23,6 +23,11 @@ #include "delayed-ref.h" #include "transaction.h" +struct seq_list { + struct list_head list; + u64 seq; +}; + /* * delayed back reference update tracking. For subvolume trees * we queue up extent allocations and backref maintenance for @@ -101,6 +106,11 @@ static int comp_entry(struct btrfs_delayed_ref_node *ref2, return -1; if (ref1->type > ref2->type) return 1; + /* with quota enable, merging of refs is not allowed */ + if (ref1->seq < ref2->seq) + return -1; + if (ref1->seq > ref2->seq) + return 1; if (ref1->type == BTRFS_TREE_BLOCK_REF_KEY || ref1->type == BTRFS_SHARED_BLOCK_REF_KEY) { return comp_tree_refs(btrfs_delayed_node_to_tree_ref(ref2), @@ -209,6 +219,39 @@ int btrfs_delayed_ref_lock(struct btrfs_trans_handle *trans, return 0; } +static u64 get_delayed_seq(struct btrfs_delayed_ref_root *delayed_refs, + struct seq_list *elem) +{ + assert_spin_locked(&delayed_refs->lock); + elem->seq = ++delayed_refs->seq; + list_add_tail(&elem->list, &delayed_refs->seq_head); + + return elem->seq; +} + +static void put_delayed_seq(struct btrfs_delayed_ref_root *delayed_refs, + struct seq_list *elem) +{ + spin_lock(&delayed_refs->lock); + 
list_del(&elem->list); + spin_unlock(&delayed_refs->lock); +} + +int btrfs_check_delayed_seq(struct btrfs_delayed_ref_root *delayed_refs, + u64 seq) +{ + struct seq_list *elem; + + assert_spin_locked(&delayed_refs->lock); + if (list_empty(&delayed_refs->seq_head)) + return 0; + + elem = list_first_entry(&delayed_refs->seq_head, struct seq_list, list); + if (seq >= elem->seq) + return 1; + return 0; +} + int btrfs_find_ref_cluster(struct btrfs_trans_handle *trans, struct list_head *cluster, u64 start) { @@ -438,6 +481,7 @@ static noinline int add_delayed_ref_head(struct btrfs_fs_info *fs_info, ref->action = 0; ref->is_head = 1; ref->in_tree = 1; + ref->seq = 0; head_ref = btrfs_delayed_node_to_head(ref); head_ref->must_insert_reserved = must_insert_reserved; @@ -474,11 +518,12 @@ static noinline int add_delayed_tree_ref(struct btrfs_fs_info *fs_info, struct btrfs_delayed_ref_node *ref, u64 bytenr, u64 num_bytes, u64 parent, u64 ref_root, int level, int action, -int for_cow) +int for_cow, struct seq_list *seq_elem) { struct btrfs_delayed_ref_node *existing; struct btrfs_delayed_tree_ref *full_ref; struct btrfs_delayed_ref_root *delayed_refs; + u64 seq = 0; if (action == BTRFS_ADD_DELAYED_EXTENT) action = BTRFS_ADD_DELAYED_REF; @@ -494,6 +539,10 @@ static noinline int add_delayed_tree_ref(struct btrfs_fs_info *fs_info, ref->is_head = 0; ref->in_tree = 1; + if (fs_info->quota_enabled && !for_cow && is_fstree(ref_root)) + seq = get_delayed_seq(delayed_refs, seq_elem); + ref->seq = seq; + full_ref = btrfs_delayed_node_to_tree_ref(ref); full_ref->parent = parent; full_ref->root = ref_root; @@ -529,11 +578,13 @@ static noinline int add_delayed_data_ref(struct btrfs_fs_info *fs_info, struct btrfs_delayed_ref_node *ref, u64 bytenr, u64 num_bytes, u64 parent, u64 ref_root, u64 owner, u64 offset, -int action, int for_cow) +int action, int for_cow, +struct seq_list *seq_elem) { struct btrfs_delayed_ref_node *exis
[PATCH v0 05/18] btrfs: add helper for tree enumeration
Often no exact match is wanted but just the next lower or higher item. There's a lot of duplicated code throughout btrfs to deal with the corner cases. This patch adds a helper function that can facilitate searching. Signed-off-by: Arne Jansen --- fs/btrfs/ctree.c | 72 ++ fs/btrfs/ctree.h | 10 +++ 2 files changed, 82 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 964ac9a..db79c99 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -1862,6 +1862,78 @@ done: } /* + * helper to use instead of search slot if no exact match is needed but + * instead the next or previous item should be returned. + * When find_higher is true, the next higher item is returned, the next lower + * otherwise. + * When return_any and find_higher are both true, and no higher item is found, + * return the next lower instead. + * When return_any is true and find_higher is false, and no lower item is found, + * return the next higher instead. + * It returns 0 if any item is found, 1 if none is found (tree empty), and + * < 0 on error + */ +int btrfs_search_slot_for_read(struct btrfs_root *root, + struct btrfs_key *key, struct btrfs_path *p, + int find_higher, int return_any) +{ + int ret; + struct extent_buffer *leaf; + +again: + ret = btrfs_search_slot(NULL, root, key, p, 0, 0); + if (ret <= 0) + return ret; + /* +* a return value of 1 means the path is at the position where the +* item should be inserted. Normally this is the next bigger item, +* but in case the previous item is the last in a leaf, path points +* to the first free slot in the previous leaf, i.e. at an invalid +* item. 
+*/ + leaf = p->nodes[0]; + + if (find_higher) { + if (p->slots[0] >= btrfs_header_nritems(leaf)) { + ret = btrfs_next_leaf(root, p); + if (ret <= 0) + return ret; + if (!return_any) + return 1; + /* +* no higher item found, return the next +* lower instead +*/ + return_any = 0; + find_higher = 0; + btrfs_release_path(p); + goto again; + } + } else { + if (p->slots[0] >= btrfs_header_nritems(leaf)) { + /* we're sitting on an invalid slot */ + if (p->slots[0] == 0) { + ret = btrfs_prev_leaf(root, p); + if (ret <= 0) + return ret; + if (!return_any) + return 1; + /* +* no lower item found, return the next +* higher instead +*/ + return_any = 0; + find_higher = 1; + btrfs_release_path(p); + goto again; + } + --p->slots[0]; + } + } + return 0; +} + +/* * adjust the pointers going up the tree, starting at level * making sure the right key of each node is points to 'key'. * This is used after shifting pointers to the left, so it stops diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 7a1ca9c..09c58e5 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2458,6 +2458,9 @@ int btrfs_duplicate_item(struct btrfs_trans_handle *trans, int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_key *key, struct btrfs_path *p, int ins_len, int cow); +int btrfs_search_slot_for_read(struct btrfs_root *root, + struct btrfs_key *key, struct btrfs_path *p, + int find_higher, int return_any); int btrfs_realloc_node(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct extent_buffer *parent, int start_slot, int cache_only, u64 *last_ret, @@ -2500,6 +2503,13 @@ static inline int btrfs_insert_empty_item(struct btrfs_trans_handle *trans, } int btrfs_next_leaf(struct btrfs_root *root, struct btrfs_path *path); +static inline int btrfs_next_item(struct btrfs_root *root, struct btrfs_path *p) +{ + ++p->slots[0]; + if (p->slots[0] >= btrfs_header_nritems(p->nodes[0])) + return btrfs_next_leaf(root, p); + return 0; +} int 
btrfs_prev_leaf(struct btrfs_root *root, struct btrfs_path *path); int btrfs_leaf_fr
[PATCH v0 08/18] btrfs: added helper to create new trees
This creates a brand new tree. Will be used to create the quota tree. Signed-off-by: Arne Jansen --- fs/btrfs/disk-io.c | 76 fs/btrfs/disk-io.h |3 ++ 2 files changed, 79 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 074a539..672747d 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -1145,6 +1145,82 @@ static int find_and_setup_root(struct btrfs_root *tree_root, return 0; } +struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans, +struct btrfs_fs_info *fs_info, +u64 objectid) +{ + struct extent_buffer *leaf; + struct btrfs_root *tree_root = fs_info->tree_root; + struct btrfs_root *root; + struct btrfs_key key; + int ret = 0; + u64 bytenr; + + root = kzalloc(sizeof(struct btrfs_root), GFP_NOFS); + if (!root) + return ERR_PTR(-ENOMEM); + + __setup_root(tree_root->nodesize, tree_root->leafsize, +tree_root->sectorsize, tree_root->stripesize, +root, fs_info, objectid); + root->root_key.objectid = objectid; + root->root_key.type = BTRFS_ROOT_ITEM_KEY; + root->root_key.offset = 0; + + leaf = btrfs_alloc_free_block(trans, root, root->leafsize, + 0, objectid, NULL, 0, 0, 0); + if (IS_ERR(leaf)) { + ret = PTR_ERR(leaf); + goto fail; + } + + bytenr = leaf->start; + memset_extent_buffer(leaf, 0, 0, sizeof(struct btrfs_header)); + btrfs_set_header_bytenr(leaf, leaf->start); + btrfs_set_header_generation(leaf, trans->transid); + btrfs_set_header_backref_rev(leaf, BTRFS_MIXED_BACKREF_REV); + btrfs_set_header_owner(leaf, objectid); + root->node = leaf; + + write_extent_buffer(leaf, fs_info->fsid, + (unsigned long)btrfs_header_fsid(leaf), + BTRFS_FSID_SIZE); + write_extent_buffer(leaf, fs_info->chunk_tree_uuid, + (unsigned long)btrfs_header_chunk_tree_uuid(leaf), + BTRFS_UUID_SIZE); + btrfs_mark_buffer_dirty(leaf); + + root->commit_root = btrfs_root_node(root); + root->track_dirty = 1; + + + root->root_item.flags = 0; + root->root_item.byte_limit = 0; + btrfs_set_root_bytenr(&root->root_item, leaf->start); + 
btrfs_set_root_generation(&root->root_item, trans->transid); + btrfs_set_root_level(&root->root_item, 0); + btrfs_set_root_refs(&root->root_item, 1); + btrfs_set_root_used(&root->root_item, leaf->len); + btrfs_set_root_last_snapshot(&root->root_item, 0); + btrfs_set_root_dirid(&root->root_item, 0); + root->root_item.drop_level = 0; + + key.objectid = objectid; + key.type = BTRFS_ROOT_ITEM_KEY; + key.offset = 0; + ret = btrfs_insert_root(trans, tree_root, &key, &root->root_item); + if (ret) + goto fail; + + btrfs_tree_unlock(leaf); + +fail: + if (ret) + return ERR_PTR(ret); + + return root; +} + static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info) { diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h index 09a164d..b166beb 100644 --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -83,6 +83,9 @@ int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans, struct btrfs_fs_info *fs_info); int btrfs_add_log_tree(struct btrfs_trans_handle *trans, struct btrfs_root *root); +struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans, +struct btrfs_fs_info *fs_info, +u64 objectid); int btree_lock_page_hook(struct page *page); -- 1.7.3.4
[PATCH v0 12/18] btrfs: put back delayed refs that are too new
When processing a delayed ref, first check if there are still old refs in the process of being added. If so, put this ref back to the tree. To avoid looping on this ref, choose a newer one in the next loop. btrfs_find_ref_cluster has to take care of that. Signed-off-by: Arne Jansen --- fs/btrfs/delayed-ref.c | 43 +-- fs/btrfs/extent-tree.c | 27 ++- 2 files changed, 47 insertions(+), 23 deletions(-) diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c index 2c8544b..d6f934f 100644 --- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -160,16 +160,22 @@ static struct btrfs_delayed_ref_node *tree_insert(struct rb_root *root, /* * find an head entry based on bytenr. This returns the delayed ref - * head if it was able to find one, or NULL if nothing was in that spot + * head if it was able to find one, or NULL if nothing was in that spot. + * If return_bigger is given, the next bigger entry is returned if no exact + * match is found. */ static struct btrfs_delayed_ref_node *find_ref_head(struct rb_root *root, u64 bytenr, - struct btrfs_delayed_ref_node **last) + struct btrfs_delayed_ref_node **last, + int return_bigger) { - struct rb_node *n = root->rb_node; + struct rb_node *n; struct btrfs_delayed_ref_node *entry; - int cmp; + int cmp = 0; +again: + n = root->rb_node; + entry = NULL; while (n) { entry = rb_entry(n, struct btrfs_delayed_ref_node, rb_node); WARN_ON(!entry->in_tree); @@ -192,6 +198,19 @@ static struct btrfs_delayed_ref_node *find_ref_head(struct rb_root *root, else return entry; } + if (entry && return_bigger) { + if (cmp > 0) { + n = rb_next(&entry->rb_node); + if (!n) + n = rb_first(root); + entry = rb_entry(n, struct btrfs_delayed_ref_node, +rb_node); + bytenr = entry->bytenr; + return_bigger = 0; + goto again; + } + return entry; + } return NULL; } @@ -266,20 +285,8 @@ int btrfs_find_ref_cluster(struct btrfs_trans_handle *trans, node = rb_first(&delayed_refs->root); } else { ref = NULL; - find_ref_head(&delayed_refs->root, start, 
&ref); + find_ref_head(&delayed_refs->root, start+1, &ref, 1); if (ref) { - struct btrfs_delayed_ref_node *tmp; - - node = rb_prev(&ref->rb_node); - while (node) { - tmp = rb_entry(node, - struct btrfs_delayed_ref_node, - rb_node); - if (tmp->bytenr < start) - break; - ref = tmp; - node = rb_prev(&ref->rb_node); - } node = &ref->rb_node; } else node = rb_first(&delayed_refs->root); @@ -777,7 +784,7 @@ btrfs_find_delayed_ref_head(struct btrfs_trans_handle *trans, u64 bytenr) struct btrfs_delayed_ref_root *delayed_refs; delayed_refs = &trans->transaction->delayed_refs; - ref = find_ref_head(&delayed_refs->root, bytenr, NULL); + ref = find_ref_head(&delayed_refs->root, bytenr, NULL, 0); if (ref) return btrfs_delayed_node_to_head(ref); return NULL; diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index e3b69ce..7fb9650 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2180,6 +2180,28 @@ static noinline int run_clustered_refs(struct btrfs_trans_handle *trans, } /* +* locked_ref is the head node, so we have to go one +* node back for any delayed ref updates +*/ + ref = select_delayed_ref(locked_ref); + + if (ref && ref->seq && + btrfs_check_delayed_seq(delayed_refs, ref->seq)) { + /* +* there are still refs with lower seq numbers in the +* process of being added. Don't run this ref yet. +*/ + list_del_init(&locked_ref->cluster); + mutex_unlock(&locked_ref->mutex); + locked_ref = NULL; + delayed_refs->num_heads_ready++; + spin_unlock(&delayed_refs->lock); +
[PATCH v0 18/18] btrfs: add qgroup inheritance
When creating a subvolume or snapshot, it is necessary to initialize the qgroup account with a copy of some other (tracking) qgroup. This patch adds parameters to the ioctls to pass the information from which qgroup to inherit. Signed-off-by: Arne Jansen --- fs/btrfs/ioctl.c | 59 ++- fs/btrfs/ioctl.h | 11 - fs/btrfs/transaction.c |8 ++ fs/btrfs/transaction.h |1 + 4 files changed, 61 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index f46bc35..54fefef 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -315,7 +315,8 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg) static noinline int create_subvol(struct btrfs_root *root, struct dentry *dentry, char *name, int namelen, - u64 *async_transid) + u64 *async_transid, + struct btrfs_qgroup_inherit **inherit) { struct btrfs_trans_handle *trans; struct btrfs_key key; @@ -347,6 +348,11 @@ static noinline int create_subvol(struct btrfs_root *root, if (IS_ERR(trans)) return PTR_ERR(trans); + ret = btrfs_qgroup_inherit(trans, root->fs_info, 0, objectid, + inherit ? 
*inherit : NULL); + if (ret) + goto fail; + leaf = btrfs_alloc_free_block(trans, root, root->leafsize, 0, objectid, NULL, 0, 0, 0); if (IS_ERR(leaf)) { @@ -448,7 +454,7 @@ fail: static int create_snapshot(struct btrfs_root *root, struct dentry *dentry, char *name, int namelen, u64 *async_transid, - bool readonly) + bool readonly, struct btrfs_qgroup_inherit **inherit) { struct inode *inode; struct btrfs_pending_snapshot *pending_snapshot; @@ -466,6 +472,10 @@ static int create_snapshot(struct btrfs_root *root, struct dentry *dentry, pending_snapshot->dentry = dentry; pending_snapshot->root = root; pending_snapshot->readonly = readonly; + if (inherit) { + pending_snapshot->inherit = *inherit; + *inherit = NULL;/* take responsibility to free it */ + } trans = btrfs_start_transaction(root->fs_info->extent_root, 5); if (IS_ERR(trans)) { @@ -599,7 +609,8 @@ static inline int btrfs_may_create(struct inode *dir, struct dentry *child) static noinline int btrfs_mksubvol(struct path *parent, char *name, int namelen, struct btrfs_root *snap_src, - u64 *async_transid, bool readonly) + u64 *async_transid, bool readonly, + struct btrfs_qgroup_inherit **inherit) { struct inode *dir = parent->dentry->d_inode; struct dentry *dentry; @@ -630,11 +641,11 @@ static noinline int btrfs_mksubvol(struct path *parent, goto out_up_read; if (snap_src) { - error = create_snapshot(snap_src, dentry, - name, namelen, async_transid, readonly); + error = create_snapshot(snap_src, dentry, name, namelen, + async_transid, readonly, inherit); } else { error = create_subvol(BTRFS_I(dir)->root, dentry, - name, namelen, async_transid); + name, namelen, async_transid, inherit); } if (!error) fsnotify_mkdir(dir, dentry); @@ -1253,11 +1264,9 @@ out_unlock: } static noinline int btrfs_ioctl_snap_create_transid(struct file *file, - char *name, - unsigned long fd, - int subvol, - u64 *transid, - bool readonly) + char *name, unsigned long fd, int subvol, + u64 *transid, bool readonly, + struct 
btrfs_qgroup_inherit **inherit) { struct btrfs_root *root = BTRFS_I(fdentry(file)->d_inode)->root; struct file *src_file; @@ -1275,7 +1284,7 @@ static noinline int btrfs_ioctl_snap_create_transid(struct file *file, if (subvol) { ret = btrfs_mksubvol(&file->f_path, name, namelen, -NULL, transid, readonly); +NULL, transid, readonly, inherit); } else { struct inode *src_inode; src_file =
[PATCH v0 06/18] btrfs: check the root passed to btrfs_end_transaction
This patch only adds a consistency check to validate that the same root is passed to start_transaction and end_transaction. Subvolume quota depends on this. Signed-off-by: Arne Jansen --- fs/btrfs/transaction.c |6 ++ fs/btrfs/transaction.h |6 ++ 2 files changed, 12 insertions(+), 0 deletions(-) diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c index b5ee16b..d7f32da 100644 --- a/fs/btrfs/transaction.c +++ b/fs/btrfs/transaction.c @@ -306,6 +306,7 @@ again: h->transaction = cur_trans; h->blocks_used = 0; h->bytes_reserved = 0; + h->root = root; h->delayed_ref_updates = 0; h->use_count = 1; h->block_rsv = NULL; @@ -453,6 +454,11 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans, return 0; } + /* +* the same root has to be passed to start_transaction and +* end_transaction. Subvolume quota depends on this. +*/ + WARN_ON(trans->root != root); while (count < 4) { unsigned long cur = trans->delayed_ref_updates; trans->delayed_ref_updates = 0; diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index 02564e6..b120126 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -55,6 +55,12 @@ struct btrfs_trans_handle { struct btrfs_transaction *transaction; struct btrfs_block_rsv *block_rsv; struct btrfs_block_rsv *orig_rsv; + /* +* this root is only needed to validate that the root passed to +* start_transaction is the same as the one passed to end_transaction. +* Subvolume quota depends on this +*/ + struct btrfs_root *root; }; struct btrfs_pending_snapshot { -- 1.7.3.4
[PATCH v0 13/18] btrfs: qgroup implementation and prototypes
Signed-off-by: Arne Jansen
---
 fs/btrfs/Makefile |    2 +-
 fs/btrfs/ctree.h  |   32 +
 fs/btrfs/ioctl.h  |   24 +
 fs/btrfs/qgroup.c | 2151 +
 4 files changed, 2208 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 9ff560b..7738ecc 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -8,6 +8,6 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
 	   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
 	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
-	   ulist.o
+	   qgroup.o ulist.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 49f97d8..1deb6b8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2870,6 +2870,38 @@ int btrfs_scrub_cancel_devid(struct btrfs_root *root, u64 devid);
 int btrfs_scrub_progress(struct btrfs_root *root, u64 devid,
 			 struct btrfs_scrub_progress *progress);
 
+/* quota.c */
+int btrfs_quota_enable(struct btrfs_trans_handle *trans,
+		       struct btrfs_fs_info *fs_info);
+int btrfs_quota_disable(struct btrfs_trans_handle *trans,
+			struct btrfs_fs_info *fs_info);
+int btrfs_quota_rescan(struct btrfs_fs_info *fs_info);
+int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans,
+			      struct btrfs_fs_info *fs_info, u64 src, u64 dst);
+int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans,
+			      struct btrfs_fs_info *fs_info, u64 src, u64 dst);
+int btrfs_create_qgroup(struct btrfs_trans_handle *trans,
+			struct btrfs_fs_info *fs_info, u64 qgroupid,
+			char *name);
+int btrfs_remove_qgroup(struct btrfs_trans_handle *trans,
+			struct btrfs_fs_info *fs_info, u64 qgroupid);
+int btrfs_limit_qgroup(struct btrfs_trans_handle *trans,
+		       struct btrfs_fs_info *fs_info, u64 qgroupid,
+		       struct btrfs_qgroup_limit *limit);
+int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info);
+void btrfs_free_qgroup_config(struct btrfs_fs_info *fs_info);
+struct btrfs_delayed_extent_op;
+int btrfs_qgroup_record_ref(struct btrfs_trans_handle *trans,
+			    struct btrfs_fs_info *fs_info,
+			    struct btrfs_delayed_ref_node *node,
+			    struct btrfs_delayed_extent_op *extent_op);
+int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
+		      struct btrfs_fs_info *fs_info);
+int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
+			 struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
+			 struct btrfs_qgroup_inherit *inherit);
+int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
+void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);
 
 static inline int is_fstree(u64 rootid)
 {
 	if (rootid == BTRFS_FS_TREE_OBJECTID ||
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index ad1ea78..36d14a4 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -35,6 +35,30 @@ struct btrfs_ioctl_vol_args {
 #define BTRFS_FSID_SIZE 16
 #define BTRFS_UUID_SIZE 16
 
+#define BTRFS_QGROUP_INHERIT_SET_LIMITS	(1ULL << 0)
+
+struct btrfs_qgroup_limit {
+	__u64	flags;
+	__u64	max_rfer;
+	__u64	max_excl;
+	__u64	rsv_rfer;
+	__u64	rsv_excl;
+};
+
+struct btrfs_qgroup_inherit {
+	__u64	flags;
+	__u64	num_qgroups;
+	__u64	num_ref_copies;
+	__u64	num_excl_copies;
+	struct btrfs_qgroup_limit lim;
+	__u64	qgroups[0];
+};
+
+struct btrfs_ioctl_qgroup_limit_args {
+	__u64	qgroupid;
+	struct btrfs_qgroup_limit lim;
+};
+
 #define BTRFS_SUBVOL_NAME_MAX 4039
 struct btrfs_ioctl_vol_args_v2 {
 	__s64 fd;
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
new file mode 100644
index 000..0140aef
--- /dev/null
+++ b/fs/btrfs/qgroup.c
@@ -0,0 +1,2151 @@
+/*
+ * Copyright (C) 2011 STRATO. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
[PATCH v0 07/18] btrfs: generic data structure to build unique lists
ulist is a generic data structure to hold a collection of unique u64
values. The only operations it supports are adding to the list and
enumerating it. It is possible to store an auxiliary value along with
the key.

The implementation is preliminary and can probably be sped up
significantly. It is used by subvolume quota to translate recursions
into iterative loops.

Signed-off-by: Arne Jansen
---
 fs/btrfs/Makefile |    3 +-
 fs/btrfs/ulist.c  |  122 +
 fs/btrfs/ulist.h  |   59 +
 3 files changed, 183 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 40e6ac0..9ff560b 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -7,6 +7,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \
 	   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
 	   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
-	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o
+	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
+	   ulist.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
diff --git a/fs/btrfs/ulist.c b/fs/btrfs/ulist.c
new file mode 100644
index 000..756a937
--- /dev/null
+++ b/fs/btrfs/ulist.c
@@ -0,0 +1,122 @@
+/*
+ * Copyright (C) 2011 STRATO. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include
+#include
+#include
+
+#include "ulist.h"
+
+void ulist_init(struct ulist *ulist, unsigned long gfp_mask)
+{
+	ulist->nnodes = 0;
+	ulist->gfp_mask = gfp_mask;
+	ulist->nodes = ulist->int_nodes;
+	ulist->nodes_alloced = ULIST_SIZE;
+}
+
+void ulist_fini(struct ulist *ulist)
+{
+	if (ulist->nodes_alloced > ULIST_SIZE)
+		kfree(ulist->nodes);
+}
+
+void ulist_reinit(struct ulist *ulist)
+{
+	ulist_fini(ulist);
+	ulist_init(ulist, ulist->gfp_mask);
+}
+
+struct ulist *ulist_alloc(unsigned long gfp_mask)
+{
+	struct ulist *ulist = kmalloc(sizeof(*ulist), gfp_mask);
+
+	if (!ulist)
+		return NULL;
+
+	ulist_init(ulist, gfp_mask);
+
+	return ulist;
+}
+
+void ulist_free(struct ulist *ulist)
+{
+	if (!ulist)
+		return;
+	ulist_fini(ulist);
+	kfree(ulist);
+}
+
+int ulist_add(struct ulist *ulist, u64 val, unsigned long aux)
+{
+	u64 i;
+
+	for (i = 0; i < ulist->nnodes; ++i) {
+		if (ulist->nodes[i].val == val)
+			return 0;
+	}
+
+	if (ulist->nnodes >= ulist->nodes_alloced) {
+		u64 new_alloced = ulist->nodes_alloced + 128;
+		struct ulist_node *new_nodes = kmalloc(sizeof(*new_nodes) *
+						       new_alloced,
+						       ulist->gfp_mask);
+
+		if (!new_nodes)
+			return -ENOMEM;
+		memcpy(new_nodes, ulist->nodes,
+		       sizeof(*new_nodes) * ulist->nnodes);
+		if (ulist->nodes_alloced > ULIST_SIZE)
+			kfree(ulist->nodes);
+		ulist->nodes = new_nodes;
+		ulist->nodes_alloced = new_alloced;
+	}
+	ulist->nodes[ulist->nnodes].val = val;
+	ulist->nodes[ulist->nnodes].aux = aux;
+	ulist->nodes[ulist->nnodes].next = ulist->nnodes + 1;
+	++ulist->nnodes;
+
+	return 1;
+}
+
+struct ulist_node *ulist_next(struct ulist *ulist, struct ulist_node *prev)
+{
+	if (ulist->nnodes == 0)
+		return NULL;
+
+	if (!prev)
+		return &ulist->nodes[0];
+
+	if (prev->next < 0 || prev->next >= ulist->nnodes)
+		return NULL;
+
+	return &ulist->nodes[prev->next];
+}
+
+int ulist_merge(struct ulist *dst, struct ulist *src)
+{
+	struct ulist_node *node = NULL;
+	int ret;
+
+	while ((node = ulist_next(src, node))) {
+		ret = ulist_add(dst, node->val, node->aux);
+		if (ret < 0)
+			return ret;
+	}
+
+	return 0;
+}
diff --git a/fs/btrfs/ulist.h b/fs/btrfs/ulist.h
new file mode 100644
index 000..2eb7e9d
--- /dev/null
+++ b/fs/btrfs/ulist.h
@@ -0,0 +1,59 @@
+/*
+ * Copyright (C) 2011 STRATO. All rights reserved.
[PATCH v0 00/18] btrfs: Subvolume Quota Groups
This is a first draft of a subvolume quota implementation. It is
possible to limit subvolumes and any group of subvolumes, and also to
track the amount of space that will get freed when deleting snapshots.

The current version is functionally incomplete, with the main missing
feature being the initial scan and rescan of an existing filesystem.

I put some effort into writing an introduction to the concepts and
implementation, which can be found at

  http://sensille.com/qgroups.pdf

The purpose of getting it out at this early stage is to get as much
input as possible with regard to concepts, implementation and testing.
The accompanying user mode parts will take some additional days to
gather.

Thanks,
Arne

Arne Jansen (18):
  btrfs: mark delayed refs as for cow
  btrfs: always save ref_root in delayed refs
  btrfs: add nested locking mode for paths
  btrfs: qgroup on-disk format
  btrfs: add helper for tree enumeration
  btrfs: check the root passed to btrfs_end_transaction
  btrfs: generic data structure to build unique lists
  btrfs: added helper to create new trees
  btrfs: qgroup state and initialization
  btrfs: Test code to change the order of delayed-ref processing
  btrfs: add sequence numbers to delayed refs
  btrfs: put back delayed refs that are too new
  btrfs: qgroup implementation and prototypes
  btrfs: quota tree support and startup
  btrfs: hooks for qgroup to record delayed refs
  btrfs: hooks to reserve qgroup space
  btrfs: add qgroup ioctls
  btrfs: add qgroup inheritance

 fs/btrfs/Makefile      |    3 +-
 fs/btrfs/ctree.c       |  114 +++-
 fs/btrfs/ctree.h       |  224 +-
 fs/btrfs/delayed-ref.c |  188 --
 fs/btrfs/delayed-ref.h |   48 +-
 fs/btrfs/disk-io.c     |  130 +++-
 fs/btrfs/disk-io.h     |    3 +
 fs/btrfs/extent-tree.c |  185 -
 fs/btrfs/extent_io.c   |    1 +
 fs/btrfs/extent_io.h   |    2 +
 fs/btrfs/file.c        |   10 +-
 fs/btrfs/inode.c       |    2 +-
 fs/btrfs/ioctl.c       |  247 +-
 fs/btrfs/ioctl.h       |   62 ++-
 fs/btrfs/locking.c     |   51 ++-
 fs/btrfs/locking.h     |    2 +-
 fs/btrfs/qgroup.c      | 2151 +
 fs/btrfs/relocation.c  |   18 +-
 fs/btrfs/transaction.c |   45 +-
 fs/btrfs/transaction.h |    8 +
 fs/btrfs/tree-log.c    |    2 +-
 fs/btrfs/ulist.c       |  122 +++
 fs/btrfs/ulist.h       |   59 ++
 23 files changed, 3501 insertions(+), 176 deletions(-)
 create mode 100644 fs/btrfs/qgroup.c
 create mode 100644 fs/btrfs/ulist.c
 create mode 100644 fs/btrfs/ulist.h

-- 
1.7.3.4
[PATCH v0 10/18] btrfs: Test code to change the order of delayed-ref processing
Normally delayed refs get processed in ascending bytenr order. This
correlates in most cases to the order added. To expose dependencies on
this order, we start to process the tree in the middle instead of the
beginning. This code is only effective when SCRAMBLE_DELAYED_REFS is
defined.

Signed-off-by: Arne Jansen
---
 fs/btrfs/extent-tree.c |   50 ++++
 1 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 29ac93e..e3b69ce 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -33,6 +33,8 @@
 #include "locking.h"
 #include "free-space-cache.h"
 
+#undef SCRAMBLE_DELAYED_REFS
+
 /*
  * control flags for do_chunk_alloc's force field
  * CHUNK_ALLOC_NO_FORCE means to only allocate a chunk
  * if we really need one.
@@ -2241,6 +2243,49 @@ static noinline int run_clustered_refs(struct btrfs_trans_handle *trans,
 	return count;
 }
 
+#ifdef SCRAMBLE_DELAYED_REFS
+/*
+ * Normally delayed refs get processed in ascending bytenr order. This
+ * correlates in most cases to the order added. To expose dependencies on this
+ * order, we start to process the tree in the middle instead of the beginning
+ */
+static u64 find_middle(struct rb_root *root)
+{
+	struct rb_node *n = root->rb_node;
+	struct btrfs_delayed_ref_node *entry;
+	int alt = 1;
+	u64 middle;
+	u64 first = 0, last = 0;
+
+	n = rb_first(root);
+	if (n) {
+		entry = rb_entry(n, struct btrfs_delayed_ref_node, rb_node);
+		first = entry->bytenr;
+	}
+	n = rb_last(root);
+	if (n) {
+		entry = rb_entry(n, struct btrfs_delayed_ref_node, rb_node);
+		last = entry->bytenr;
+	}
+	n = root->rb_node;
+
+	while (n) {
+		entry = rb_entry(n, struct btrfs_delayed_ref_node, rb_node);
+		WARN_ON(!entry->in_tree);
+
+		middle = entry->bytenr;
+
+		if (alt)
+			n = n->rb_left;
+		else
+			n = n->rb_right;
+
+		alt = 1 - alt;
+	}
+	return middle;
+}
+#endif
+
 /*
  * this starts processing the delayed reference count updates and
  * extent insertions we have queued up so far. count can be
@@ -2266,6 +2311,11 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
 	INIT_LIST_HEAD(&cluster);
 again:
 	spin_lock(&delayed_refs->lock);
+
+#ifdef SCRAMBLE_DELAYED_REFS
+	delayed_refs->run_delayed_start = find_middle(&delayed_refs->root);
+#endif
+
 	if (count == 0) {
 		count = delayed_refs->num_entries * 2;
 		run_most = 1;
-- 
1.7.3.4
[PATCH v0 04/18] btrfs: qgroup on-disk format
Not all features are in use by the current version and thus may change
in the future.

Signed-off-by: Arne Jansen
---
 fs/btrfs/ctree.h |  136 ++
 1 files changed, 136 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2765b8d..7a1ca9c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -85,6 +85,9 @@ struct btrfs_ordered_sum;
 /* holds checksums of all the data extents */
 #define BTRFS_CSUM_TREE_OBJECTID 7ULL
 
+/* holds quota configuration and tracking */
+#define BTRFS_QUOTA_TREE_OBJECTID 8ULL
+
 /* orhpan objectid for tracking unlinked/truncated files */
 #define BTRFS_ORPHAN_OBJECTID -5ULL
@@ -724,6 +727,72 @@ struct btrfs_block_group_item {
 	__le64 flags;
 } __attribute__ ((__packed__));
 
+/*
+ * is subvolume quota turned on?
+ */
+#define BTRFS_QGROUP_STATUS_FLAG_ON		(1ULL << 0)
+/*
+ * SCANNING is set during the initialization phase
+ */
+#define BTRFS_QGROUP_STATUS_FLAG_SCANNING	(1ULL << 1)
+/*
+ * Some qgroup entries are known to be out of date,
+ * either because the configuration has changed in a way that
+ * makes a rescan necessary, or because the fs has been mounted
+ * with a non-qgroup-aware version.
+ * Turning quota off and on again makes it inconsistent, too.
+ */
+#define BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT	(1ULL << 2)
+
+#define BTRFS_QGROUP_STATUS_VERSION	1
+
+struct btrfs_qgroup_status_item {
+	__le64 version;
+	/*
+	 * the generation is updated during every commit. As older
+	 * versions of btrfs are not aware of qgroups, it will be
+	 * possible to detect inconsistencies by checking the
+	 * generation on mount time
+	 */
+	__le64 generation;
+
+	/* flag definitions see above */
+	__le64 flags;
+
+	/*
+	 * only used during scanning to record the progress
+	 * of the scan. It contains a logical address
+	 */
+	__le64 scan;
+} __attribute__ ((__packed__));
+
+struct btrfs_qgroup_info_item {
+	__le64 generation;
+	__le64 rfer;
+	__le64 rfer_cmpr;
+	__le64 excl;
+	__le64 excl_cmpr;
+} __attribute__ ((__packed__));
+
+/* flags definition for qgroup limits */
+#define BTRFS_QGROUP_LIMIT_MAX_RFER	(1ULL << 0)
+#define BTRFS_QGROUP_LIMIT_MAX_EXCL	(1ULL << 1)
+#define BTRFS_QGROUP_LIMIT_RSV_RFER	(1ULL << 2)
+#define BTRFS_QGROUP_LIMIT_RSV_EXCL	(1ULL << 3)
+#define BTRFS_QGROUP_LIMIT_RFER_CMPR	(1ULL << 4)
+#define BTRFS_QGROUP_LIMIT_EXCL_CMPR	(1ULL << 5)
+
+struct btrfs_qgroup_limit_item {
+	/*
+	 * only updated when any of the other values change
+	 */
+	__le64 flags;
+	__le64 max_rfer;
+	__le64 max_excl;
+	__le64 rsv_rfer;
+	__le64 rsv_excl;
+} __attribute__ ((__packed__));
+
 struct btrfs_space_info {
 	u64 flags;
@@ -1336,6 +1405,30 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_CHUNK_ITEM_KEY	228
 
 /*
+ * Records the overall state of the qgroups.
+ * There's only one instance of this key present,
+ * (0, BTRFS_QGROUP_STATUS_KEY, 0)
+ */
+#define BTRFS_QGROUP_STATUS_KEY		240
+/*
+ * Records the currently used space of the qgroup.
+ * One key per qgroup, (0, BTRFS_QGROUP_INFO_KEY, qgroupid).
+ */
+#define BTRFS_QGROUP_INFO_KEY		242
+/*
+ * Contains the user configured limits for the qgroup.
+ * One key per qgroup, (0, BTRFS_QGROUP_LIMIT_KEY, qgroupid).
+ */
+#define BTRFS_QGROUP_LIMIT_KEY		244
+/*
+ * Records the child-parent relationship of qgroups. For
+ * each relation, 2 keys are present:
+ * (childid, BTRFS_QGROUP_RELATION_KEY, parentid)
+ * (parentid, BTRFS_QGROUP_RELATION_KEY, childid)
+ */
+#define BTRFS_QGROUP_RELATION_KEY	246
+
+/*
  * string items are for debugging. They just store a short string of
  * data in the FS
  */
@@ -2098,6 +2191,49 @@ static inline u32 btrfs_file_extent_inline_item_len(struct extent_buffer *eb,
 	return btrfs_item_size(eb, e) - offset;
 }
 
+/* btrfs_qgroup_status_item */
+BTRFS_SETGET_FUNCS(qgroup_status_generation, struct btrfs_qgroup_status_item,
+		   generation, 64);
+BTRFS_SETGET_FUNCS(qgroup_status_version, struct btrfs_qgroup_status_item,
+		   version, 64);
+BTRFS_SETGET_FUNCS(qgroup_status_flags, struct btrfs_qgroup_status_item,
+		   flags, 64);
+BTRFS_SETGET_FUNCS(qgroup_status_scan, struct btrfs_qgroup_status_item,
+		   scan, 64);
+
+/* btrfs_qgroup_info_item */
+BTRFS_SETGET_FUNCS(qgroup_info_generation, struct btrfs_qgroup_info_item,
+		   generation, 64);
+BTRFS_SETGET_FUNCS(qgroup_info_rfer, struct btrfs_qgroup_info_item, rfer, 64);
+BTRFS_SETGET_FUNCS(qgroup_info_rfer_cmpr, struct btrfs_qgroup_info_item,
+		   rfer_cmpr, 64);
+BTRFS_SETGET_FUNCS(qgroup_info_excl, struct btrfs_qgroup_info_item, excl, 64);
+BTRFS_SETGET_FUNCS(qgroup_info_excl_cmpr, struct btrfs_qgroup_info_item,
Re: [PATCH] Btrfs: fix recursive auto-defrag
On Thu, Oct 06, 2011 at 11:39:54AM +0800, Li Zefan wrote:
> Follow those steps:
>
> # mount -o autodefrag /dev/sda7 /mnt
> # dd if=/dev/urandom of=/mnt/tmp bs=200K count=1
> # sync
> # dd if=/dev/urandom of=/mnt/tmp bs=8K count=1 conv=notrunc
>
> and then it'll go into a loop: writeback -> defrag -> writeback ...
>
> It's because writeback writes [8K, 200K] and then writes [0, 8K].
>
> I tried to make writeback know if the pages are dirtied by defrag,
> but the patch was a bit intrusive. Here I simply set writeback_index
> when we defrag a file.

Really nice and small fix. I'll definitely send this for 3.1

-chris
Re: Honest timeline for btrfsck
> No, in this case it means we're confident it will get rolled out.

On Aug 18th confidence was high enough to declare a possible release
that very day. This confidence turned into 7 weeks of silence followed
by another 2 week estimate.

These confident declarations are why things like mniederle's
btrfs_rescue are considered 'interim' and not worth building on. Had
this confidence of imminent release not been the prevalent message for
the last year, others would have stepped in to fill the void.

> I've given a number of hard dates recently and I'd prefer to show up
> with the code instead. I don't think it makes sense to put a partial
> implementation out there, we'll just have a bunch of people reporting
> problems that I know exist.
>
> -chris

This strategy of 'Lone Wolfing it' has delayed the release by a year.
Either you are flying solo because you think that you can make more
meaningful progress without the involvement of the btrfs community, or
you are willing to forfeit the contributions of the community in order
to not have to listen to any complaints.

The other problem of this flying-solo plan is that you are making the
assumption that the problems you know about are more significant than
the problems you are unaware of, which could be flushed out with more
eyes on the code. The longer you delay the release of the source, the
longer it will be until confidence can be generated that major issues
have been resolved.

http://en.wikipedia.org/wiki/Release_early,_release_often
Re: [PATCH] Btrfs: fix recursive auto-defrag
On Thu, Oct 06, 2011 at 11:39:54AM +0800, Li Zefan wrote:
> Follow those steps:
>
> # mount -o autodefrag /dev/sda7 /mnt
> # dd if=/dev/urandom of=/mnt/tmp bs=200K count=1
> # sync
> # dd if=/dev/urandom of=/mnt/tmp bs=8K count=1 conv=notrunc
>
> and then it'll go into a loop: writeback -> defrag -> writeback ...
>
> It's because writeback writes [8K, 200K] and then writes [0, 8K].
>
> I tried to make writeback know if the pages are dirtied by defrag,
> but the patch was a bit intrusive. Here I simply set writeback_index
> when we defrag a file.
>
> Signed-off-by: Li Zefan

Tested-by: David Sterba

> ---
>  fs/btrfs/ioctl.c |    7 +++
>  1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 970977a..7a10f94 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1047,6 +1047,13 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
>  	if (!max_to_defrag)
>  		max_to_defrag = last_index - 1;
>  
> +	/*
> +	 * make writeback starts from i, so the defrag range can be
> +	 * written sequentially.
> +	 */
> +	if (i < inode->i_mapping->writeback_index)
> +		inode->i_mapping->writeback_index = i;
> +
>  	while (i <= last_index && defrag_count < max_to_defrag) {
>  		/*
>  		 * make sure we stop running if someone unmounts
> -- 
> 1.7.3.1