Re: Honest timeline for btrfsck

2011-10-06 Thread Jeff Mahoney

On 10/06/2011 10:50 PM, Chris Mason wrote:
> On Thu, Oct 06, 2011 at 10:31:41AM -0500, Jeff Putney wrote:
>>> No, in this case it means we're confident it will get rolled
>>> out.
>> 
>> On Aug 18th confidence was high enough to declare a possible
>> release that very day.  This confidence turned into 7 weeks of
>> silence followed by another 2 week estimate.
>> 
>> These confident declarations are why things like mniederle's 
>> btrfs_rescue are considered 'interim' and not worth building on.
>> Had this confidence of imminent release not been the prevalent
>> message for the last year, others would have stepped in to fill
>> the void.
>> 
>>> I've given a number of hard dates recently and I'd prefer to
>>> show up with the code instead.  I don't think it makes sense to
>>> put a partial implementation out there, we'll just have a bunch
>>> of people reporting problems that I know exist.
>>> 
>>> -chris
>>> 
>> 
>> This strategy of 'Lone Wolfing it' has delayed the release by a
>> year. Either you are flying solo because you think that you can
>> make more meaningful progress without the involvement of the
>> btrfs community, or you are willing to forfeit the contributions
>> of the community in order to not have to listen to any
>> complaints.
>> 
>> The other problem with this flying-solo plan is that you are
>> making the assumption that the problems you know about are more
>> significant than the problems you are unaware of and could be
>> flushed out with more eyes on the code.  The longer you delay the
>> release of the source, the longer it will be until confidence can
>> be generated that major issues have been resolved.
>> 
>> http://en.wikipedia.org/wiki/Release_early,_release_often
> 
> [ Thanks for everyone's comments! ]
> 
> Keep in mind that btrfs was released and ran for a long time while 
> intentionally crashing when we ran out of space.   This was a
> really important part of our development because we attracted a
> huge number of contributors, and some very brave users.
> 
> For fsck, even the stuff I have here does have a way to go before
> it is at the level of an e2fsck or xfs_repair.  But I do want to
> make sure that I'm surprised by any bugs before I send it out, and
> that's just not the case today.  The release has been delayed
> because I've alternated between a few different ways of repairing,
> and because I got distracted by some important features in the
> kernel.

Yes. The single biggest rule of file system recovery tools is that you
never leave the file system more broken than when you found it. Beta
testing fsck, when the author him/herself isn't comfortable releasing
the code, is insane when you have data you care about. If you
disagree, I'll hit the pause button until you learn some very hard
lessons.

-Jeff

-- 
Jeff Mahoney
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Honest timeline for btrfsck

2011-10-06 Thread Roman Mamedov
On Thu, 6 Oct 2011 23:20:45 + (UTC)
Yalonda Gishtaka  wrote:

> and tarnishing Oracle's name. 

Thank you sir you just made my day.

-- 
With respect,
Roman




Re: Honest timeline for btrfsck

2011-10-06 Thread Chris Mason
On Thu, Oct 06, 2011 at 10:31:41AM -0500, Jeff Putney wrote:
> > No, in this case it means we're confident it will get rolled out.
> 
> On Aug 18th confidence was high enough to declare a possible release
> that very day.  This confidence turned into 7 weeks of silence
> followed by another 2 week estimate.
> 
> These confident declarations are why things like mniederle's
> btrfs_rescue are considered 'interim' and not worth building on.  Had
> this confidence of imminent release not been the prevalent message for
> the last year, others would have stepped in to fill the void.
> 
> > I've given a number of hard dates recently and I'd prefer to show up
> > with the code instead.  I don't think it makes sense to put a partial
> > implementation out there, we'll just have a bunch of people reporting
> > problems that I know exist.
> >
> > -chris
> >
> 
> This strategy of 'Lone Wolfing it' has delayed the release by a year.
> Either you are flying solo because you think that you can make more
> meaningful progress without the involvement of the btrfs community, or
> you are willing to forfeit the contributions of the community in order
> to not have to listen to any complaints.
> 
> The other problem with this flying-solo plan is that you are making the
> assumption that the problems you know about are more significant than
> the problems you are unaware of and could be flushed out with more
> eyes on the code.  The longer you delay the release of the source, the
> longer it will be until confidence can be generated that major issues
> have been resolved.
> 
> http://en.wikipedia.org/wiki/Release_early,_release_often

[ Thanks for everyone's comments! ]

Keep in mind that btrfs was released and ran for a long time while
intentionally crashing when we ran out of space.   This was a really
important part of our development because we attracted a huge number of
contributors, and some very brave users.

For fsck, even the stuff I have here does have a way to go before it is
at the level of an e2fsck or xfs_repair.  But I do want to make sure
that I'm surprised by any bugs before I send it out, and that's just not
the case today.  The release has been delayed because I've alternated
between a few different ways of repairing, and because I got distracted
by some important features in the kernel.

That's how software goes sometimes, and I'll take the criticism because
it hasn't gone as well as it should have.  But, I can't stress enough how
much I appreciate everyone's contributions and interest in btrfs.

-chris



Re: Honest timeline for btrfsck

2011-10-06 Thread Chester
On Thu, Oct 6, 2011 at 10:31 AM, Jeff Putney  wrote:
>> No, in this case it means we're confident it will get rolled out.
>
> On Aug 18th confidence was high enough to declare a possible release
> that very day.  This confidence turned into 7 weeks of silence
> followed by another 2 week estimate.
>
> These confident declarations are why things like mniederle's
> btrfs_rescue are considered 'interim' and not worth building on.  Had
> this confidence of imminent release not been the prevalent message for
> the last year, others would have stepped in to fill the void.
>
>> I've given a number of hard dates recently and I'd prefer to show up
>> with the code instead.  I don't think it makes sense to put a partial
>> implementation out there, we'll just have a bunch of people reporting
>> problems that I know exist.
>>
>> -chris
>>
>
> This strategy of 'Lone Wolfing it' has delayed the release by a year.
> Either you are flying solo because you think that you can make more
> meaningful progress without the involvement of the btrfs community, or
> you are willing to forfeit the contributions of the community in order
> to not have to listen to any complaints.
>
> The other problem with this flying-solo plan is that you are making the
> assumption that the problems you know about are more significant than
> the problems you are unaware of and could be flushed out with more
> eyes on the code.  The longer you delay the release of the source, the
> longer it will be until confidence can be generated that major issues
> have been resolved.
>
> http://en.wikipedia.org/wiki/Release_early,_release_often

The problem with this is that people naturally reach for an fsck tool
when something goes badly wrong. Something as important as an fsck
utility shouldn't be released (unofficially or officially) half-baked.
It can irreparably destroy a filesystem which could otherwise have been
repaired with a fully functional fsck.

I think Chris is trying to prevent that from happening.

Perhaps Chris can set up a private developer repo and ask for help
from Red Hat, Fujitsu, etc.?


Re: Honest timeline for btrfsck

2011-10-06 Thread Chris Samuel
On 07/10/11 10:20, Yalonda Gishtaka wrote:

> Couldn't have put it better.  It's really time for Chris Mason
> to stop disgracing the open source community and tarnishing
> Oracle's name. 

Oh come on - he's working *for* Oracle to do this and we are
getting the benefits for free.  We can hardly complain when
he's trying to deal with LKML, doing btrfs devel for Oracle
and having a life as well (i.e. his recent vacation).  I've
known too many people burn out in IT due to overcommitment
and I don't want to see that happen to Chris.

If you wish to direct his priorities then I suggest that you
should be paying Oracle to do so, or else attempt to employ
him yourself.

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


Re: Honest timeline for btrfsck

2011-10-06 Thread Yalonda Gishtaka
Jeff Putney  gmail.com> writes:
> This strategy of 'Lone Wolfing it' has delayed the release by a year.
> Either you are flying solo because you think that you can make more
> meaningful progress without the involvement of the btrfs community, or
> you are willing to forfeit the contributions of the community in order
> to not have to listen to any complaints.
> 

Couldn't have put it better.  It's really time for Chris Mason to stop
disgracing the open source community and tarnishing Oracle's name. 




Re: [PATCH v0 07/18] btrfs: generic data structure to build unique lists

2011-10-06 Thread David Sterba
On Thu, Oct 06, 2011 at 01:33:00PM -0700, Andi Kleen wrote:
> Arne Jansen  writes:
> 
> > ulist is a generic data structure to hold a collection of unique u64
> > values. The only operations it supports are adding to the list and
> > enumerating it.
> > It is possible to store an auxiliary value along with the key.
> > The implementation is preliminary and can probably be sped up
> > significantly.
> > It is used by subvolume quota to translate recursions into iterative
> > loops.
> 
> Hmm, sounds like a job for lib/idr.c 
> 
> What do your ulists do that idr doesn't?

Arne's ulists keep full u64 values; IDR is int-based.


david


Re: Honest timeline for btrfsck

2011-10-06 Thread Randy Barlow
On 10/06/2011 11:31 AM, Jeff Putney wrote:
> http://en.wikipedia.org/wiki/Release_early,_release_often

I can appreciate both Jeff's and Andi's positions on this issue. I do
wonder why the fsck isn't publicly available as-is, as a non-release
version, just so people can begin getting their eyes on it and making
contributions. I think that would really help produce a higher quality
tool in less time, which is a good goal. I've only played with btrfs at
this point, and I'm mostly waiting for a mature fsck tool to exist
before using this fine filesystem on any of my systems, so I am
interested in seeing the fsck reach maturity; I am very excited about
all the features that btrfs offers.

That said, I also think that we ought not to complain to Chris when he
is doing work that will benefit us all, without any cost to us. We may
prefer that he take a different approach in developing this tool, but in
the end he is serving us and we ought not to look a gift horse in the
mouth, as they say.

Chris, I respectfully request that the code you have be placed into a
public repository. It is your choice of course, but I believe it would
be a good thing for btrfs. However and whenever it is delivered to the
community, I am confident that btrfs will be ready for production use
very soon. Thanks to you and all the devs for working so hard to bring
Linux into the future of filesystems!

-- 
R





Re: Honest timeline for btrfsck

2011-10-06 Thread Francesco Riosa
2011/10/6 Andi Kleen :
> Jeff Putney  writes:
>>
>> http://en.wikipedia.org/wiki/Release_early,_release_often
>
> Well the other principle in free software you're forgetting
> is:
>
> "It will be released when it's ready"
>
> If you don't like Chris' ways to do releases you're free to write
> something on your own or pay someone to do so. Otherwise
> you just have to deal with his time frames, as shifty
> as they may be.

I did a different thing: I offered Chris money to help rescue a hosed
btrfs, or to point me to someone who could. We ended up doing some
tests (for free), but nothing else materialized.
While the time that has passed has diminished the value of the data to
be rescued, I'm more in the "show us some code we can start from" camp
than the "it will be released when ready" one.

Francesco R.

>
> -Andi
> --
> a...@linux.intel.com -- Speaking for myself only


Re: [PATCH v0 03/18] btrfs: add nested locking mode for paths

2011-10-06 Thread Andi Kleen
On Fri, Oct 07, 2011 at 12:44:30AM +0400, Andrey Kuzmin wrote:
> Perhaps you could just elaborate on "needs this feature"? In general, a
> write lock gives one exclusive access, so the need for an additional
> read (non-exclusive) lock is not easy to understand.

Usually it's because the low-level code can be called both with and
without the lock held, and it doesn't know which.

But that usually can be avoided with some restructuring.

-Andi


Re: [PATCH v0 17/18] btrfs: add qgroup ioctls

2011-10-06 Thread Andi Kleen
Arne Jansen  writes:
> +
> + if (copy_to_user(arg, sa, sizeof(*sa)))
> + ret = -EFAULT;
> +
> + if (trans) {
> + err = btrfs_commit_transaction(trans, root);
> + if (err && !ret)
> + ret = err;
> + }

It would seem safer to do the copy_to_user outside the transaction.
A copy_to_user can in principle cause new writes (e.g. if it triggers
COW), so you may end up with nested transactions. Even if that works
somehow (not sure), it seems to be a thing better avoided.

> +
> + sa = memdup_user(arg, sizeof(*sa));
> + if (IS_ERR(sa))
> + return PTR_ERR(sa);
> +
> + trans = btrfs_join_transaction(root);
> + if (IS_ERR(trans)) {
> + ret = PTR_ERR(trans);
> + goto out;
> + }

This code seems to be duplicated a lot. Can it be consolidated?

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only


Re: [PATCH v0 03/18] btrfs: add nested locking mode for paths

2011-10-06 Thread Andi Kleen
Arne Jansen  writes:

> This patch adds the possibility to read-lock an extent
> even if it is already write-locked from the same thread.
> Subvolume quota needs this capability.

Recursive locking is generally strongly discouraged; it causes all kinds
of problems and tends to eventually lead to locking hierarchies nobody
can understand anymore.

If you can find any other way to solve this problem I would
encourage you to do so.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: Honest timeline for btrfsck

2011-10-06 Thread Jeff Mahoney

On 10/06/2011 04:30 PM, Andi Kleen wrote:
> Jeff Putney  writes:
>> 
>> http://en.wikipedia.org/wiki/Release_early,_release_often
> 
> Well the other principle in free software you're forgetting is:
> 
> "It will be released when it's ready"
> 
> If you don't like Chris' ways to do releases you're free to write 
> something on your own or pay someone to do so. Otherwise you just
> have to deal with his time frames, as shifty as they may be.

Thanks, I was about to say the same thing.

-Jeff

-- 
Jeff Mahoney
SUSE Labs


Re: [PATCH v0 07/18] btrfs: generic data structure to build unique lists

2011-10-06 Thread Andi Kleen
Arne Jansen  writes:

> ulist is a generic data structure to hold a collection of unique u64
> values. The only operations it supports are adding to the list and
> enumerating it.
> It is possible to store an auxiliary value along with the key.
> The implementation is preliminary and can probably be sped up
> significantly.
> It is used by subvolume quota to translate recursions into iterative
> loops.

Hmm, sounds like a job for lib/idr.c 

What do your ulists do that idr doesn't?
Ok idr doesn't have merge, but that should be simple
enough to add.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only


Re: Honest timeline for btrfsck

2011-10-06 Thread Andi Kleen
Jeff Putney  writes:
>
> http://en.wikipedia.org/wiki/Release_early,_release_often

Well the other principle in free software you're forgetting 
is:

"It will be released when it's ready"

If you don't like Chris' ways to do releases you're free to write
something on your own or pay someone to do so. Otherwise
you just have to deal with his time frames, as shifty
as they may be.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only


[PATCH v0 14/18] btrfs: quota tree support and startup

2011-10-06 Thread Arne Jansen
Init the quota tree along with the others on open_ctree
and close_ctree. Add the quota tree to the list of well
known trees in btrfs_read_fs_root_no_name.

Signed-off-by: Arne Jansen 
---
 fs/btrfs/disk-io.c |   47 +--
 1 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index cb25017..06576ed 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1391,6 +1391,9 @@ struct btrfs_root *btrfs_read_fs_root_no_name(struct 
btrfs_fs_info *fs_info,
return fs_info->dev_root;
if (location->objectid == BTRFS_CSUM_TREE_OBJECTID)
return fs_info->csum_root;
+   if (location->objectid == BTRFS_QUOTA_TREE_OBJECTID)
+   return fs_info->quota_root ? fs_info->quota_root :
+ERR_PTR(-ENOENT);
 again:
spin_lock(&fs_info->fs_roots_radix_lock);
root = radix_tree_lookup(&fs_info->fs_roots_radix,
@@ -1676,6 +1679,8 @@ struct btrfs_root *open_ctree(struct super_block *sb,
GFP_NOFS);
struct btrfs_root *dev_root = kzalloc(sizeof(struct btrfs_root),
  GFP_NOFS);
+   struct btrfs_root *quota_root = kzalloc(sizeof(struct btrfs_root),
+ GFP_NOFS);
struct btrfs_root *log_tree_root;
 
int ret;
@@ -1684,7 +1689,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
struct btrfs_super_block *disk_super;
 
if (!extent_root || !tree_root || !tree_root->fs_info ||
-   !chunk_root || !dev_root || !csum_root) {
+   !chunk_root || !dev_root || !csum_root || !quota_root) {
err = -ENOMEM;
goto fail;
}
@@ -2078,6 +2083,18 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 
csum_root->track_dirty = 1;
 
+   ret = find_and_setup_root(tree_root, fs_info,
+ BTRFS_QUOTA_TREE_OBJECTID, quota_root);
+   if (ret) {
+   kfree(quota_root);
+   quota_root = NULL;
+   } else {
+   quota_root->track_dirty = 1;
+   fs_info->quota_enabled = 1;
+   fs_info->pending_quota_state = 1;
+   }
+   fs_info->quota_root = quota_root;
+
fs_info->generation = generation;
fs_info->last_trans_committed = generation;
fs_info->data_alloc_profile = (u64)-1;
@@ -2115,6 +2132,10 @@ struct btrfs_root *open_ctree(struct super_block *sb,
btrfs_set_opt(fs_info->mount_opt, SSD);
}
 
+   ret = btrfs_read_qgroup_config(fs_info);
+   if (ret)
+   goto fail_trans_kthread;
+
/* do not make disk changes in broken FS */
if (btrfs_super_log_root(disk_super) != 0 &&
!(fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)) {
@@ -2124,7 +2145,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
printk(KERN_WARNING "Btrfs log replay required "
   "on RO media\n");
err = -EIO;
-   goto fail_trans_kthread;
+   goto fail_qgroup;
}
blocksize =
 btrfs_level_size(tree_root,
@@ -2133,7 +2154,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
log_tree_root = kzalloc(sizeof(struct btrfs_root), GFP_NOFS);
if (!log_tree_root) {
err = -ENOMEM;
-   goto fail_trans_kthread;
+   goto fail_qgroup;
}
 
__setup_root(nodesize, leafsize, sectorsize, stripesize,
@@ -2163,7 +2184,7 @@ struct btrfs_root *open_ctree(struct super_block *sb,
printk(KERN_WARNING
   "btrfs: failed to recover relocation\n");
err = -EINVAL;
-   goto fail_trans_kthread;
+   goto fail_qgroup;
}
}
 
@@ -2173,10 +2194,10 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 
fs_info->fs_root = btrfs_read_fs_root_no_name(fs_info, &location);
if (!fs_info->fs_root)
-   goto fail_trans_kthread;
+   goto fail_qgroup;
if (IS_ERR(fs_info->fs_root)) {
err = PTR_ERR(fs_info->fs_root);
-   goto fail_trans_kthread;
+   goto fail_qgroup;
}
 
if (!(sb->s_flags & MS_RDONLY)) {
@@ -2193,6 +2214,8 @@ struct btrfs_root *open_ctree(struct super_block *sb,
 
return tree_root;
 
+fail_qgroup:
+   btrfs_free_qgroup_config(fs_info);
 fail_trans_kthread:
kthread_stop(fs_info->transaction_kthread);
 fail_cleaner:
@@ -2209,6 +2232,10 @@ fail_block_groups:
btrfs_free_block_groups(fs_info);
free_extent_buffer(csum_root->node);

[PATCH v0 16/18] btrfs: hooks to reserve qgroup space

2011-10-06 Thread Arne Jansen
Like block reserves, reserve a small piece of space on each
transaction start and for delalloc. These are the hooks that
can actually return EDQUOT to the user.
The amount of space reserved is tracked in the transaction
handle.

Signed-off-by: Arne Jansen 
---
 fs/btrfs/extent-tree.c |   13 +
 fs/btrfs/transaction.c |   16 
 fs/btrfs/transaction.h |1 +
 3 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 7fb9650..a2400c4 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4093,6 +4093,14 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, 
u64 num_bytes)
spin_unlock(&BTRFS_I(inode)->lock);
 
to_reserve += calc_csum_metadata_size(inode, num_bytes);
+
+   if (root->fs_info->quota_enabled) {
+   ret = btrfs_qgroup_reserve(root, num_bytes +
+  nr_extents * root->leafsize);
+   if (ret)
+   return ret;
+   }
+
ret = reserve_metadata_bytes(NULL, root, block_rsv, to_reserve, 1);
if (ret) {
unsigned dropped;
@@ -4123,6 +4131,11 @@ void btrfs_delalloc_release_metadata(struct inode 
*inode, u64 num_bytes)
if (dropped > 0)
to_free += btrfs_calc_trans_metadata_size(root, dropped);
 
+   if (root->fs_info->quota_enabled) {
+   btrfs_qgroup_free(root, num_bytes +
+   dropped * root->leafsize);
+   }
+
btrfs_block_rsv_release(root, &root->fs_info->delalloc_block_rsv,
to_free);
 }
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 1ae856e..a8b7668 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -260,6 +260,7 @@ static struct btrfs_trans_handle *start_transaction(struct 
btrfs_root *root,
struct btrfs_transaction *cur_trans;
u64 num_bytes = 0;
int ret;
+   u64 qgroup_reserved = 0;
 
if (root->fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)
return ERR_PTR(-EROFS);
@@ -278,6 +279,14 @@ static struct btrfs_trans_handle *start_transaction(struct 
btrfs_root *root,
 * the appropriate flushing if need be.
 */
if (num_items > 0 && root != root->fs_info->chunk_root) {
+   if (root->fs_info->quota_enabled &&
+   is_fstree(root->root_key.objectid)) {
+   qgroup_reserved = num_items * root->leafsize;
+   ret = btrfs_qgroup_reserve(root, qgroup_reserved);
+   if (ret)
+   return ERR_PTR(ret);
+   }
+
num_bytes = btrfs_calc_trans_metadata_size(root, num_items);
ret = btrfs_block_rsv_add(NULL, root,
  &root->fs_info->trans_block_rsv,
@@ -315,6 +324,7 @@ again:
h->use_count = 1;
h->block_rsv = NULL;
h->orig_rsv = NULL;
+   h->qgroup_reserved = qgroup_reserved;
 
smp_mb();
if (cur_trans->blocked && may_wait_transaction(root, type)) {
@@ -463,6 +473,12 @@ static int __btrfs_end_transaction(struct 
btrfs_trans_handle *trans,
 * end_transaction. Subvolume quota depends on this.
 */
WARN_ON(trans->root != root);
+
+   if (trans->qgroup_reserved) {
+   btrfs_qgroup_free(root, trans->qgroup_reserved);
+   trans->qgroup_reserved = 0;
+   }
+
while (count < 4) {
unsigned long cur = trans->delayed_ref_updates;
trans->delayed_ref_updates = 0;
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index b120126..5f5d216 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -48,6 +48,7 @@ struct btrfs_transaction {
 struct btrfs_trans_handle {
u64 transid;
u64 bytes_reserved;
+   u64 qgroup_reserved;
unsigned long use_count;
unsigned long blocks_reserved;
unsigned long blocks_used;
-- 
1.7.3.4



[PATCH v0 17/18] btrfs: add qgroup ioctls

2011-10-06 Thread Arne Jansen
Ioctls to control the qgroup feature like adding and
removing qgroups and assigning qgroups.

Signed-off-by: Arne Jansen 
---
 fs/btrfs/ioctl.c |  185 ++
 fs/btrfs/ioctl.h |   27 
 2 files changed, 212 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index fade500..f46bc35 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2834,6 +2834,183 @@ static long btrfs_ioctl_scrub_progress(struct 
btrfs_root *root,
return ret;
 }
 
+static long btrfs_ioctl_quota_ctl(struct btrfs_root *root, void __user *arg)
+{
+   struct btrfs_ioctl_quota_ctl_args *sa;
+   struct btrfs_trans_handle *trans = NULL;
+   int ret;
+   int err;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   if (root->fs_info->sb->s_flags & MS_RDONLY)
+   return -EROFS;
+
+   sa = memdup_user(arg, sizeof(*sa));
+   if (IS_ERR(sa))
+   return PTR_ERR(sa);
+
+   if (sa->cmd != BTRFS_QUOTA_CTL_RESCAN) {
+   trans = btrfs_start_transaction(root, 2);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   goto out;
+   }
+   }
+
+   switch (sa->cmd) {
+   case BTRFS_QUOTA_CTL_ENABLE:
+   ret = btrfs_quota_enable(trans, root->fs_info);
+   break;
+   case BTRFS_QUOTA_CTL_DISABLE:
+   ret = btrfs_quota_disable(trans, root->fs_info);
+   break;
+   case BTRFS_QUOTA_CTL_RESCAN:
+   ret = btrfs_quota_rescan(root->fs_info);
+   break;
+   default:
+   ret = -EINVAL;
+   break;
+   }
+
+   if (copy_to_user(arg, sa, sizeof(*sa)))
+   ret = -EFAULT;
+
+   if (trans) {
+   err = btrfs_commit_transaction(trans, root);
+   if (err && !ret)
+   ret = err;
+   }
+
+out:
+   kfree(sa);
+   return ret;
+}
+
+static long btrfs_ioctl_qgroup_assign(struct btrfs_root *root, void __user 
*arg)
+{
+   struct btrfs_ioctl_qgroup_assign_args *sa;
+   struct btrfs_trans_handle *trans;
+   int ret;
+   int err;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   if (root->fs_info->sb->s_flags & MS_RDONLY)
+   return -EROFS;
+
+   sa = memdup_user(arg, sizeof(*sa));
+   if (IS_ERR(sa))
+   return PTR_ERR(sa);
+
+   trans = btrfs_join_transaction(root);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   goto out;
+   }
+
+   /* FIXME: check if the IDs really exist */
+   if (sa->assign) {
+   ret = btrfs_add_qgroup_relation(trans, root->fs_info,
+   sa->src, sa->dst);
+   } else {
+   ret = btrfs_del_qgroup_relation(trans, root->fs_info,
+   sa->src, sa->dst);
+   }
+
+   err = btrfs_end_transaction(trans, root);
+   if (err && !ret)
+   ret = err;
+
+out:
+   kfree(sa);
+   return ret;
+}
+
+static long btrfs_ioctl_qgroup_create(struct btrfs_root *root, void __user 
*arg)
+{
+   struct btrfs_ioctl_qgroup_create_args *sa;
+   struct btrfs_trans_handle *trans;
+   int ret;
+   int err;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   if (root->fs_info->sb->s_flags & MS_RDONLY)
+   return -EROFS;
+
+   sa = memdup_user(arg, sizeof(*sa));
+   if (IS_ERR(sa))
+   return PTR_ERR(sa);
+
+   trans = btrfs_join_transaction(root);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   goto out;
+   }
+
+   /* FIXME: check if the IDs really exist */
+   if (sa->create) {
+   ret = btrfs_create_qgroup(trans, root->fs_info, sa->qgroupid,
+ NULL);
+   } else {
+   ret = btrfs_remove_qgroup(trans, root->fs_info, sa->qgroupid);
+   }
+
+   err = btrfs_end_transaction(trans, root);
+   if (err && !ret)
+   ret = err;
+
+out:
+   kfree(sa);
+   return ret;
+}
+
+static long btrfs_ioctl_qgroup_limit(struct btrfs_root *root, void __user *arg)
+{
+   struct btrfs_ioctl_qgroup_limit_args *sa;
+   struct btrfs_trans_handle *trans;
+   int ret;
+   int err;
+   u64 qgroupid;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   if (root->fs_info->sb->s_flags & MS_RDONLY)
+   return -EROFS;
+
+   sa = memdup_user(arg, sizeof(*sa));
+   if (IS_ERR(sa))
+   return PTR_ERR(sa);
+
+   trans = btrfs_join_transaction(root);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   goto out;
+   }
+
+   qgroupid = sa->qgroupid;
+   if (!qgroupid) {
+  

[PATCH v0 15/18] btrfs: hooks for qgroup to record delayed refs

2011-10-06 Thread Arne Jansen
Hooks into qgroup code to record refs and into transaction commit.
This is the main entry point for qgroup. Basically every change in
extent backrefs got accounted to the appropriate qgroups.

Signed-off-by: Arne Jansen 
---
 fs/btrfs/delayed-ref.c |   24 ++--
 fs/btrfs/transaction.c |7 +++
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index d6f934f..bd74b7a 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -442,11 +442,12 @@ update_existing_head_ref(struct btrfs_delayed_ref_node 
*existing,
  */
 static noinline int add_delayed_ref_head(struct btrfs_fs_info *fs_info,
struct btrfs_trans_handle *trans,
-   struct btrfs_delayed_ref_node *ref,
+   struct btrfs_delayed_ref_node **pref,
u64 bytenr, u64 num_bytes,
int action, int is_data)
 {
struct btrfs_delayed_ref_node *existing;
+   struct btrfs_delayed_ref_node *ref = *pref;
struct btrfs_delayed_ref_head *head_ref = NULL;
struct btrfs_delayed_ref_root *delayed_refs;
int count_mod = 1;
@@ -503,6 +504,7 @@ static noinline int add_delayed_ref_head(struct 
btrfs_fs_info *fs_info,
 
if (existing) {
update_existing_head_ref(existing, ref);
+   *pref = existing;
/*
 * we've updated the existing ref, free the newly
 * allocated ref
@@ -654,6 +656,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info 
*fs_info,
 {
struct btrfs_delayed_tree_ref *ref;
struct btrfs_delayed_ref_head *head_ref;
+   struct btrfs_delayed_ref_node *node;
struct btrfs_delayed_ref_root *delayed_refs;
int ret;
struct seq_list seq_elem;
@@ -678,7 +681,8 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info 
*fs_info,
 * insert both the head node and the new ref without dropping
 * the spin lock
 */
-   ret = add_delayed_ref_head(fs_info, trans, &head_ref->node, bytenr,
+   node = &head_ref->node;
+   ret = add_delayed_ref_head(fs_info, trans, &node, bytenr,
   num_bytes, action, 0);
BUG_ON(ret);
 
@@ -687,8 +691,10 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info 
*fs_info,
   for_cow, &seq_elem);
BUG_ON(ret);
spin_unlock(&delayed_refs->lock);
-   if (fs_info->quota_enabled && !for_cow && is_fstree(ref_root))
+   if (fs_info->quota_enabled && !for_cow && is_fstree(ref_root)) {
+   btrfs_qgroup_record_ref(trans, fs_info, &ref->node, extent_op);
put_delayed_seq(delayed_refs, &seq_elem);
+   }
 
return 0;
 }
@@ -706,6 +712,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info 
*fs_info,
 {
struct btrfs_delayed_data_ref *ref;
struct btrfs_delayed_ref_head *head_ref;
+   struct btrfs_delayed_ref_node *node;
struct btrfs_delayed_ref_root *delayed_refs;
int ret;
struct seq_list seq_elem;
@@ -730,7 +737,8 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info 
*fs_info,
 * insert both the head node and the new ref without dropping
 * the spin lock
 */
-   ret = add_delayed_ref_head(fs_info, trans, &head_ref->node, bytenr,
+   node = &head_ref->node;
+   ret = add_delayed_ref_head(fs_info, trans, &node, bytenr,
   num_bytes, action, 1);
BUG_ON(ret);
 
@@ -739,8 +747,10 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info 
*fs_info,
   action, for_cow, &seq_elem);
BUG_ON(ret);
spin_unlock(&delayed_refs->lock);
-   if (fs_info->quota_enabled && !for_cow && is_fstree(ref_root))
+   if (fs_info->quota_enabled && !for_cow && is_fstree(ref_root)) {
+   btrfs_qgroup_record_ref(trans, fs_info, &ref->node, extent_op);
put_delayed_seq(delayed_refs, &seq_elem);
+   }
 
return 0;
 }
@@ -751,6 +761,7 @@ int btrfs_add_delayed_extent_op(struct btrfs_fs_info 
*fs_info,
struct btrfs_delayed_extent_op *extent_op)
 {
struct btrfs_delayed_ref_head *head_ref;
+   struct btrfs_delayed_ref_node *node;
struct btrfs_delayed_ref_root *delayed_refs;
int ret;
 
@@ -763,7 +774,8 @@ int btrfs_add_delayed_extent_op(struct btrfs_fs_info 
*fs_info,
delayed_refs = &trans->transaction->delayed_refs;
spin_lock(&delayed_refs->lock);
 
-   ret = add_delayed_ref_head(fs_info, trans, &head_ref->node, bytenr,
+   node = &head_ref->node;
+   ret = add_delayed_ref_head(fs_info, trans, &node, bytenr,
   num_bytes, BTRFS_UPDATE_DELAYED_HEAD,
  

[PATCH v0 02/18] btrfs: always save ref_root in delayed refs

2011-10-06 Thread Arne Jansen
For qgroup calculation, the information about which root a
delayed ref belongs to is useful even for shared refs.

Signed-off-by: Arne Jansen 
---
 fs/btrfs/delayed-ref.c |   18 --
 fs/btrfs/delayed-ref.h |   12 
 2 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 3a0f0ab..babd37b 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -495,13 +495,12 @@ static noinline int add_delayed_tree_ref(struct 
btrfs_fs_info *fs_info,
ref->in_tree = 1;
 
full_ref = btrfs_delayed_node_to_tree_ref(ref);
-   if (parent) {
-   full_ref->parent = parent;
+   full_ref->parent = parent;
+   full_ref->root = ref_root;
+   if (parent)
ref->type = BTRFS_SHARED_BLOCK_REF_KEY;
-   } else {
-   full_ref->root = ref_root;
+   else
ref->type = BTRFS_TREE_BLOCK_REF_KEY;
-   }
full_ref->level = level;
 
trace_btrfs_delayed_tree_ref(ref, full_ref, action);
@@ -551,13 +550,12 @@ static noinline int add_delayed_data_ref(struct 
btrfs_fs_info *fs_info,
ref->in_tree = 1;
 
full_ref = btrfs_delayed_node_to_data_ref(ref);
-   if (parent) {
-   full_ref->parent = parent;
+   full_ref->parent = parent;
+   full_ref->root = ref_root;
+   if (parent)
ref->type = BTRFS_SHARED_DATA_REF_KEY;
-   } else {
-   full_ref->root = ref_root;
+   else
ref->type = BTRFS_EXTENT_DATA_REF_KEY;
-   }
 
full_ref->objectid = owner;
full_ref->offset = offset;
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index 8316bff..a5fb2bc 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -98,19 +98,15 @@ struct btrfs_delayed_ref_head {
 
 struct btrfs_delayed_tree_ref {
struct btrfs_delayed_ref_node node;
-   union {
-   u64 root;
-   u64 parent;
-   };
+   u64 root;
+   u64 parent;
int level;
 };
 
 struct btrfs_delayed_data_ref {
struct btrfs_delayed_ref_node node;
-   union {
-   u64 root;
-   u64 parent;
-   };
+   u64 root;
+   u64 parent;
u64 objectid;
u64 offset;
 };
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v0 03/18] btrfs: add nested locking mode for paths

2011-10-06 Thread Arne Jansen
This patch adds the possibility to read-lock an extent
even if it is already write-locked by the same thread.
Subvolume quota needs this capability.

Signed-off-by: Arne Jansen 
---
 fs/btrfs/ctree.c |   22 
 fs/btrfs/ctree.h |1 +
 fs/btrfs/extent_io.c |1 +
 fs/btrfs/extent_io.h |2 +
 fs/btrfs/locking.c   |   51 +++--
 fs/btrfs/locking.h   |2 +-
 6 files changed, 66 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 51b387b..964ac9a 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -186,13 +186,14 @@ struct extent_buffer *btrfs_lock_root_node(struct 
btrfs_root *root)
  * tree until you end up with a lock on the root.  A locked buffer
  * is returned, with a reference held.
  */
-struct extent_buffer *btrfs_read_lock_root_node(struct btrfs_root *root)
+struct extent_buffer *btrfs_read_lock_root_node(struct btrfs_root *root,
+   int nested)
 {
struct extent_buffer *eb;
 
while (1) {
eb = btrfs_root_node(root);
-   btrfs_tree_read_lock(eb);
+   btrfs_tree_read_lock(eb, nested);
if (eb == root->node)
break;
btrfs_tree_read_unlock(eb);
@@ -1620,6 +1621,7 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, 
struct btrfs_root
/* everything at write_lock_level or lower must be write locked */
int write_lock_level = 0;
u8 lowest_level = 0;
+   int nested = p->nested;
 
lowest_level = p->lowest_level;
WARN_ON(lowest_level && ins_len > 0);
@@ -1661,8 +1663,9 @@ again:
b = root->commit_root;
extent_buffer_get(b);
level = btrfs_header_level(b);
+   BUG_ON(p->skip_locking && nested);
if (!p->skip_locking)
-   btrfs_tree_read_lock(b);
+   btrfs_tree_read_lock(b, 0);
} else {
if (p->skip_locking) {
b = btrfs_root_node(root);
@@ -1671,7 +1674,7 @@ again:
/* we don't know the level of the root node
 * until we actually have it read locked
 */
-   b = btrfs_read_lock_root_node(root);
+   b = btrfs_read_lock_root_node(root, nested);
level = btrfs_header_level(b);
if (level <= write_lock_level) {
/* whoops, must trade for write lock */
@@ -1810,7 +1813,8 @@ cow_done:
err = btrfs_try_tree_read_lock(b);
if (!err) {
btrfs_set_path_blocking(p);
-   btrfs_tree_read_lock(b);
+   btrfs_tree_read_lock(b,
+nested);
btrfs_clear_path_blocking(p, b,
  
BTRFS_READ_LOCK);
}
@@ -3955,7 +3959,7 @@ int btrfs_search_forward(struct btrfs_root *root, struct 
btrfs_key *min_key,
 
WARN_ON(!path->keep_locks);
 again:
-   cur = btrfs_read_lock_root_node(root);
+   cur = btrfs_read_lock_root_node(root, 0);
level = btrfs_header_level(cur);
WARN_ON(path->nodes[level]);
path->nodes[level] = cur;
@@ -4049,7 +4053,7 @@ find_next_key:
cur = read_node_slot(root, cur, slot);
BUG_ON(!cur);
 
-   btrfs_tree_read_lock(cur);
+   btrfs_tree_read_lock(cur, 0);
 
path->locks[level - 1] = BTRFS_READ_LOCK;
path->nodes[level - 1] = cur;
@@ -4243,7 +4247,7 @@ again:
ret = btrfs_try_tree_read_lock(next);
if (!ret) {
btrfs_set_path_blocking(path);
-   btrfs_tree_read_lock(next);
+   btrfs_tree_read_lock(next, 0);
btrfs_clear_path_blocking(path, next,
  BTRFS_READ_LOCK);
}
@@ -4280,7 +4284,7 @@ again:
ret = btrfs_try_tree_read_lock(next);
if (!ret) {
btrfs_set_path_blocking(path);
-   btrfs_tree_read_lock(next);
+   btrfs_tree_read_lock(next, 0);
btrfs_clear_path_blocking(path, next,
  BTRFS_READ_LOCK);
}
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree

[PATCH v0 09/18] btrfs: qgroup state and initialization

2011-10-06 Thread Arne Jansen
Add state to fs_info.

Signed-off-by: Arne Jansen 
---
 fs/btrfs/ctree.h   |   32 
 fs/btrfs/disk-io.c |7 +++
 2 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 09c58e5..49f97d8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -958,6 +958,7 @@ struct btrfs_fs_info {
struct btrfs_root *dev_root;
struct btrfs_root *fs_root;
struct btrfs_root *csum_root;
+   struct btrfs_root *quota_root;
 
/* the log root tree is a directory of all the other log roots */
struct btrfs_root *log_root_tree;
@@ -1185,6 +1186,30 @@ struct btrfs_fs_info {
int scrub_workers_refcnt;
struct btrfs_workers scrub_workers;
 
+   /*
+* quota information
+*/
+   unsigned int quota_enabled:1;
+
+   /*
+* quota_enabled only changes state after a commit. This holds the
+* next state.
+*/
+   unsigned int pending_quota_state:1;
+
+   /* is qgroup tracking in a consistent state? */
+   u64 qgroup_flags;
+
+   /* holds configuration and tracking. Protected by qgroup_lock */
+   struct rb_root qgroup_tree;
+   spinlock_t qgroup_lock;
+
+   /* list of dirty qgroups to be written at next commit */
+   struct list_head dirty_qgroups;
+
+   /* used by btrfs_qgroup_record_ref for an efficient tree traversal */
+   u64 qgroup_seq;
+
/* filesystem state */
u64 fs_state;
 
@@ -2845,4 +2870,11 @@ int btrfs_scrub_cancel_devid(struct btrfs_root *root, 
u64 devid);
 int btrfs_scrub_progress(struct btrfs_root *root, u64 devid,
 struct btrfs_scrub_progress *progress);
 
+static inline int is_fstree(u64 rootid)
+{
+   if (rootid == BTRFS_FS_TREE_OBJECTID ||
+   (s64)rootid >= (s64)BTRFS_FIRST_FREE_OBJECTID)
+   return 1;
+   return 0;
+}
 #endif
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 672747d..cb25017 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1825,6 +1825,13 @@ struct btrfs_root *open_ctree(struct super_block *sb,
init_rwsem(&fs_info->cleanup_work_sem);
init_rwsem(&fs_info->subvol_sem);
 
+   spin_lock_init(&fs_info->qgroup_lock);
+   fs_info->qgroup_tree = RB_ROOT;
+   INIT_LIST_HEAD(&fs_info->dirty_qgroups);
+   fs_info->qgroup_seq = 1;
+   fs_info->quota_enabled = 0;
+   fs_info->pending_quota_state = 0;
+
btrfs_init_free_cluster(&fs_info->meta_alloc_cluster);
btrfs_init_free_cluster(&fs_info->data_alloc_cluster);
 
-- 
1.7.3.4



[PATCH v0 01/18] btrfs: mark delayed refs as for cow

2011-10-06 Thread Arne Jansen
Add a for_cow parameter to add_delayed_*_ref and pass the
appropriate value from every call site. The for_cow parameter
will later on be used to determine if a ref will change anything
with respect to qgroups.
Delayed refs coming from relocation are always counted as for_cow,
as they don't change subvol quota.
Also pass in the fs_info for later use.

Signed-off-by: Arne Jansen 
---
 fs/btrfs/ctree.c   |   20 +-
 fs/btrfs/ctree.h   |   13 ---
 fs/btrfs/delayed-ref.c |   50 --
 fs/btrfs/delayed-ref.h |   15 +---
 fs/btrfs/extent-tree.c |   95 +---
 fs/btrfs/file.c|   10 +++---
 fs/btrfs/inode.c   |2 +-
 fs/btrfs/ioctl.c   |3 +-
 fs/btrfs/relocation.c  |   18 +
 fs/btrfs/transaction.c |4 +-
 fs/btrfs/tree-log.c|2 +-
 11 files changed, 136 insertions(+), 96 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 011cab3..51b387b 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -261,9 +261,9 @@ int btrfs_copy_root(struct btrfs_trans_handle *trans,
 
WARN_ON(btrfs_header_generation(buf) > trans->transid);
if (new_root_objectid == BTRFS_TREE_RELOC_OBJECTID)
-   ret = btrfs_inc_ref(trans, root, cow, 1);
+   ret = btrfs_inc_ref(trans, root, cow, 1, 1);
else
-   ret = btrfs_inc_ref(trans, root, cow, 0);
+   ret = btrfs_inc_ref(trans, root, cow, 0, 1);
 
if (ret)
return ret;
@@ -350,14 +350,14 @@ static noinline int update_ref_for_cow(struct 
btrfs_trans_handle *trans,
if ((owner == root->root_key.objectid ||
 root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) &&
!(flags & BTRFS_BLOCK_FLAG_FULL_BACKREF)) {
-   ret = btrfs_inc_ref(trans, root, buf, 1);
+   ret = btrfs_inc_ref(trans, root, buf, 1, 1);
BUG_ON(ret);
 
if (root->root_key.objectid ==
BTRFS_TREE_RELOC_OBJECTID) {
-   ret = btrfs_dec_ref(trans, root, buf, 0);
+   ret = btrfs_dec_ref(trans, root, buf, 0, 1);
BUG_ON(ret);
-   ret = btrfs_inc_ref(trans, root, cow, 1);
+   ret = btrfs_inc_ref(trans, root, cow, 1, 1);
BUG_ON(ret);
}
new_flags |= BTRFS_BLOCK_FLAG_FULL_BACKREF;
@@ -365,9 +365,9 @@ static noinline int update_ref_for_cow(struct 
btrfs_trans_handle *trans,
 
if (root->root_key.objectid ==
BTRFS_TREE_RELOC_OBJECTID)
-   ret = btrfs_inc_ref(trans, root, cow, 1);
+   ret = btrfs_inc_ref(trans, root, cow, 1, 1);
else
-   ret = btrfs_inc_ref(trans, root, cow, 0);
+   ret = btrfs_inc_ref(trans, root, cow, 0, 1);
BUG_ON(ret);
}
if (new_flags != 0) {
@@ -381,11 +381,11 @@ static noinline int update_ref_for_cow(struct 
btrfs_trans_handle *trans,
if (flags & BTRFS_BLOCK_FLAG_FULL_BACKREF) {
if (root->root_key.objectid ==
BTRFS_TREE_RELOC_OBJECTID)
-   ret = btrfs_inc_ref(trans, root, cow, 1);
+   ret = btrfs_inc_ref(trans, root, cow, 1, 1);
else
-   ret = btrfs_inc_ref(trans, root, cow, 0);
+   ret = btrfs_inc_ref(trans, root, cow, 0, 1);
BUG_ON(ret);
-   ret = btrfs_dec_ref(trans, root, buf, 1);
+   ret = btrfs_dec_ref(trans, root, buf, 1, 1);
BUG_ON(ret);
}
clean_tree_block(trans, root, buf);
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 03912c5..68f2315 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2183,17 +2183,17 @@ int btrfs_reserve_extent(struct btrfs_trans_handle 
*trans,
  u64 search_end, struct btrfs_key *ins,
  u64 data);
 int btrfs_inc_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
- struct extent_buffer *buf, int full_backref);
+ struct extent_buffer *buf, int full_backref, int for_cow);
 int btrfs_dec_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
- struct extent_buffer *buf, int full_backref);
+ struct extent_buffer *buf, int full_backref, int for_cow);
 int btrfs_set_disk_extent_flags(struct btrfs_trans_handle *trans,
struct btrfs_root *root,

[PATCH v0 11/18] btrfs: add sequence numbers to delayed refs

2011-10-06 Thread Arne Jansen
Sequence numbers are needed to reconstruct the backrefs
of a given extent to a certain point in time. The total
set of backrefs consist of the set of backrefs recorded
on disk plus the enqueued delayed refs for it that existed
at that moment.
This patch also adds a list that records all delayed refs
that are currently in the process of being added. With
qgroups enabled, adding a delayed ref involves walking the
backrefs of the extent. During this time, no newer delayed
ref may be processed.

Signed-off-by: Arne Jansen 
---
 fs/btrfs/delayed-ref.c |   71 +---
 fs/btrfs/delayed-ref.h |   21 ++
 fs/btrfs/transaction.c |4 +++
 3 files changed, 92 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index babd37b..2c8544b 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -23,6 +23,11 @@
 #include "delayed-ref.h"
 #include "transaction.h"
 
+struct seq_list {
+   struct list_head list;
+   u64 seq;
+};
+
 /*
  * delayed back reference update tracking.  For subvolume trees
  * we queue up extent allocations and backref maintenance for
@@ -101,6 +106,11 @@ static int comp_entry(struct btrfs_delayed_ref_node *ref2,
return -1;
if (ref1->type > ref2->type)
return 1;
+   /* with quota enable, merging of refs is not allowed */
+   if (ref1->seq < ref2->seq)
+   return -1;
+   if (ref1->seq > ref2->seq)
+   return 1;
if (ref1->type == BTRFS_TREE_BLOCK_REF_KEY ||
ref1->type == BTRFS_SHARED_BLOCK_REF_KEY) {
return comp_tree_refs(btrfs_delayed_node_to_tree_ref(ref2),
@@ -209,6 +219,39 @@ int btrfs_delayed_ref_lock(struct btrfs_trans_handle 
*trans,
return 0;
 }
 
+static u64 get_delayed_seq(struct btrfs_delayed_ref_root *delayed_refs,
+  struct seq_list *elem)
+{
+   assert_spin_locked(&delayed_refs->lock);
+   elem->seq = ++delayed_refs->seq;
+   list_add_tail(&elem->list, &delayed_refs->seq_head);
+
+   return elem->seq;
+}
+
+static void put_delayed_seq(struct btrfs_delayed_ref_root *delayed_refs,
+   struct seq_list *elem)
+{
+   spin_lock(&delayed_refs->lock);
+   list_del(&elem->list);
+   spin_unlock(&delayed_refs->lock);
+}
+
+int btrfs_check_delayed_seq(struct btrfs_delayed_ref_root *delayed_refs,
+   u64 seq)
+{
+   struct seq_list *elem;
+
+   assert_spin_locked(&delayed_refs->lock);
+   if (list_empty(&delayed_refs->seq_head))
+   return 0;
+
+   elem = list_first_entry(&delayed_refs->seq_head, struct seq_list, list);
+   if (seq >= elem->seq)
+   return 1;
+   return 0;
+}
+
 int btrfs_find_ref_cluster(struct btrfs_trans_handle *trans,
   struct list_head *cluster, u64 start)
 {
@@ -438,6 +481,7 @@ static noinline int add_delayed_ref_head(struct 
btrfs_fs_info *fs_info,
ref->action  = 0;
ref->is_head = 1;
ref->in_tree = 1;
+   ref->seq = 0;
 
head_ref = btrfs_delayed_node_to_head(ref);
head_ref->must_insert_reserved = must_insert_reserved;
@@ -474,11 +518,12 @@ static noinline int add_delayed_tree_ref(struct 
btrfs_fs_info *fs_info,
 struct btrfs_delayed_ref_node *ref,
 u64 bytenr, u64 num_bytes, u64 parent,
 u64 ref_root, int level, int action,
-int for_cow)
+int for_cow, struct seq_list *seq_elem)
 {
struct btrfs_delayed_ref_node *existing;
struct btrfs_delayed_tree_ref *full_ref;
struct btrfs_delayed_ref_root *delayed_refs;
+   u64 seq = 0;
 
if (action == BTRFS_ADD_DELAYED_EXTENT)
action = BTRFS_ADD_DELAYED_REF;
@@ -494,6 +539,10 @@ static noinline int add_delayed_tree_ref(struct 
btrfs_fs_info *fs_info,
ref->is_head = 0;
ref->in_tree = 1;
 
+   if (fs_info->quota_enabled && !for_cow && is_fstree(ref_root))
+   seq = get_delayed_seq(delayed_refs, seq_elem);
+   ref->seq = seq;
+
full_ref = btrfs_delayed_node_to_tree_ref(ref);
full_ref->parent = parent;
full_ref->root = ref_root;
@@ -529,11 +578,13 @@ static noinline int add_delayed_data_ref(struct 
btrfs_fs_info *fs_info,
 struct btrfs_delayed_ref_node *ref,
 u64 bytenr, u64 num_bytes, u64 parent,
 u64 ref_root, u64 owner, u64 offset,
-int action, int for_cow)
+int action, int for_cow,
+struct seq_list *seq_elem)
 {
struct btrfs_delayed_ref_node *exis

[PATCH v0 05/18] btrfs: add helper for tree enumeration

2011-10-06 Thread Arne Jansen
Often no exact match is wanted but just the next lower or
higher item. There's a lot of duplicated code throughout
btrfs to deal with the corner cases. This patch adds a
helper function that can facilitate searching.

Signed-off-by: Arne Jansen 
---
 fs/btrfs/ctree.c |   72 ++
 fs/btrfs/ctree.h |   10 +++
 2 files changed, 82 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 964ac9a..db79c99 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1862,6 +1862,78 @@ done:
 }
 
 /*
+ * helper to use instead of search slot if no exact match is needed but
+ * instead the next or previous item should be returned.
+ * When find_higher is true, the next higher item is returned, the next lower
+ * otherwise.
+ * When return_any and find_higher are both true, and no higher item is found,
+ * return the next lower instead.
+ * When return_any is true and find_higher is false, and no lower item is 
found,
+ * return the next higher instead.
+ * It returns 0 if any item is found, 1 if none is found (tree empty), and
+ * < 0 on error
+ */
+int btrfs_search_slot_for_read(struct btrfs_root *root,
+  struct btrfs_key *key, struct btrfs_path *p,
+  int find_higher, int return_any)
+{
+   int ret;
+   struct extent_buffer *leaf;
+
+again:
+   ret = btrfs_search_slot(NULL, root, key, p, 0, 0);
+   if (ret <= 0)
+   return ret;
+   /*
+* a return value of 1 means the path is at the position where the
+* item should be inserted. Normally this is the next bigger item,
+* but in case the previous item is the last in a leaf, path points
+* to the first free slot in the previous leaf, i.e. at an invalid
+* item.
+*/
+   leaf = p->nodes[0];
+
+   if (find_higher) {
+   if (p->slots[0] >= btrfs_header_nritems(leaf)) {
+   ret = btrfs_next_leaf(root, p);
+   if (ret <= 0)
+   return ret;
+   if (!return_any)
+   return 1;
+   /*
+* no higher item found, return the next
+* lower instead
+*/
+   return_any = 0;
+   find_higher = 0;
+   btrfs_release_path(p);
+   goto again;
+   }
+   } else {
+   if (p->slots[0] >= btrfs_header_nritems(leaf)) {
+   /* we're sitting on an invalid slot */
+   if (p->slots[0] == 0) {
+   ret = btrfs_prev_leaf(root, p);
+   if (ret <= 0)
+   return ret;
+   if (!return_any)
+   return 1;
+   /*
+* no lower item found, return the next
+* higher instead
+*/
+   return_any = 0;
+   find_higher = 1;
+   btrfs_release_path(p);
+   goto again;
+   }
+   --p->slots[0];
+   }
+   }
+   return 0;
+}
+
+/*
  * adjust the pointers going up the tree, starting at level
  * making sure the right key of each node is points to 'key'.
  * This is used after shifting pointers to the left, so it stops
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 7a1ca9c..09c58e5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2458,6 +2458,9 @@ int btrfs_duplicate_item(struct btrfs_trans_handle *trans,
 int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root
  *root, struct btrfs_key *key, struct btrfs_path *p, int
  ins_len, int cow);
+int btrfs_search_slot_for_read(struct btrfs_root *root,
+  struct btrfs_key *key, struct btrfs_path *p,
+  int find_higher, int return_any);
 int btrfs_realloc_node(struct btrfs_trans_handle *trans,
   struct btrfs_root *root, struct extent_buffer *parent,
   int start_slot, int cache_only, u64 *last_ret,
@@ -2500,6 +2503,13 @@ static inline int btrfs_insert_empty_item(struct 
btrfs_trans_handle *trans,
 }
 
 int btrfs_next_leaf(struct btrfs_root *root, struct btrfs_path *path);
+static inline int btrfs_next_item(struct btrfs_root *root, struct btrfs_path 
*p)
+{
+   ++p->slots[0];
+   if (p->slots[0] >= btrfs_header_nritems(p->nodes[0]))
+   return btrfs_next_leaf(root, p);
+   return 0;
+}
 int btrfs_prev_leaf(struct btrfs_root *root, struct btrfs_path *path);
 int btrfs_leaf_fr

[PATCH v0 08/18] btrfs: added helper to create new trees

2011-10-06 Thread Arne Jansen
This creates a brand new tree. It will be used to create
the quota tree.

Signed-off-by: Arne Jansen 
---
 fs/btrfs/disk-io.c |   76 
 fs/btrfs/disk-io.h |3 ++
 2 files changed, 79 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 074a539..672747d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1145,6 +1145,82 @@ static int find_and_setup_root(struct btrfs_root 
*tree_root,
return 0;
 }
 
+struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
+struct btrfs_fs_info *fs_info,
+u64 objectid)
+{
+   struct extent_buffer *leaf;
+   struct btrfs_root *tree_root = fs_info->tree_root;
+   struct btrfs_root *root;
+   struct btrfs_key key;
+   int ret = 0;
+   u64 bytenr;
+
+   root = kzalloc(sizeof(struct btrfs_root), GFP_NOFS);
+   if (!root)
+   return ERR_PTR(-ENOMEM);
+
+   __setup_root(tree_root->nodesize, tree_root->leafsize,
+tree_root->sectorsize, tree_root->stripesize,
+root, fs_info, objectid);
+   root->root_key.objectid = objectid;
+   root->root_key.type = BTRFS_ROOT_ITEM_KEY;
+   root->root_key.offset = 0;
+
+   leaf = btrfs_alloc_free_block(trans, root, root->leafsize,
+ 0, objectid, NULL, 0, 0, 0);
+   if (IS_ERR(leaf)) {
+   ret = PTR_ERR(leaf);
+   goto fail;
+   }
+
+   bytenr = leaf->start;
+   memset_extent_buffer(leaf, 0, 0, sizeof(struct btrfs_header));
+   btrfs_set_header_bytenr(leaf, leaf->start);
+   btrfs_set_header_generation(leaf, trans->transid);
+   btrfs_set_header_backref_rev(leaf, BTRFS_MIXED_BACKREF_REV);
+   btrfs_set_header_owner(leaf, objectid);
+   root->node = leaf;
+
+   write_extent_buffer(leaf, fs_info->fsid,
+   (unsigned long)btrfs_header_fsid(leaf),
+   BTRFS_FSID_SIZE);
+   write_extent_buffer(leaf, fs_info->chunk_tree_uuid,
+   (unsigned long)btrfs_header_chunk_tree_uuid(leaf),
+   BTRFS_UUID_SIZE);
+   btrfs_mark_buffer_dirty(leaf);
+
+   root->commit_root = btrfs_root_node(root);
+   root->track_dirty = 1;
+
+
+   root->root_item.flags = 0;
+   root->root_item.byte_limit = 0;
+   btrfs_set_root_bytenr(&root->root_item, leaf->start);
+   btrfs_set_root_generation(&root->root_item, trans->transid);
+   btrfs_set_root_level(&root->root_item, 0);
+   btrfs_set_root_refs(&root->root_item, 1);
+   btrfs_set_root_used(&root->root_item, leaf->len);
+   btrfs_set_root_last_snapshot(&root->root_item, 0);
+   btrfs_set_root_dirid(&root->root_item, 0);
+   root->root_item.drop_level = 0;
+
+   key.objectid = objectid;
+   key.type = BTRFS_ROOT_ITEM_KEY;
+   key.offset = 0;
+   ret = btrfs_insert_root(trans, tree_root, &key, &root->root_item);
+   if (ret)
+   goto fail;
+
+   btrfs_tree_unlock(leaf);
+
+fail:
+   if (ret)
+   return ERR_PTR(ret);
+
+   return root;
+}
+
 static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans,
 struct btrfs_fs_info *fs_info)
 {
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 09a164d..b166beb 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -83,6 +83,9 @@ int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans,
 struct btrfs_fs_info *fs_info);
 int btrfs_add_log_tree(struct btrfs_trans_handle *trans,
   struct btrfs_root *root);
+struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
+struct btrfs_fs_info *fs_info,
+u64 objectid);
 int btree_lock_page_hook(struct page *page);
 
 
-- 
1.7.3.4



[PATCH v0 12/18] btrfs: put back delayed refs that are too new

2011-10-06 Thread Arne Jansen
When processing a delayed ref, first check if there are still
old refs in the process of being added. If so, put this ref
back into the tree. To avoid looping on this ref, choose a newer
one in the next loop; btrfs_find_ref_cluster has to take care of that.

Signed-off-by: Arne Jansen 
---
 fs/btrfs/delayed-ref.c |   43 +--
 fs/btrfs/extent-tree.c |   27 ++-
 2 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 2c8544b..d6f934f 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -160,16 +160,22 @@ static struct btrfs_delayed_ref_node *tree_insert(struct 
rb_root *root,
 
 /*
  * find an head entry based on bytenr. This returns the delayed ref
- * head if it was able to find one, or NULL if nothing was in that spot
+ * head if it was able to find one, or NULL if nothing was in that spot.
+ * If return_bigger is given, the next bigger entry is returned if no exact
+ * match is found.
  */
 static struct btrfs_delayed_ref_node *find_ref_head(struct rb_root *root,
  u64 bytenr,
- struct btrfs_delayed_ref_node **last)
+ struct btrfs_delayed_ref_node **last,
+ int return_bigger)
 {
-   struct rb_node *n = root->rb_node;
+   struct rb_node *n;
struct btrfs_delayed_ref_node *entry;
-   int cmp;
+   int cmp = 0;
 
+again:
+   n = root->rb_node;
+   entry = NULL;
while (n) {
entry = rb_entry(n, struct btrfs_delayed_ref_node, rb_node);
WARN_ON(!entry->in_tree);
@@ -192,6 +198,19 @@ static struct btrfs_delayed_ref_node *find_ref_head(struct 
rb_root *root,
else
return entry;
}
+   if (entry && return_bigger) {
+   if (cmp > 0) {
+   n = rb_next(&entry->rb_node);
+   if (!n)
+   n = rb_first(root);
+   entry = rb_entry(n, struct btrfs_delayed_ref_node,
+rb_node);
+   bytenr = entry->bytenr;
+   return_bigger = 0;
+   goto again;
+   }
+   return entry;
+   }
return NULL;
 }
 
@@ -266,20 +285,8 @@ int btrfs_find_ref_cluster(struct btrfs_trans_handle 
*trans,
node = rb_first(&delayed_refs->root);
} else {
ref = NULL;
-   find_ref_head(&delayed_refs->root, start, &ref);
+   find_ref_head(&delayed_refs->root, start+1, &ref, 1);
if (ref) {
-   struct btrfs_delayed_ref_node *tmp;
-
-   node = rb_prev(&ref->rb_node);
-   while (node) {
-   tmp = rb_entry(node,
-  struct btrfs_delayed_ref_node,
-  rb_node);
-   if (tmp->bytenr < start)
-   break;
-   ref = tmp;
-   node = rb_prev(&ref->rb_node);
-   }
node = &ref->rb_node;
} else
node = rb_first(&delayed_refs->root);
@@ -777,7 +784,7 @@ btrfs_find_delayed_ref_head(struct btrfs_trans_handle 
*trans, u64 bytenr)
struct btrfs_delayed_ref_root *delayed_refs;
 
delayed_refs = &trans->transaction->delayed_refs;
-   ref = find_ref_head(&delayed_refs->root, bytenr, NULL);
+   ref = find_ref_head(&delayed_refs->root, bytenr, NULL, 0);
if (ref)
return btrfs_delayed_node_to_head(ref);
return NULL;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e3b69ce..7fb9650 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2180,6 +2180,28 @@ static noinline int run_clustered_refs(struct 
btrfs_trans_handle *trans,
}
 
/*
+* locked_ref is the head node, so we have to go one
+* node back for any delayed ref updates
+*/
+   ref = select_delayed_ref(locked_ref);
+
+   if (ref && ref->seq &&
+   btrfs_check_delayed_seq(delayed_refs, ref->seq)) {
+   /*
+* there are still refs with lower seq numbers in the
+* process of being added. Don't run this ref yet.
+*/
+   list_del_init(&locked_ref->cluster);
+   mutex_unlock(&locked_ref->mutex);
+   locked_ref = NULL;
+   delayed_refs->num_heads_ready++;
+   spin_unlock(&delayed_refs->lock);
+  

[PATCH v0 18/18] btrfs: add qgroup inheritance

2011-10-06 Thread Arne Jansen
When creating a subvolume or snapshot, it is necessary
to initialize the qgroup account with a copy of some
other (tracking) qgroup. This patch adds parameters
to the ioctls to pass the information from which qgroup
to inherit.

Signed-off-by: Arne Jansen 
---
 fs/btrfs/ioctl.c   |   59 ++-
 fs/btrfs/ioctl.h   |   11 -
 fs/btrfs/transaction.c |8 ++
 fs/btrfs/transaction.h |1 +
 4 files changed, 61 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index f46bc35..54fefef 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -315,7 +315,8 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, 
void __user *arg)
 static noinline int create_subvol(struct btrfs_root *root,
  struct dentry *dentry,
  char *name, int namelen,
- u64 *async_transid)
+ u64 *async_transid,
+ struct btrfs_qgroup_inherit **inherit)
 {
struct btrfs_trans_handle *trans;
struct btrfs_key key;
@@ -347,6 +348,11 @@ static noinline int create_subvol(struct btrfs_root *root,
if (IS_ERR(trans))
return PTR_ERR(trans);
 
+   ret = btrfs_qgroup_inherit(trans, root->fs_info, 0, objectid,
+  inherit ? *inherit : NULL);
+   if (ret)
+   goto fail;
+
leaf = btrfs_alloc_free_block(trans, root, root->leafsize,
  0, objectid, NULL, 0, 0, 0);
if (IS_ERR(leaf)) {
@@ -448,7 +454,7 @@ fail:
 
 static int create_snapshot(struct btrfs_root *root, struct dentry *dentry,
   char *name, int namelen, u64 *async_transid,
-  bool readonly)
+  bool readonly, struct btrfs_qgroup_inherit **inherit)
 {
struct inode *inode;
struct btrfs_pending_snapshot *pending_snapshot;
@@ -466,6 +472,10 @@ static int create_snapshot(struct btrfs_root *root, struct dentry *dentry,
pending_snapshot->dentry = dentry;
pending_snapshot->root = root;
pending_snapshot->readonly = readonly;
+   if (inherit) {
+   pending_snapshot->inherit = *inherit;
+   *inherit = NULL; /* take responsibility to free it */
+   }
 
trans = btrfs_start_transaction(root->fs_info->extent_root, 5);
if (IS_ERR(trans)) {
@@ -599,7 +609,8 @@ static inline int btrfs_may_create(struct inode *dir, struct dentry *child)
 static noinline int btrfs_mksubvol(struct path *parent,
   char *name, int namelen,
   struct btrfs_root *snap_src,
-  u64 *async_transid, bool readonly)
+  u64 *async_transid, bool readonly,
+  struct btrfs_qgroup_inherit **inherit)
 {
struct inode *dir  = parent->dentry->d_inode;
struct dentry *dentry;
@@ -630,11 +641,11 @@ static noinline int btrfs_mksubvol(struct path *parent,
goto out_up_read;
 
if (snap_src) {
-   error = create_snapshot(snap_src, dentry,
-   name, namelen, async_transid, readonly);
+   error = create_snapshot(snap_src, dentry, name, namelen,
+   async_transid, readonly, inherit);
} else {
error = create_subvol(BTRFS_I(dir)->root, dentry,
- name, namelen, async_transid);
+ name, namelen, async_transid, inherit);
}
if (!error)
fsnotify_mkdir(dir, dentry);
@@ -1253,11 +1264,9 @@ out_unlock:
 }
 
 static noinline int btrfs_ioctl_snap_create_transid(struct file *file,
-   char *name,
-   unsigned long fd,
-   int subvol,
-   u64 *transid,
-   bool readonly)
+   char *name, unsigned long fd, int subvol,
+   u64 *transid, bool readonly,
+   struct btrfs_qgroup_inherit **inherit)
 {
struct btrfs_root *root = BTRFS_I(fdentry(file)->d_inode)->root;
struct file *src_file;
@@ -1275,7 +1284,7 @@ static noinline int btrfs_ioctl_snap_create_transid(struct file *file,
 
if (subvol) {
ret = btrfs_mksubvol(&file->f_path, name, namelen,
-NULL, transid, readonly);
+NULL, transid, readonly, inherit);
} else {
struct inode *src_inode;
src_file =

[PATCH v0 06/18] btrfs: check the root passed to btrfs_end_transaction

2011-10-06 Thread Arne Jansen
This patch only adds a consistency check to validate that the
same root is passed to start_transaction and end_transaction.
Subvolume quota depends on this.

Signed-off-by: Arne Jansen 
---
 fs/btrfs/transaction.c |6 ++
 fs/btrfs/transaction.h |6 ++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index b5ee16b..d7f32da 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -306,6 +306,7 @@ again:
h->transaction = cur_trans;
h->blocks_used = 0;
h->bytes_reserved = 0;
+   h->root = root;
h->delayed_ref_updates = 0;
h->use_count = 1;
h->block_rsv = NULL;
@@ -453,6 +454,11 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
return 0;
}
 
+   /*
+* the same root has to be passed to start_transaction and
+* end_transaction. Subvolume quota depends on this.
+*/
+   WARN_ON(trans->root != root);
while (count < 4) {
unsigned long cur = trans->delayed_ref_updates;
trans->delayed_ref_updates = 0;
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 02564e6..b120126 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -55,6 +55,12 @@ struct btrfs_trans_handle {
struct btrfs_transaction *transaction;
struct btrfs_block_rsv *block_rsv;
struct btrfs_block_rsv *orig_rsv;
+   /*
+* this root is only needed to validate that the root passed to
+* start_transaction is the same as the one passed to end_transaction.
+* Subvolume quota depends on this
+*/
+   struct btrfs_root *root;
 };
 
 struct btrfs_pending_snapshot {
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v0 13/18] btrfs: qgroup implementation and prototypes

2011-10-06 Thread Arne Jansen
Signed-off-by: Arne Jansen 
---
 fs/btrfs/Makefile |2 +-
 fs/btrfs/ctree.h  |   32 +
 fs/btrfs/ioctl.h  |   24 +
 fs/btrfs/qgroup.c | 2151 +
 4 files changed, 2208 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 9ff560b..7738ecc 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -8,6 +8,6 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
-  ulist.o
+  qgroup.o ulist.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 49f97d8..1deb6b8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2870,6 +2870,38 @@ int btrfs_scrub_cancel_devid(struct btrfs_root *root, u64 devid);
 int btrfs_scrub_progress(struct btrfs_root *root, u64 devid,
 struct btrfs_scrub_progress *progress);
 
+/* quota.c */
+int btrfs_quota_enable(struct btrfs_trans_handle *trans,
+  struct btrfs_fs_info *fs_info);
+int btrfs_quota_disable(struct btrfs_trans_handle *trans,
+   struct btrfs_fs_info *fs_info);
+int btrfs_quota_rescan(struct btrfs_fs_info *fs_info);
+int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans,
+ struct btrfs_fs_info *fs_info, u64 src, u64 dst);
+int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans,
+ struct btrfs_fs_info *fs_info, u64 src, u64 dst);
+int btrfs_create_qgroup(struct btrfs_trans_handle *trans,
+   struct btrfs_fs_info *fs_info, u64 qgroupid,
+   char *name);
+int btrfs_remove_qgroup(struct btrfs_trans_handle *trans,
+ struct btrfs_fs_info *fs_info, u64 qgroupid);
+int btrfs_limit_qgroup(struct btrfs_trans_handle *trans,
+  struct btrfs_fs_info *fs_info, u64 qgroupid,
+  struct btrfs_qgroup_limit *limit);
+int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info);
+void btrfs_free_qgroup_config(struct btrfs_fs_info *fs_info);
+struct btrfs_delayed_extent_op;
+int btrfs_qgroup_record_ref(struct btrfs_trans_handle *trans,
+   struct btrfs_fs_info *fs_info,
+   struct btrfs_delayed_ref_node *node,
+   struct btrfs_delayed_extent_op *extent_op);
+int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
+ struct btrfs_fs_info *fs_info);
+int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
+struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
+struct btrfs_qgroup_inherit *inherit);
+int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
+void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);
 static inline int is_fstree(u64 rootid)
 {
if (rootid == BTRFS_FS_TREE_OBJECTID ||
diff --git a/fs/btrfs/ioctl.h b/fs/btrfs/ioctl.h
index ad1ea78..36d14a4 100644
--- a/fs/btrfs/ioctl.h
+++ b/fs/btrfs/ioctl.h
@@ -35,6 +35,30 @@ struct btrfs_ioctl_vol_args {
 #define BTRFS_FSID_SIZE 16
 #define BTRFS_UUID_SIZE 16
 
+#define BTRFS_QGROUP_INHERIT_SET_LIMITS	(1ULL << 0)
+
+struct btrfs_qgroup_limit {
+   __u64   flags;
+   __u64   max_rfer;
+   __u64   max_excl;
+   __u64   rsv_rfer;
+   __u64   rsv_excl;
+};
+
+struct btrfs_qgroup_inherit {
+   __u64   flags;
+   __u64   num_qgroups;
+   __u64   num_ref_copies;
+   __u64   num_excl_copies;
+   struct btrfs_qgroup_limit lim;
+   __u64   qgroups[0];
+};
+
+struct btrfs_ioctl_qgroup_limit_args {
+   __u64   qgroupid;
+   struct btrfs_qgroup_limit lim;
+};
+
 #define BTRFS_SUBVOL_NAME_MAX 4039
 struct btrfs_ioctl_vol_args_v2 {
__s64 fd;
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
new file mode 100644
index 000..0140aef
--- /dev/null
+++ b/fs/btrfs/qgroup.c
@@ -0,0 +1,2151 @@
+/*
+ * Copyright (C) 2011 STRATO.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+

[PATCH v0 07/18] btrfs: generic data structure to build unique lists

2011-10-06 Thread Arne Jansen
ulist is a generic data structure to hold a collection of unique u64
values. The only operations it supports are adding to the list and
enumerating it.
It is possible to store an auxiliary value along with the key.
The implementation is preliminary and can probably be sped up
significantly.
It is used by subvolume quota to translate recursions into iterative
loops.

Signed-off-by: Arne Jansen 
---
 fs/btrfs/Makefile |3 +-
 fs/btrfs/ulist.c  |  122 +
 fs/btrfs/ulist.h  |   59 +
 3 files changed, 183 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 40e6ac0..9ff560b 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -7,6 +7,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
   extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \
   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
-  compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o
+  compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
+  ulist.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
diff --git a/fs/btrfs/ulist.c b/fs/btrfs/ulist.c
new file mode 100644
index 000..756a937
--- /dev/null
+++ b/fs/btrfs/ulist.c
@@ -0,0 +1,122 @@
+/*
+ * Copyright (C) 2011 STRATO.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include 
+#include 
+#include 
+
+#include "ulist.h"
+
+void ulist_init(struct ulist *ulist, unsigned long gfp_mask)
+{
+   ulist->nnodes = 0;
+   ulist->gfp_mask = gfp_mask;
+   ulist->nodes = ulist->int_nodes;
+   ulist->nodes_alloced = ULIST_SIZE;
+}
+
+void ulist_fini(struct ulist *ulist)
+{
+   if (ulist->nodes_alloced > ULIST_SIZE)
+   kfree(ulist->nodes);
+}
+
+void ulist_reinit(struct ulist *ulist)
+{
+   ulist_fini(ulist);
+   ulist_init(ulist, ulist->gfp_mask);
+}
+
+struct ulist *ulist_alloc(unsigned long gfp_mask)
+{
+   struct ulist *ulist = kmalloc(sizeof(*ulist), gfp_mask);
+
+   if (!ulist)
+   return NULL;
+
+   ulist_init(ulist, gfp_mask);
+
+   return ulist;
+}
+
+void ulist_free(struct ulist *ulist)
+{
+   if (!ulist)
+   return;
+   ulist_fini(ulist);
+   kfree(ulist);
+}
+
+int ulist_add(struct ulist *ulist, u64 val, unsigned long aux)
+{
+   u64 i;
+
+   for (i = 0; i < ulist->nnodes; ++i) {
+   if (ulist->nodes[i].val == val)
+   return 0;
+   }
+
+   if (ulist->nnodes >= ulist->nodes_alloced) {
+   u64 new_alloced = ulist->nodes_alloced + 128;
+   struct ulist_node *new_nodes = kmalloc(sizeof(*new_nodes) *
+  new_alloced, ulist->gfp_mask);
+
+   if (!new_nodes)
+   return -ENOMEM;
+   memcpy(new_nodes, ulist->nodes,
+  sizeof(*new_nodes) * ulist->nnodes);
+   if (ulist->nodes_alloced > ULIST_SIZE)
+   kfree(ulist->nodes);
+   ulist->nodes = new_nodes;
+   ulist->nodes_alloced = new_alloced;
+   }
+   ulist->nodes[ulist->nnodes].val = val;
+   ulist->nodes[ulist->nnodes].aux = aux;
+   ulist->nodes[ulist->nnodes].next = ulist->nnodes + 1;
+   ++ulist->nnodes;
+
+   return 1;
+}
+
+struct ulist_node *ulist_next(struct ulist *ulist, struct ulist_node *prev)
+{
+   if (ulist->nnodes == 0)
+   return NULL;
+
+   if (!prev)
+   return &ulist->nodes[0];
+
+   if (prev->next < 0 || prev->next >= ulist->nnodes)
+   return NULL;
+
+   return &ulist->nodes[prev->next];
+}
+
+int ulist_merge(struct ulist *dst, struct ulist *src)
+{
+   struct ulist_node *node = NULL;
+   int ret;
+
+   while ((node = ulist_next(src, node))) {
+   ret = ulist_add(dst, node->val, node->aux);
+   if (ret)
+   return ret;
+   }
+
+   return 0;
+}
diff --git a/fs/btrfs/ulist.h b/fs/btrfs/ulist.h
new file mode 100644
index 000..2eb7e9d
--- /dev/null
+++ b/fs/btrfs/ulist.h
@@ -0,0 +1,59 @@
+/*
+ * Copyright (C) 2011 STRATO.  All rights reserved.

[PATCH v0 00/18] btrfs: Subvolume Quota Groups

2011-10-06 Thread Arne Jansen
This is a first draft of a subvolume quota implementation. It is possible
to limit subvolumes and any group of subvolumes and also to track the amount
of space that will get freed when deleting snapshots.

The current version is functionally incomplete, with the main missing feature
being the initial scan and rescan of an existing filesystem.

I put some effort into writing an introduction into the concepts and
implementation which can be found at

http://sensille.com/qgroups.pdf

The purpose of getting it out at this early stage is to get as much input
as possible with regard to concepts, implementation and testing.
The accompanying user-mode parts will take some additional days to gather.

Thanks,
Arne

Arne Jansen (18):
  btrfs: mark delayed refs as for cow
  btrfs: always save ref_root in delayed refs
  btrfs: add nested locking mode for paths
  btrfs: qgroup on-disk format
  btrfs: add helper for tree enumeration
  btrfs: check the root passed to btrfs_end_transaction
  btrfs: generic data structure to build unique lists
  btrfs: added helper to create new trees
  btrfs: qgroup state and initialization
  btrfs: Test code to change the order of delayed-ref processing
  btrfs: add sequence numbers to delayed refs
  btrfs: put back delayed refs that are too new
  btrfs: qgroup implementation and prototypes
  btrfs: quota tree support and startup
  btrfs: hooks for qgroup to record delayed refs
  btrfs: hooks to reserve qgroup space
  btrfs: add qgroup ioctls
  btrfs: add qgroup inheritance

 fs/btrfs/Makefile  |3 +-
 fs/btrfs/ctree.c   |  114 +++-
 fs/btrfs/ctree.h   |  224 +-
 fs/btrfs/delayed-ref.c |  188 --
 fs/btrfs/delayed-ref.h |   48 +-
 fs/btrfs/disk-io.c |  130 +++-
 fs/btrfs/disk-io.h |3 +
 fs/btrfs/extent-tree.c |  185 -
 fs/btrfs/extent_io.c   |1 +
 fs/btrfs/extent_io.h   |2 +
 fs/btrfs/file.c|   10 +-
 fs/btrfs/inode.c   |2 +-
 fs/btrfs/ioctl.c   |  247 +-
 fs/btrfs/ioctl.h   |   62 ++-
 fs/btrfs/locking.c |   51 ++-
 fs/btrfs/locking.h |2 +-
 fs/btrfs/qgroup.c  | 2151 
 fs/btrfs/relocation.c  |   18 +-
 fs/btrfs/transaction.c |   45 +-
 fs/btrfs/transaction.h |8 +
 fs/btrfs/tree-log.c|2 +-
 fs/btrfs/ulist.c   |  122 +++
 fs/btrfs/ulist.h   |   59 ++
 23 files changed, 3501 insertions(+), 176 deletions(-)
 create mode 100644 fs/btrfs/qgroup.c
 create mode 100644 fs/btrfs/ulist.c
 create mode 100644 fs/btrfs/ulist.h

-- 
1.7.3.4



[PATCH v0 10/18] btrfs: Test code to change the order of delayed-ref processing

2011-10-06 Thread Arne Jansen
Normally delayed refs get processed in ascending bytenr order. This
correlates in most cases with the order in which they were added. To expose
dependencies on this order, we start to process the tree in the middle
instead of the beginning.
This code is only effective when SCRAMBLE_DELAYED_REFS is defined.

Signed-off-by: Arne Jansen 
---
 fs/btrfs/extent-tree.c |   50 
 1 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 29ac93e..e3b69ce 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -33,6 +33,8 @@
 #include "locking.h"
 #include "free-space-cache.h"
 
+#undef SCRAMBLE_DELAYED_REFS
+
 /* control flags for do_chunk_alloc's force field
  * CHUNK_ALLOC_NO_FORCE means to only allocate a chunk
  * if we really need one.
@@ -2241,6 +2243,49 @@ static noinline int run_clustered_refs(struct btrfs_trans_handle *trans,
return count;
 }
 
+#ifdef SCRAMBLE_DELAYED_REFS
+/*
+ * Normally delayed refs get processed in ascending bytenr order. This
+ * correlates in most cases to the order added. To expose dependencies on this
+ * order, we start to process the tree in the middle instead of the beginning
+ */
+static u64 find_middle(struct rb_root *root)
+{
+   struct rb_node *n = root->rb_node;
+   struct btrfs_delayed_ref_node *entry;
+   int alt = 1;
+   u64 middle;
+   u64 first = 0, last = 0;
+
+   n = rb_first(root);
+   if (n) {
+   entry = rb_entry(n, struct btrfs_delayed_ref_node, rb_node);
+   first = entry->bytenr;
+   }
+   n = rb_last(root);
+   if (n) {
+   entry = rb_entry(n, struct btrfs_delayed_ref_node, rb_node);
+   last = entry->bytenr;
+   }
+   n = root->rb_node;
+
+   while (n) {
+   entry = rb_entry(n, struct btrfs_delayed_ref_node, rb_node);
+   WARN_ON(!entry->in_tree);
+
+   middle = entry->bytenr;
+
+   if (alt)
+   n = n->rb_left;
+   else
+   n = n->rb_right;
+
+   alt = 1 - alt;
+   }
+   return middle;
+}
+#endif
+
 /*
  * this starts processing the delayed reference count updates and
  * extent insertions we have queued up so far.  count can be
@@ -2266,6 +2311,11 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
INIT_LIST_HEAD(&cluster);
 again:
spin_lock(&delayed_refs->lock);
+
+#ifdef SCRAMBLE_DELAYED_REFS
+   delayed_refs->run_delayed_start = find_middle(&delayed_refs->root);
+#endif
+
if (count == 0) {
count = delayed_refs->num_entries * 2;
run_most = 1;
-- 
1.7.3.4



[PATCH v0 04/18] btrfs: qgroup on-disk format

2011-10-06 Thread Arne Jansen
Not all features are in use by the current version
and thus may change in the future.

Signed-off-by: Arne Jansen 
---
 fs/btrfs/ctree.h |  136 ++
 1 files changed, 136 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2765b8d..7a1ca9c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -85,6 +85,9 @@ struct btrfs_ordered_sum;
 /* holds checksums of all the data extents */
 #define BTRFS_CSUM_TREE_OBJECTID 7ULL
 
+/* holds quota configuration and tracking */
+#define BTRFS_QUOTA_TREE_OBJECTID 8ULL
+
 /* orhpan objectid for tracking unlinked/truncated files */
 #define BTRFS_ORPHAN_OBJECTID -5ULL
 
@@ -724,6 +727,72 @@ struct btrfs_block_group_item {
__le64 flags;
 } __attribute__ ((__packed__));
 
+/*
+ * is subvolume quota turned on?
+ */
+#define BTRFS_QGROUP_STATUS_FLAG_ON	(1ULL << 0)
+/*
+ * SCANNING is set during the initialization phase
+ */
+#define BTRFS_QGROUP_STATUS_FLAG_SCANNING  (1ULL << 1)
+/*
+ * Some qgroup entries are known to be out of date,
+ * either because the configuration has changed in a way that
+ * makes a rescan necessary, or because the fs has been mounted
+ * with a non-qgroup-aware version.
+ * Turning quota off and on again makes it inconsistent, too.
+ */
+#define BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT  (1ULL << 2)
+
+#define BTRFS_QGROUP_STATUS_VERSION	1
+
+struct btrfs_qgroup_status_item {
+   __le64 version;
+   /*
+* the generation is updated during every commit. As older
+* versions of btrfs are not aware of qgroups, it will be
+* possible to detect inconsistencies by checking the
+* generation on mount time
+*/
+   __le64 generation;
+
+   /* flag definitions see above */
+   __le64 flags;
+
+   /*
+* only used during scanning to record the progress
+* of the scan. It contains a logical address
+*/
+   __le64 scan;
+} __attribute__ ((__packed__));
+
+struct btrfs_qgroup_info_item {
+   __le64 generation;
+   __le64 rfer;
+   __le64 rfer_cmpr;
+   __le64 excl;
+   __le64 excl_cmpr;
+} __attribute__ ((__packed__));
+
+/* flags definition for qgroup limits */
+#define BTRFS_QGROUP_LIMIT_MAX_RFER	(1ULL << 0)
+#define BTRFS_QGROUP_LIMIT_MAX_EXCL	(1ULL << 1)
+#define BTRFS_QGROUP_LIMIT_RSV_RFER	(1ULL << 2)
+#define BTRFS_QGROUP_LIMIT_RSV_EXCL	(1ULL << 3)
+#define BTRFS_QGROUP_LIMIT_RFER_CMPR   (1ULL << 4)
+#define BTRFS_QGROUP_LIMIT_EXCL_CMPR   (1ULL << 5)
+
+struct btrfs_qgroup_limit_item {
+   /*
+* only updated when any of the other values change
+*/
+   __le64 flags;
+   __le64 max_rfer;
+   __le64 max_excl;
+   __le64 rsv_rfer;
+   __le64 rsv_excl;
+} __attribute__ ((__packed__));
+
 struct btrfs_space_info {
u64 flags;
 
@@ -1336,6 +1405,30 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_CHUNK_ITEM_KEY   228
 
 /*
+ * Records the overall state of the qgroups.
+ * There's only one instance of this key present,
+ * (0, BTRFS_QGROUP_STATUS_KEY, 0)
+ */
+#define BTRFS_QGROUP_STATUS_KEY 240
+/*
+ * Records the currently used space of the qgroup.
+ * One key per qgroup, (0, BTRFS_QGROUP_INFO_KEY, qgroupid).
+ */
+#define BTRFS_QGROUP_INFO_KEY   242
+/*
+ * Contains the user configured limits for the qgroup.
+ * One key per qgroup, (0, BTRFS_QGROUP_LIMIT_KEY, qgroupid).
+ */
+#define BTRFS_QGROUP_LIMIT_KEY  244
+/*
+ * Records the child-parent relationship of qgroups. For
+ * each relation, 2 keys are present:
+ * (childid, BTRFS_QGROUP_RELATION_KEY, parentid)
+ * (parentid, BTRFS_QGROUP_RELATION_KEY, childid)
+ */
+#define BTRFS_QGROUP_RELATION_KEY   246
+
+/*
  * string items are for debugging.  They just store a short string of
  * data in the FS
  */
@@ -2098,6 +2191,49 @@ static inline u32 btrfs_file_extent_inline_item_len(struct extent_buffer *eb,
return btrfs_item_size(eb, e) - offset;
 }
 
+/* btrfs_qgroup_status_item */
+BTRFS_SETGET_FUNCS(qgroup_status_generation, struct btrfs_qgroup_status_item,
+  generation, 64);
+BTRFS_SETGET_FUNCS(qgroup_status_version, struct btrfs_qgroup_status_item,
+  version, 64);
+BTRFS_SETGET_FUNCS(qgroup_status_flags, struct btrfs_qgroup_status_item,
+  flags, 64);
+BTRFS_SETGET_FUNCS(qgroup_status_scan, struct btrfs_qgroup_status_item,
+  scan, 64);
+
+/* btrfs_qgroup_info_item */
+BTRFS_SETGET_FUNCS(qgroup_info_generation, struct btrfs_qgroup_info_item,
+  generation, 64);
+BTRFS_SETGET_FUNCS(qgroup_info_rfer, struct btrfs_qgroup_info_item, rfer, 64);
+BTRFS_SETGET_FUNCS(qgroup_info_rfer_cmpr, struct btrfs_qgroup_info_item,
+  rfer_cmpr, 64);
+BTRFS_SETGET_FUNCS(qgroup_info_excl, struct btrfs_qgroup_info_item, excl, 64);
+BTRFS_SETGET_FUNCS(qgroup_info_excl_cmpr, struct btrfs_qgroup_info_item,
+ 

Re: [PATCH] Btrfs: fix recursive auto-defrag

2011-10-06 Thread Chris Mason
On Thu, Oct 06, 2011 at 11:39:54AM +0800, Li Zefan wrote:
> Follow those steps:
> 
>   # mount -o autodefrag /dev/sda7 /mnt
>   # dd if=/dev/urandom of=/mnt/tmp bs=200K count=1
>   # sync
>   # dd if=/dev/urandom of=/mnt/tmp bs=8K count=1 conv=notrunc
> 
> and then it'll go into a loop: writeback -> defrag -> writeback ...
> 
> It's because writeback writes [8K, 200K] and then writes [0, 8K].
> 
> I tried to make writeback know if the pages are dirtied by defrag,
> but the patch was a bit intrusive. Here I simply set writeback_index
> when we defrag a file.

Really nice and small fix.  I'll definitely send this for 3.1

-chris


Re: Honest timeline for btrfsck

2011-10-06 Thread Jeff Putney
> No, in this case it means we're confident it will get rolled out.

On Aug 18th confidence was high enough to declare a possible release
that very day.  This confidence turned into 7 weeks of silence
followed by another 2 week estimate.

These confident declarations are why things like mniederle's
btrfs_rescue are considered 'interim' and not worth building on.  Had
this confidence of imminent release not been the prevalent message for
the last year, others would have stepped in to fill the void.

> I've given a number of hard dates recently and I'd prefer to show up
> with the code instead.  I don't think it makes sense to put a partial
> implementation out there, we'll just have a bunch of people reporting
> problems that I know exist.
>
> -chris
>

This strategy of 'Lone Wolfing it' has delayed the release by a year.
Either you are flying solo because you think that you can make more
meaningful progress without the involvement of the btrfs community, or
you are willing to forfeit the contributions of the community in order
to not have to listen to any complaints.

The other problem of this flying solo plan, is that you are making the
assumption that the problems you know about are more significant than
the problems you are unaware of and could be flushed out with more
eyes on the code.  The longer you delay the release of the source, the
longer it will be until confidence can be generated that major issues
have been resolved.

http://en.wikipedia.org/wiki/Release_early,_release_often


Re: [PATCH] Btrfs: fix recursive auto-defrag

2011-10-06 Thread David Sterba
On Thu, Oct 06, 2011 at 11:39:54AM +0800, Li Zefan wrote:
> Follow those steps:
> 
>   # mount -o autodefrag /dev/sda7 /mnt
>   # dd if=/dev/urandom of=/mnt/tmp bs=200K count=1
>   # sync
>   # dd if=/dev/urandom of=/mnt/tmp bs=8K count=1 conv=notrunc
> 
> and then it'll go into a loop: writeback -> defrag -> writeback ...
> 
> It's because writeback writes [8K, 200K] and then writes [0, 8K].
> 
> I tried to make writeback know if the pages are dirtied by defrag,
> but the patch was a bit intrusive. Here I simply set writeback_index
> when we defrag a file.
> 
> Signed-off-by: Li Zefan 
Tested-by: David Sterba 

> ---
>  fs/btrfs/ioctl.c |7 +++
>  1 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 970977a..7a10f94 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -1047,6 +1047,13 @@ int btrfs_defrag_file(struct inode *inode, struct file *file,
>   if (!max_to_defrag)
>   max_to_defrag = last_index - 1;
>  
> + /*
> +  * make writeback starts from i, so the defrag range can be
> +  * written sequentially.
> +  */
> + if (i < inode->i_mapping->writeback_index)
> + inode->i_mapping->writeback_index = i;
> +
>   while (i <= last_index && defrag_count < max_to_defrag) {
>   /*
>* make sure we stop running if someone unmounts
> -- 1.7.3.1 