Re: btrfs fi df won't show total=

2012-07-12 Thread Andrei Popa
On Tue, 2012-07-10 at 00:52 +0300, Ilya Dryomov wrote: 
> mkfs creates those non-raid chunks for a pretty stupid reason, they
> really shouldn't be there if you create a raid10 fs.  Balance does the
> right thing and removes them.  Fixing this along with another mkfs
> annoyance related to this one is on my TODO list.

Hi Ilya,

I've also looked into this bug and found where the problem is, in case
this helps you:

In mkfs.c, the make_root_dir function creates the first system and
metadata chunks:
ret = btrfs_make_block_group(trans, root, bytes_used,
                             BTRFS_BLOCK_GROUP_SYSTEM,
                             BTRFS_FIRST_CHUNK_TREE_OBJECTID,
                             0, BTRFS_MKFS_SYSTEM_GROUP_SIZE);

ret = btrfs_alloc_chunk(trans, root->fs_info->extent_root,
                        &chunk_start, &chunk_size,
                        BTRFS_BLOCK_GROUP_METADATA |
                        BTRFS_BLOCK_GROUP_DATA);
BUG_ON(ret);
ret = btrfs_make_block_group(trans, root, 0,
                             BTRFS_BLOCK_GROUP_METADATA |
                             BTRFS_BLOCK_GROUP_DATA,
                             BTRFS_FIRST_CHUNK_TREE_OBJECTID,
                             chunk_start, chunk_size);

These chunks are then created again in the create_raid_groups function.
Both functions, make_root_dir and create_raid_groups, are called from
the main function.

Andrei 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs benchmark

2012-07-12 Thread Bernd Kohler
Hi all,

The latest issue of the German Linux-Magazin has an article on Linux
filesystem performance testing, titled "Formel Storage - Linux-Dateisystem
im Leistungstest".

The author of the article, Michael Kromer, provides his benchmark
script at link [1]. Since btrfs is covered there, I thought I would
share it here ;)

best

Bernd Kohler




[1]
http://medozas.de/2012-lm-fs-benchmark.tar.gz

-- 
UMIC - RWTH Aachen
http://www.umic.rwth-aachen.de

Mies-van-der-Rohe Str. 15
52074 Aachen

Tel.:   +49 241 80 20791
Fax:+49 241 80 22731
E-Mail: koh...@umic.rwth-aachen.de

The future started 6/6/12
~~
0100 1001 0101  0111 0110 0011 0110






[PATCH v1 07/15] Btrfs: qgroup state and initialization

2012-07-12 Thread Jan Schmidt
From: Arne Jansen <sensi...@gmx.net>

Add state to fs_info.

Signed-off-by: Arne Jansen <sensi...@gmx.net>
---
 fs/btrfs/ctree.h   |   24 
 fs/btrfs/disk-io.c |7 +++
 2 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 27cf995..a5269d4 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1120,6 +1120,7 @@ struct btrfs_fs_info {
struct btrfs_root *dev_root;
struct btrfs_root *fs_root;
struct btrfs_root *csum_root;
+   struct btrfs_root *quota_root;
 
/* the log root tree is a directory of all the other log roots */
struct btrfs_root *log_root_tree;
@@ -1374,6 +1375,29 @@ struct btrfs_fs_info {
 #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
u32 check_integrity_print_mask;
 #endif
+   /*
+* quota information
+*/
+   unsigned int quota_enabled:1;
+
+   /*
+* quota_enabled only changes state after a commit. This holds the
+* next state.
+*/
+   unsigned int pending_quota_state:1;
+
+   /* is qgroup tracking in a consistent state? */
+   u64 qgroup_flags;
+
+   /* holds configuration and tracking. Protected by qgroup_lock */
+   struct rb_root qgroup_tree;
+   spinlock_t qgroup_lock;
+
+   /* list of dirty qgroups to be written at next commit */
+   struct list_head dirty_qgroups;
+
+   /* used by btrfs_qgroup_record_ref for an efficient tree traversal */
+   u64 qgroup_seq;
 
/* filesystem state */
u64 fs_state;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6fc243e..eca0549 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2110,6 +2110,13 @@ int open_ctree(struct super_block *sb,
init_rwsem(&fs_info->cleanup_work_sem);
init_rwsem(&fs_info->subvol_sem);
 
+   spin_lock_init(&fs_info->qgroup_lock);
+   fs_info->qgroup_tree = RB_ROOT;
+   INIT_LIST_HEAD(&fs_info->dirty_qgroups);
+   fs_info->qgroup_seq = 1;
+   fs_info->quota_enabled = 0;
+   fs_info->pending_quota_state = 0;
+
btrfs_init_free_cluster(&fs_info->meta_alloc_cluster);
btrfs_init_free_cluster(&fs_info->data_alloc_cluster);
 
-- 
1.7.3.4
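
The comment on pending_quota_state in this patch says that quota_enabled
only changes after a commit, with the pending field holding the next state.
A minimal userspace sketch of that two-phase toggle might look like the
following. The struct and function names are hypothetical and not part of
the patch; the real transition happens inside the btrfs commit path, which
is not shown in this mail.

```c
#include <assert.h>

/* Hypothetical stand-in for the two bitfields added to btrfs_fs_info. */
struct fs_info_sketch {
	unsigned int quota_enabled:1;       /* state seen by running code */
	unsigned int pending_quota_state:1; /* state to apply at next commit */
};

/* Record the requested state; the live flag is deliberately untouched. */
void request_quota_state(struct fs_info_sketch *fs, int enable)
{
	fs->pending_quota_state = enable ? 1 : 0;
}

/* Assumed commit step: the pending state becomes the live state. */
void commit_transaction_sketch(struct fs_info_sketch *fs)
{
	fs->quota_enabled = fs->pending_quota_state;
}
```

With this split, code that checks quota_enabled in the middle of a
transaction never sees the flag flip underneath it.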



[PATCH v1 01/15] Btrfs: fix buffer leak in btrfs_next_old_leaf

2012-07-12 Thread Jan Schmidt
When calling btrfs_next_old_leaf, we were leaking an extent buffer in the
rare case of using the deadlock avoidance code needed for the tree mod log.

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/ctree.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 8206b39..67fe46f 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -5127,6 +5127,7 @@ again:
 * locked. To solve this situation, we give up
 * on our lock and cycle.
 */
+   free_extent_buffer(next);
btrfs_release_path(path);
cond_resched();
goto again;
-- 
1.7.3.4
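
The leak fixed above is an instance of a general pattern: a retry loop that
takes a reference on each pass must drop it before jumping back. The sketch
below models that with a plain counter. All names are hypothetical, and the
"fail" counter merely simulates the deadlock-avoidance path being taken a
few times; it is not how btrfs_next_old_leaf decides to retry.

```c
#include <assert.h>

/* Hypothetical refcounted buffer standing in for an extent buffer. */
struct buf_sketch {
	int refs;
};

struct buf_sketch *get_buf_sketch(struct buf_sketch *b)
{
	b->refs++;
	return b;
}

void free_buf_sketch(struct buf_sketch *b)
{
	if (b)
		b->refs--;
}

/*
 * Take a reference, retrying 'fail' times first. Returns with exactly
 * one reference held, no matter how many retries happened.
 */
struct buf_sketch *lookup_with_retry_sketch(struct buf_sketch *b, int fail)
{
	struct buf_sketch *next;

again:
	next = get_buf_sketch(b);
	if (fail > 0) {
		fail--;
		free_buf_sketch(next);	/* without this, each retry leaks a ref */
		goto again;
	}
	return next;
}
```

Removing the free_buf_sketch() call would leave the count at 1 plus the
number of retries, which is the kind of leak the one-line patch closes.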



[PATCH v1 13/15] Btrfs: hooks to reserve qgroup space

2012-07-12 Thread Jan Schmidt
From: Arne Jansen <sensi...@gmx.net>

Like block reserves, reserve a small piece of space on each
transaction start and for delalloc. These are the hooks that
can actually return EDQUOT to the user.
The amount of space reserved is tracked in the transaction
handle.

Signed-off-by: Arne Jansen <sensi...@gmx.net>
---
 fs/btrfs/extent-tree.c |   12 
 fs/btrfs/transaction.c |   16 
 fs/btrfs/transaction.h |1 +
 3 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index c08337a..2ce16f9 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4565,6 +4565,13 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
csum_bytes = BTRFS_I(inode)->csum_bytes;
spin_unlock(&BTRFS_I(inode)->lock);
 
+   if (root->fs_info->quota_enabled) {
+   ret = btrfs_qgroup_reserve(root, num_bytes +
+  nr_extents * root->leafsize);
+   if (ret)
+   return ret;
+   }
+
ret = reserve_metadata_bytes(root, block_rsv, to_reserve, flush);
if (ret) {
u64 to_free = 0;
@@ -4643,6 +4650,11 @@ void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes)
 
trace_btrfs_space_reservation(root->fs_info, "delalloc",
  btrfs_ino(inode), to_free, 0);
+   if (root->fs_info->quota_enabled) {
+   btrfs_qgroup_free(root, num_bytes +
+   dropped * root->leafsize);
+   }
+
btrfs_block_rsv_release(root, &root->fs_info->delalloc_block_rsv,
to_free);
 }
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 21c768c..f1e29fb 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -295,6 +295,7 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root,
struct btrfs_transaction *cur_trans;
u64 num_bytes = 0;
int ret;
+   u64 qgroup_reserved = 0;
 
if (root->fs_info->fs_state & BTRFS_SUPER_FLAG_ERROR)
return ERR_PTR(-EROFS);
@@ -313,6 +314,14 @@ static struct btrfs_trans_handle *start_transaction(struct btrfs_root *root,
 * the appropriate flushing if need be.
 */
if (num_items > 0 && root != root->fs_info->chunk_root) {
+   if (root->fs_info->quota_enabled &&
+   is_fstree(root->root_key.objectid)) {
+   qgroup_reserved = num_items * root->leafsize;
+   ret = btrfs_qgroup_reserve(root, qgroup_reserved);
+   if (ret)
+   return ERR_PTR(ret);
+   }
+
num_bytes = btrfs_calc_trans_metadata_size(root, num_items);
ret = btrfs_block_rsv_add(root,
  &root->fs_info->trans_block_rsv,
@@ -351,6 +360,7 @@ again:
h->block_rsv = NULL;
h->orig_rsv = NULL;
h->aborted = 0;
+   h->qgroup_reserved = qgroup_reserved;
h->delayed_ref_elem.seq = 0;
INIT_LIST_HEAD(&h->qgroup_ref_list);
 
@@ -524,6 +534,12 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 * end_transaction. Subvolume quota depends on this.
 */
WARN_ON(trans->root != root);
+
+   if (trans->qgroup_reserved) {
+   btrfs_qgroup_free(root, trans->qgroup_reserved);
+   trans->qgroup_reserved = 0;
+   }
+
while (count < 2) {
unsigned long cur = trans->delayed_ref_updates;
trans->delayed_ref_updates = 0;
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 16ba008..2759e05 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -50,6 +50,7 @@ struct btrfs_transaction {
 struct btrfs_trans_handle {
u64 transid;
u64 bytes_reserved;
+   u64 qgroup_reserved;
unsigned long use_count;
unsigned long blocks_reserved;
unsigned long blocks_used;
-- 
1.7.3.4
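
To make the reserve/free pairing in this patch easier to follow, here is a
small userspace model. It is only a sketch under stated assumptions: the
struct layouts, the function names and the limit check against max_rfer are
all hypothetical, since btrfs_qgroup_reserve itself lives in a different
patch of the series. What it does mirror is the shape of the hooks: reserve
num_items * leafsize at transaction start, remember the amount in the
handle, and give it back at transaction end.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical qgroup with one limit and one reservation counter. */
struct qgroup_sketch {
	uint64_t max_rfer;	/* limit in bytes, 0 means unlimited */
	uint64_t rfer;		/* bytes already accounted */
	uint64_t reserved;	/* bytes reserved by running transactions */
};

/* Hypothetical transaction handle; only the qgroup part is modelled. */
struct trans_handle_sketch {
	struct qgroup_sketch *qg;
	uint64_t qgroup_reserved;
};

int qgroup_reserve_sketch(struct qgroup_sketch *qg, uint64_t bytes)
{
	/* assumed policy: refuse if the reservation would exceed the limit */
	if (qg->max_rfer && qg->rfer + qg->reserved + bytes > qg->max_rfer)
		return -EDQUOT;
	qg->reserved += bytes;
	return 0;
}

void qgroup_free_sketch(struct qgroup_sketch *qg, uint64_t bytes)
{
	qg->reserved -= bytes;
}

int start_transaction_sketch(struct trans_handle_sketch *h,
			     struct qgroup_sketch *qg,
			     uint64_t num_items, uint64_t leafsize)
{
	uint64_t bytes = num_items * leafsize;
	int ret = qgroup_reserve_sketch(qg, bytes);

	if (ret)
		return ret;	/* this is where the caller would see EDQUOT */
	h->qg = qg;
	h->qgroup_reserved = bytes;	/* tracked in the handle */
	return 0;
}

void end_transaction_sketch(struct trans_handle_sketch *h)
{
	if (h->qgroup_reserved) {
		qgroup_free_sketch(h->qg, h->qgroup_reserved);
		h->qgroup_reserved = 0;
	}
}
```

Because the handle records exactly what it reserved, ending the transaction
can release the right amount without consulting the qgroup again.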



[PATCH v1 03/15] Btrfs: qgroup on-disk format

2012-07-12 Thread Jan Schmidt
From: Arne Jansen <sensi...@gmx.net>

Not all features are in use by the current version
and thus may change in the future.
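
The items defined in the diff below are packed structures made up purely of
64-bit little-endian fields, so their on-disk sizes follow directly from the
field count. The sketch here mirrors two of them in userspace to show that.
The type and helper names are hypothetical, byte-order conversion is not
modelled, and the "usable" rule in the helper is an assumption drawn from
the flag comments, not something this patch implements.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's __le64; endianness handling is omitted. */
typedef uint64_t le64_sketch;

#define QGROUP_STATUS_FLAG_ON            (1ULL << 0)
#define QGROUP_STATUS_FLAG_SCANNING      (1ULL << 1)
#define QGROUP_STATUS_FLAG_INCONSISTENT  (1ULL << 2)

struct qgroup_status_item_sketch {
	le64_sketch version;
	le64_sketch generation;
	le64_sketch flags;
	le64_sketch scan;
} __attribute__ ((__packed__));

struct qgroup_info_item_sketch {
	le64_sketch generation;
	le64_sketch rfer;
	le64_sketch rfer_cmpr;
	le64_sketch excl;
	le64_sketch excl_cmpr;
} __attribute__ ((__packed__));

/* Only 64-bit members and no padding: sizes are fixed by field count. */
_Static_assert(sizeof(struct qgroup_status_item_sketch) == 4 * 8,
	       "status item should be 32 bytes");
_Static_assert(sizeof(struct qgroup_info_item_sketch) == 5 * 8,
	       "info item should be 40 bytes");

/*
 * Hypothetical helper: treat the counters as usable only when quota is
 * on and neither a scan nor a known inconsistency is flagged.
 */
int qgroup_counters_usable(uint64_t flags)
{
	return (flags & QGROUP_STATUS_FLAG_ON) &&
	       !(flags & (QGROUP_STATUS_FLAG_SCANNING |
			  QGROUP_STATUS_FLAG_INCONSISTENT));
}
```

The packed attribute and _Static_assert assume a GCC or Clang style C11
compiler.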

Signed-off-by: Arne Jansen <sensi...@gmx.net>
---
 fs/btrfs/ctree.h |  136 ++
 1 files changed, 136 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 8f8dc46..33088b0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -91,6 +91,9 @@ struct btrfs_ordered_sum;
 /* for storing balance parameters in the root tree */
 #define BTRFS_BALANCE_OBJECTID -4ULL
 
+/* holds quota configuration and tracking */
+#define BTRFS_QUOTA_TREE_OBJECTID 8ULL
+
 /* orhpan objectid for tracking unlinked/truncated files */
 #define BTRFS_ORPHAN_OBJECTID -5ULL
 
@@ -883,6 +886,72 @@ struct btrfs_block_group_item {
__le64 flags;
 } __attribute__ ((__packed__));
 
+/*
+ * is subvolume quota turned on?
+ */
+#define BTRFS_QGROUP_STATUS_FLAG_ON            (1ULL << 0)
+/*
+ * SCANNING is set during the initialization phase
+ */
+#define BTRFS_QGROUP_STATUS_FLAG_SCANNING      (1ULL << 1)
+/*
+ * Some qgroup entries are known to be out of date,
+ * either because the configuration has changed in a way that
+ * makes a rescan necessary, or because the fs has been mounted
+ * with a non-qgroup-aware version.
+ * Turning quota off and on again makes it inconsistent, too.
+ */
+#define BTRFS_QGROUP_STATUS_FLAG_INCONSISTENT  (1ULL << 2)
+
+#define BTRFS_QGROUP_STATUS_VERSION        1
+
+struct btrfs_qgroup_status_item {
+   __le64 version;
+   /*
+* the generation is updated during every commit. As older
+* versions of btrfs are not aware of qgroups, it will be
+* possible to detect inconsistencies by checking the
+* generation on mount time
+*/
+   __le64 generation;
+
+   /* flag definitions see above */
+   __le64 flags;
+
+   /*
+* only used during scanning to record the progress
+* of the scan. It contains a logical address
+*/
+   __le64 scan;
+} __attribute__ ((__packed__));
+
+struct btrfs_qgroup_info_item {
+   __le64 generation;
+   __le64 rfer;
+   __le64 rfer_cmpr;
+   __le64 excl;
+   __le64 excl_cmpr;
+} __attribute__ ((__packed__));
+
+/* flags definition for qgroup limits */
+#define BTRFS_QGROUP_LIMIT_MAX_RFER    (1ULL << 0)
+#define BTRFS_QGROUP_LIMIT_MAX_EXCL    (1ULL << 1)
+#define BTRFS_QGROUP_LIMIT_RSV_RFER    (1ULL << 2)
+#define BTRFS_QGROUP_LIMIT_RSV_EXCL    (1ULL << 3)
+#define BTRFS_QGROUP_LIMIT_RFER_CMPR   (1ULL << 4)
+#define BTRFS_QGROUP_LIMIT_EXCL_CMPR   (1ULL << 5)
+
+struct btrfs_qgroup_limit_item {
+   /*
+* only updated when any of the other values change
+*/
+   __le64 flags;
+   __le64 max_rfer;
+   __le64 max_excl;
+   __le64 rsv_rfer;
+   __le64 rsv_excl;
+} __attribute__ ((__packed__));
+
 struct btrfs_space_info {
u64 flags;
 
@@ -1534,6 +1603,30 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_DEV_ITEM_KEY 216
 #define BTRFS_CHUNK_ITEM_KEY   228
 
+/*
+ * Records the overall state of the qgroups.
+ * There's only one instance of this key present,
+ * (0, BTRFS_QGROUP_STATUS_KEY, 0)
+ */
+#define BTRFS_QGROUP_STATUS_KEY 240
+/*
+ * Records the currently used space of the qgroup.
+ * One key per qgroup, (0, BTRFS_QGROUP_INFO_KEY, qgroupid).
+ */
+#define BTRFS_QGROUP_INFO_KEY   242
+/*
+ * Contains the user configured limits for the qgroup.
+ * One key per qgroup, (0, BTRFS_QGROUP_LIMIT_KEY, qgroupid).
+ */
+#define BTRFS_QGROUP_LIMIT_KEY  244
+/*
+ * Records the child-parent relationship of qgroups. For
+ * each relation, 2 keys are present:
+ * (childid, BTRFS_QGROUP_RELATION_KEY, parentid)
+ * (parentid, BTRFS_QGROUP_RELATION_KEY, childid)
+ */
+#define BTRFS_QGROUP_RELATION_KEY   246
+
 #define BTRFS_BALANCE_ITEM_KEY 248
 
 /*
@@ -2474,6 +2567,49 @@ static inline void btrfs_set_dev_stats_value(struct extent_buffer *eb,
sizeof(val));
 }
 
+/* btrfs_qgroup_status_item */
+BTRFS_SETGET_FUNCS(qgroup_status_generation, struct btrfs_qgroup_status_item,
+  generation, 64);
+BTRFS_SETGET_FUNCS(qgroup_status_version, struct btrfs_qgroup_status_item,
+  version, 64);
+BTRFS_SETGET_FUNCS(qgroup_status_flags, struct btrfs_qgroup_status_item,
+  flags, 64);
+BTRFS_SETGET_FUNCS(qgroup_status_scan, struct btrfs_qgroup_status_item,
+  scan, 64);
+
+/* btrfs_qgroup_info_item */
+BTRFS_SETGET_FUNCS(qgroup_info_generation, struct btrfs_qgroup_info_item,
+  generation, 64);
+BTRFS_SETGET_FUNCS(qgroup_info_rfer, struct btrfs_qgroup_info_item, rfer, 64);
+BTRFS_SETGET_FUNCS(qgroup_info_rfer_cmpr, struct btrfs_qgroup_info_item,
+  rfer_cmpr, 64);
+BTRFS_SETGET_FUNCS(qgroup_info_excl, struct btrfs_qgroup_info_item, excl, 64);
+BTRFS_SETGET_FUNCS(qgroup_info_excl_cmpr, struct btrfs_qgroup_info_item,
+  

[PATCH v1 04/15] Btrfs: add helper for tree enumeration

2012-07-12 Thread Jan Schmidt
From: Arne Jansen <sensi...@gmx.net>

Often no exact match is wanted but just the next lower or
higher item. There's a lot of duplicated code throughout
btrfs to deal with the corner cases. This patch adds a
helper function that can facilitate searching.

Signed-off-by: Arne Jansen <sensi...@gmx.net>
---
 fs/btrfs/ctree.c |   72 ++
 fs/btrfs/ctree.h |3 ++
 2 files changed, 75 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index bef68ab..fb21431 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2789,6 +2789,78 @@ done:
 }
 
 /*
+ * helper to use instead of search slot if no exact match is needed but
+ * instead the next or previous item should be returned.
+ * When find_higher is true, the next higher item is returned, the next lower
+ * otherwise.
+ * When return_any and find_higher are both true, and no higher item is found,
+ * return the next lower instead.
+ * When return_any is true and find_higher is false, and no lower item is found,
+ * return the next higher instead.
+ * It returns 0 if any item is found, 1 if none is found (tree empty), and
+ * < 0 on error
+ */
+int btrfs_search_slot_for_read(struct btrfs_root *root,
+  struct btrfs_key *key, struct btrfs_path *p,
+  int find_higher, int return_any)
+{
+   int ret;
+   struct extent_buffer *leaf;
+
+again:
+   ret = btrfs_search_slot(NULL, root, key, p, 0, 0);
+   if (ret <= 0)
+   return ret;
+   /*
+* a return value of 1 means the path is at the position where the
+* item should be inserted. Normally this is the next bigger item,
+* but in case the previous item is the last in a leaf, path points
+* to the first free slot in the previous leaf, i.e. at an invalid
+* item.
+*/
+   leaf = p->nodes[0];
+
+   if (find_higher) {
+   if (p->slots[0] >= btrfs_header_nritems(leaf)) {
+   ret = btrfs_next_leaf(root, p);
+   if (ret <= 0)
+   return ret;
+   if (!return_any)
+   return 1;
+   /*
+* no higher item found, return the next
+* lower instead
+*/
+   return_any = 0;
+   find_higher = 0;
+   btrfs_release_path(p);
+   goto again;
+   }
+   } else {
+   if (p->slots[0] >= btrfs_header_nritems(leaf)) {
+   /* we're sitting on an invalid slot */
+   if (p->slots[0] == 0) {
+   ret = btrfs_prev_leaf(root, p);
+   if (ret <= 0)
+   return ret;
+   if (!return_any)
+   return 1;
+   /*
+* no lower item found, return the next
+* higher instead
+*/
+   return_any = 0;
+   find_higher = 1;
+   btrfs_release_path(p);
+   goto again;
+   }
+   --p->slots[0];
+   }
+   }
+   }
+   }
+   return 0;
+}
+
+/*
  * adjust the pointers going up the tree, starting at level
  * making sure the right key of each node is points to 'key'.
  * This is used after shifting pointers to the left, so it stops
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 33088b0..27cf995 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2856,6 +2856,9 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root
  ins_len, int cow);
 int btrfs_search_old_slot(struct btrfs_root *root, struct btrfs_key *key,
  struct btrfs_path *p, u64 time_seq);
+int btrfs_search_slot_for_read(struct btrfs_root *root,
+  struct btrfs_key *key, struct btrfs_path *p,
+  int find_higher, int return_any);
 int btrfs_realloc_node(struct btrfs_trans_handle *trans,
   struct btrfs_root *root, struct extent_buffer *parent,
   int start_slot, int cache_only, u64 *last_ret,
-- 
1.7.3.4
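
The find_higher / return_any semantics described in the helper's comment can
be tried out on a sorted array instead of a b-tree. The function below is a
hypothetical userspace analogue, not the btrfs code: it keeps the same return
convention (0 if an item is found, 1 if none qualifies) but has no error
path, and it treats an exact match as found regardless of direction.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical analogue of btrfs_search_slot_for_read() over a sorted
 * array. On success stores the chosen index in *slot and returns 0;
 * returns 1 if no suitable item exists.
 */
int search_for_read_sketch(const unsigned long long *keys, size_t n,
			   unsigned long long key, int find_higher,
			   int return_any, size_t *slot)
{
	size_t pos = 0;

	/* insertion position: first index whose key is >= the wanted key */
	while (pos < n && keys[pos] < key)
		pos++;

	if (pos < n && keys[pos] == key) {	/* exact match */
		*slot = pos;
		return 0;
	}

	if (find_higher) {
		if (pos < n) {			/* next higher item */
			*slot = pos;
			return 0;
		}
		if (return_any && n > 0) {	/* fall back to next lower */
			*slot = n - 1;
			return 0;
		}
		return 1;
	}

	if (pos > 0) {				/* next lower item */
		*slot = pos - 1;
		return 0;
	}
	if (return_any && n > 0) {		/* fall back to next higher */
		*slot = 0;
		return 0;
	}
	return 1;
}
```

For example, with keys {10, 20, 30}, searching for 25 yields index 2 when
find_higher is set and index 1 otherwise; searching for 35 with find_higher
set finds nothing unless return_any is also set, in which case index 2 is
returned.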



[PATCH v1 05/15] Btrfs: check the root passed to btrfs_end_transaction

2012-07-12 Thread Jan Schmidt
From: Arne Jansen <sensi...@gmx.net>

This patch only adds a consistency check to validate that the
same root is passed to start_transaction and end_transaction.
Subvolume quota depends on this.

Signed-off-by: Arne Jansen <sensi...@gmx.net>
---
 fs/btrfs/transaction.c |6 ++
 fs/btrfs/transaction.h |6 ++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 621c8dc..23cbda0 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -345,6 +345,7 @@ again:
h->transaction = cur_trans;
h->blocks_used = 0;
h->bytes_reserved = 0;
+   h->root = root;
h->delayed_ref_updates = 0;
h->use_count = 1;
h->block_rsv = NULL;
@@ -511,6 +512,11 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 
btrfs_trans_release_metadata(trans, root);
trans->block_rsv = NULL;
+   /*
+* the same root has to be passed to start_transaction and
+* end_transaction. Subvolume quota depends on this.
+*/
+   WARN_ON(trans->root != root);
while (count < 2) {
unsigned long cur = trans->delayed_ref_updates;
trans->delayed_ref_updates = 0;
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index fe27379..0107294 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -57,6 +57,12 @@ struct btrfs_trans_handle {
struct btrfs_block_rsv *block_rsv;
struct btrfs_block_rsv *orig_rsv;
int aborted;
+   /*
+* this root is only needed to validate that the root passed to
+* start_transaction is the same as the one passed to end_transaction.
+* Subvolume quota depends on this
+*/
+   struct btrfs_root *root;
 };
 
 struct btrfs_pending_snapshot {
-- 
1.7.3.4



[PATCH v1 00/15] Btrfs: subvolume quota groups (qgroups)

2012-07-12 Thread Jan Schmidt
This is a new version of Arne's qgroup patches from last October. The
old patches didn't get the backref walking right, which is now based on
the tree modification log.

You can limit the space available to subvolumes or any group of
subvolumes. You can determine the amount of space that will get freed
when deleting a snapshot.

The initial scan is still missing, so expect negative counters when you
enable quotas on a non-empty volume and then delete stuff.

Arne's introduction and concept description can still be found at

http://sensille.com/qgroups.pdf

You can pull these patches from my git repository

git://git.jan-o-sch.net/btrfs-unstable qgroup

The required user mode patches were sent by Arne on October 11, 2011,
subject "[PATCH v0] btrfs-progs: add qgroup commands".

I had hoped to include some fair benchmark results with this cover letter.
However, none of the several disk benchmarks I tried from the Phoronix Test
Suite showed any decrease in write throughput. I will have to create a more
realistic setup of my own to benchmark the impact of qgroups (suggestions
welcome). For now, I just wanted to get this patch set out :-)

Thanks,
-Jan

Arne Jansen (11):
  Btrfs: qgroup on-disk format
  Btrfs: add helper for tree enumeration
  Btrfs: check the root passed to btrfs_end_transaction
  Btrfs: added helper to create new trees
  Btrfs: qgroup state and initialization
  Btrfs: Test code to change the order of delayed-ref processing
  Btrfs: qgroup implementation and prototypes
  Btrfs: quota tree support and startup
  Btrfs: hooks to reserve qgroup space
  Btrfs: add qgroup ioctls
  Btrfs: add qgroup inheritance

Jan Schmidt (4):
  Btrfs: fix buffer leak in btrfs_next_old_leaf
  Btrfs: join tree mod log code with the code holding back delayed refs
  Btrfs: call the qgroup accounting functions
  Btrfs: hooks for qgroup to record delayed refs

 fs/btrfs/Makefile  |2 +-
 fs/btrfs/backref.c |   30 +-
 fs/btrfs/backref.h |3 +-
 fs/btrfs/ctree.c   |  348 
 fs/btrfs/ctree.h   |  233 +++-
 fs/btrfs/delayed-ref.c |   56 +-
 fs/btrfs/delayed-ref.h |   62 +--
 fs/btrfs/disk-io.c |  134 -
 fs/btrfs/disk-io.h |6 +
 fs/btrfs/extent-tree.c |  119 -
 fs/btrfs/ioctl.c   |  244 +++-
 fs/btrfs/ioctl.h   |   62 ++-
 fs/btrfs/qgroup.c  | 1571 
 fs/btrfs/transaction.c |   57 ++-
 fs/btrfs/transaction.h |   11 +
 15 files changed, 2696 insertions(+), 242 deletions(-)
 create mode 100644 fs/btrfs/qgroup.c

-- 
1.7.3.4



[PATCH v1 06/15] Btrfs: added helper to create new trees

2012-07-12 Thread Jan Schmidt
From: Arne Jansen <sensi...@gmx.net>

This creates a brand new tree. Will be used to create
the quota tree.

Signed-off-by: Arne Jansen <sensi...@gmx.net>
---
 fs/btrfs/disk-io.c |   78 +++-
 fs/btrfs/disk-io.h |6 
 2 files changed, 83 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 19a39e1..6fc243e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1225,6 +1225,82 @@ static struct btrfs_root *btrfs_alloc_root(struct btrfs_fs_info *fs_info)
return root;
 }
 
+struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
+struct btrfs_fs_info *fs_info,
+u64 objectid)
+{
+   struct extent_buffer *leaf;
+   struct btrfs_root *tree_root = fs_info->tree_root;
+   struct btrfs_root *root;
+   struct btrfs_key key;
+   int ret = 0;
+   u64 bytenr;
+
+   root = btrfs_alloc_root(fs_info);
+   if (!root)
+   return ERR_PTR(-ENOMEM);
+
+   __setup_root(tree_root->nodesize, tree_root->leafsize,
+tree_root->sectorsize, tree_root->stripesize,
+root, fs_info, objectid);
+   root->root_key.objectid = objectid;
+   root->root_key.type = BTRFS_ROOT_ITEM_KEY;
+   root->root_key.offset = 0;
+
+   leaf = btrfs_alloc_free_block(trans, root, root->leafsize,
+ 0, objectid, NULL, 0, 0, 0);
+   if (IS_ERR(leaf)) {
+   ret = PTR_ERR(leaf);
+   goto fail;
+   }
+
+   bytenr = leaf->start;
+   memset_extent_buffer(leaf, 0, 0, sizeof(struct btrfs_header));
+   btrfs_set_header_bytenr(leaf, leaf->start);
+   btrfs_set_header_generation(leaf, trans->transid);
+   btrfs_set_header_backref_rev(leaf, BTRFS_MIXED_BACKREF_REV);
+   btrfs_set_header_owner(leaf, objectid);
+   root->node = leaf;
+
+   write_extent_buffer(leaf, fs_info->fsid,
+   (unsigned long)btrfs_header_fsid(leaf),
+   BTRFS_FSID_SIZE);
+   write_extent_buffer(leaf, fs_info->chunk_tree_uuid,
+   (unsigned long)btrfs_header_chunk_tree_uuid(leaf),
+   BTRFS_UUID_SIZE);
+   btrfs_mark_buffer_dirty(leaf);
+
+   root->commit_root = btrfs_root_node(root);
+   root->track_dirty = 1;
+
+
+   root->root_item.flags = 0;
+   root->root_item.byte_limit = 0;
+   btrfs_set_root_bytenr(&root->root_item, leaf->start);
+   btrfs_set_root_generation(&root->root_item, trans->transid);
+   btrfs_set_root_level(&root->root_item, 0);
+   btrfs_set_root_refs(&root->root_item, 1);
+   btrfs_set_root_used(&root->root_item, leaf->len);
+   btrfs_set_root_last_snapshot(&root->root_item, 0);
+   btrfs_set_root_dirid(&root->root_item, 0);
+   root->root_item.drop_level = 0;
+
+   key.objectid = objectid;
+   key.type = BTRFS_ROOT_ITEM_KEY;
+   key.offset = 0;
+   ret = btrfs_insert_root(trans, tree_root, &key, &root->root_item);
+   if (ret)
+   goto fail;
+
+   btrfs_tree_unlock(leaf);
+
+fail:
+   if (ret)
+   return ERR_PTR(ret);
+
+   return root;
+}
+
 static struct btrfs_root *alloc_log_tree(struct btrfs_trans_handle *trans,
 struct btrfs_fs_info *fs_info)
 {
@@ -3260,7 +3336,7 @@ int btrfs_read_buffer(struct extent_buffer *buf, u64 parent_transid)
return btree_read_extent_buffer_pages(root, buf, 0, parent_transid);
 }
 
-static int btree_lock_page_hook(struct page *page, void *data,
+int btree_lock_page_hook(struct page *page, void *data,
void (*flush_fn)(void *))
 {
struct inode *inode = page->mapping->host;
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h
index 05b3fab..95e147e 100644
--- a/fs/btrfs/disk-io.h
+++ b/fs/btrfs/disk-io.h
@@ -89,6 +89,12 @@ int btrfs_add_log_tree(struct btrfs_trans_handle *trans,
 int btrfs_cleanup_transaction(struct btrfs_root *root);
 void btrfs_cleanup_one_transaction(struct btrfs_transaction *trans,
  struct btrfs_root *root);
+void btrfs_abort_devices(struct btrfs_root *root);
+struct btrfs_root *btrfs_create_tree(struct btrfs_trans_handle *trans,
+struct btrfs_fs_info *fs_info,
+u64 objectid);
+int btree_lock_page_hook(struct page *page, void *data,
+   void (*flush_fn)(void *));
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 void btrfs_init_lockdep(void);
-- 
1.7.3.4



[PATCH v1 02/15] Btrfs: join tree mod log code with the code holding back delayed refs

2012-07-12 Thread Jan Schmidt
We've got two mechanisms both required for reliable backref resolving (tree
mod log and holding back delayed refs). You cannot make use of one without
the other. So instead of requiring the user of this mechanism to set up both
correctly, we join them into a single interface.

Additionally, we stop inserting non-blockers into fs_info->tree_mod_seq_list
as we did before, which was of no value.

Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/backref.c |   30 ++
 fs/btrfs/backref.h |3 +-
 fs/btrfs/ctree.c   |  275 ++--
 fs/btrfs/ctree.h   |   31 --
 fs/btrfs/delayed-ref.c |   44 
 fs/btrfs/delayed-ref.h |   49 +
 fs/btrfs/disk-io.c |2 +
 fs/btrfs/extent-tree.c |   21 ++--
 fs/btrfs/transaction.c |4 -
 9 files changed, 240 insertions(+), 219 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index a383c18..7d80ddd 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -773,9 +773,8 @@ static int __add_keyed_refs(struct btrfs_fs_info *fs_info,
  */
 static int find_parent_nodes(struct btrfs_trans_handle *trans,
 struct btrfs_fs_info *fs_info, u64 bytenr,
-u64 delayed_ref_seq, u64 time_seq,
-struct ulist *refs, struct ulist *roots,
-const u64 *extent_item_pos)
+u64 time_seq, struct ulist *refs,
+struct ulist *roots, const u64 *extent_item_pos)
 {
struct btrfs_key key;
struct btrfs_path *path;
@@ -837,7 +836,7 @@ again:
btrfs_put_delayed_ref(&head->node);
goto again;
}
-   ret = __add_delayed_refs(head, delayed_ref_seq,
+   ret = __add_delayed_refs(head, time_seq,
 &prefs_delayed);
mutex_unlock(&head->mutex);
if (ret) {
@@ -981,8 +980,7 @@ static void free_leaf_list(struct ulist *blocks)
  */
 static int btrfs_find_all_leafs(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info, u64 bytenr,
-   u64 delayed_ref_seq, u64 time_seq,
-   struct ulist **leafs,
+   u64 time_seq, struct ulist **leafs,
const u64 *extent_item_pos)
 {
struct ulist *tmp;
@@ -997,7 +995,7 @@ static int btrfs_find_all_leafs(struct btrfs_trans_handle *trans,
return -ENOMEM;
}
 
-   ret = find_parent_nodes(trans, fs_info, bytenr, delayed_ref_seq,
+   ret = find_parent_nodes(trans, fs_info, bytenr,
time_seq, *leafs, tmp, extent_item_pos);
ulist_free(tmp);
 
@@ -1024,8 +1022,7 @@ static int btrfs_find_all_leafs(struct btrfs_trans_handle *trans,
  */
 int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info, u64 bytenr,
-   u64 delayed_ref_seq, u64 time_seq,
-   struct ulist **roots)
+   u64 time_seq, struct ulist **roots)
 {
struct ulist *tmp;
struct ulist_node *node = NULL;
@@ -1043,7 +1040,7 @@ int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
 
ULIST_ITER_INIT(&uiter);
while (1) {
-   ret = find_parent_nodes(trans, fs_info, bytenr, delayed_ref_seq,
+   ret = find_parent_nodes(trans, fs_info, bytenr,
time_seq, tmp, *roots, NULL);
if (ret < 0 && ret != -ENOENT) {
ulist_free(tmp);
@@ -1376,11 +1373,9 @@ int iterate_extent_inodes(struct btrfs_fs_info *fs_info,
struct ulist *roots = NULL;
struct ulist_node *ref_node = NULL;
struct ulist_node *root_node = NULL;
-   struct seq_list seq_elem = {};
struct seq_list tree_mod_seq_elem = {};
struct ulist_iterator ref_uiter;
struct ulist_iterator root_uiter;
-   struct btrfs_delayed_ref_root *delayed_refs = NULL;
 
pr_debug("resolving all inodes for extent %llu\n",
extent_item_objectid);
@@ -1391,16 +1386,11 @@ int iterate_extent_inodes(struct btrfs_fs_info *fs_info,
trans = btrfs_join_transaction(fs_info->extent_root);
if (IS_ERR(trans))
return PTR_ERR(trans);
-
-   delayed_refs = &trans->transaction->delayed_refs;
-   spin_lock(&delayed_refs->lock);
-   btrfs_get_delayed_seq(delayed_refs, &seq_elem);
-   spin_unlock(&delayed_refs->lock);
btrfs_get_tree_mod_seq(fs_info, &tree_mod_seq_elem);
}
 
ret = btrfs_find_all_leafs(trans, fs_info, 

[PATCH v1 10/15] Btrfs: call the qgroup accounting functions

2012-07-12 Thread Jan Schmidt
Signed-off-by: Jan Schmidt <list.bt...@jan-o-sch.net>
---
 fs/btrfs/extent-tree.c |3 +++
 fs/btrfs/transaction.c |   14 ++
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 1a63b83..c08337a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2479,6 +2479,8 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
   2 * 1024 * 1024, btrfs_get_alloc_profile(root, 0),
   CHUNK_ALLOC_NO_FORCE);
 
+   btrfs_delayed_refs_qgroup_accounting(trans, root->fs_info);
+
delayed_refs = &trans->transaction->delayed_refs;
INIT_LIST_HEAD(&cluster);
 again:
@@ -2588,6 +2590,7 @@ again:
}
 out:
spin_unlock(&delayed_refs->lock);
+   assert_qgroups_uptodate(trans);
return 0;
 }
 
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 0d6c881..d20d2e2 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -512,6 +512,11 @@ static int __btrfs_end_transaction(struct 
btrfs_trans_handle *trans,
return 0;
}
 
+   /*
+* do the qgroup accounting as early as possible
+*/
+   err = btrfs_delayed_refs_qgroup_accounting(trans, info);
+
btrfs_trans_release_metadata(trans, root);
trans-block_rsv = NULL;
/*
@@ -571,6 +576,7 @@ static int __btrfs_end_transaction(struct 
btrfs_trans_handle *trans,
root-fs_info-fs_state  BTRFS_SUPER_FLAG_ERROR) {
err = -EIO;
}
+   assert_qgroups_uptodate(trans);
 
memset(trans, 0, sizeof(*trans));
kmem_cache_free(btrfs_trans_handle_cachep, trans);
@@ -1356,6 +1362,13 @@ int btrfs_commit_transaction(struct btrfs_trans_handle 
*trans,
goto cleanup_transaction;
 
/*
+* running the delayed items may have added new refs. account
+* them now so that they hinder processing of more delayed refs
+* as little as possible.
+*/
+   btrfs_delayed_refs_qgroup_accounting(trans, root-fs_info);
+
+   /*
 * rename don't use btrfs_join_transaction, so, once we
 * set the transaction to blocked above, we aren't going
 * to get any new ordered operations.  We can safely run
@@ -1467,6 +1480,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
root->fs_info->chunk_root->node);
switch_commit_root(root->fs_info->chunk_root);
 
+   assert_qgroups_uptodate(trans);
update_super_roots(root);
 
if (!root->fs_info->log_root_recovering) {
-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 15/15] Btrfs: add qgroup inheritance

2012-07-12 Thread Jan Schmidt
From: Arne Jansen sensi...@gmx.net

When creating a subvolume or snapshot, it is necessary
to initialize the qgroup account with a copy of some
other (tracking) qgroup. This patch adds parameters
to the ioctls to pass the information from which qgroup
to inherit.
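
The diff below threads `inherit` through the call chain as a pointer to a pointer, so that create_snapshot() can keep the structure and clear the caller's copy. A minimal userspace sketch of that ownership handoff follows. The type and function names (`inherit_spec`, `pending_op`, `take_inherit`) are hypothetical stand-ins, not the kernel's structures.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct btrfs_qgroup_inherit. */
struct inherit_spec {
    unsigned long long num_qgroups;
};

/* Hypothetical stand-in for the pending snapshot that keeps the spec. */
struct pending_op {
    struct inherit_spec *inherit;
};

/*
 * Take ownership of *inherit. After the call the caller's pointer is
 * NULL, so an unconditional free() on the caller's side is a no-op and
 * the structure cannot be freed twice.
 */
void take_inherit(struct pending_op *op, struct inherit_spec **inherit)
{
    if (inherit) {
        op->inherit = *inherit;
        *inherit = NULL; /* the callee is now responsible for freeing it */
    }
}
```

With this convention the caller can always run its cleanup path: if the callee kept the structure, the caller frees NULL; if not, the caller frees the original allocation.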

Signed-off-by: Arne Jansen sensi...@gmx.net
---
 fs/btrfs/ioctl.c   |   59 ++-
 fs/btrfs/ioctl.h   |   11 -
 fs/btrfs/transaction.c |8 ++
 fs/btrfs/transaction.h |1 +
 4 files changed, 61 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 55a7283..1dffd0a 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -336,7 +336,8 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, void __user *arg)
 static noinline int create_subvol(struct btrfs_root *root,
  struct dentry *dentry,
  char *name, int namelen,
- u64 *async_transid)
+ u64 *async_transid,
+ struct btrfs_qgroup_inherit **inherit)
 {
struct btrfs_trans_handle *trans;
struct btrfs_key key;
@@ -368,6 +369,11 @@ static noinline int create_subvol(struct btrfs_root *root,
if (IS_ERR(trans))
return PTR_ERR(trans);
 
+   ret = btrfs_qgroup_inherit(trans, root->fs_info, 0, objectid,
+  inherit ? *inherit : NULL);
+   if (ret)
+   goto fail;
+
leaf = btrfs_alloc_free_block(trans, root, root->leafsize,
  0, objectid, NULL, 0, 0, 0);
if (IS_ERR(leaf)) {
@@ -484,7 +490,7 @@ fail:
 
 static int create_snapshot(struct btrfs_root *root, struct dentry *dentry,
   char *name, int namelen, u64 *async_transid,
-  bool readonly)
+  bool readonly, struct btrfs_qgroup_inherit **inherit)
 {
struct inode *inode;
struct btrfs_pending_snapshot *pending_snapshot;
@@ -502,6 +508,10 @@ static int create_snapshot(struct btrfs_root *root, struct dentry *dentry,
pending_snapshot->dentry = dentry;
pending_snapshot->root = root;
pending_snapshot->readonly = readonly;
+   if (inherit) {
+   pending_snapshot->inherit = *inherit;
+   *inherit = NULL; /* take responsibility to free it */
+   }
 
trans = btrfs_start_transaction(root->fs_info->extent_root, 5);
if (IS_ERR(trans)) {
@@ -635,7 +645,8 @@ static inline int btrfs_may_create(struct inode *dir, struct dentry *child)
 static noinline int btrfs_mksubvol(struct path *parent,
   char *name, int namelen,
   struct btrfs_root *snap_src,
-  u64 *async_transid, bool readonly)
+  u64 *async_transid, bool readonly,
+  struct btrfs_qgroup_inherit **inherit)
 {
struct inode *dir  = parent->dentry->d_inode;
struct dentry *dentry;
@@ -666,11 +677,11 @@ static noinline int btrfs_mksubvol(struct path *parent,
goto out_up_read;
 
if (snap_src) {
-   error = create_snapshot(snap_src, dentry,
-   name, namelen, async_transid, readonly);
+   error = create_snapshot(snap_src, dentry, name, namelen,
+   async_transid, readonly, inherit);
} else {
error = create_subvol(BTRFS_I(dir)->root, dentry,
- name, namelen, async_transid);
+ name, namelen, async_transid, inherit);
}
if (!error)
fsnotify_mkdir(dir, dentry);
@@ -1379,11 +1390,9 @@ out:
 }
 
 static noinline int btrfs_ioctl_snap_create_transid(struct file *file,
-   char *name,
-   unsigned long fd,
-   int subvol,
-   u64 *transid,
-   bool readonly)
+   char *name, unsigned long fd, int subvol,
+   u64 *transid, bool readonly,
+   struct btrfs_qgroup_inherit **inherit)
 {
struct btrfs_root *root = BTRFS_I(fdentry(file)->d_inode)->root;
struct file *src_file;
@@ -1407,7 +1416,7 @@ static noinline int btrfs_ioctl_snap_create_transid(struct file *file,
 
if (subvol) {
ret = btrfs_mksubvol(&file->f_path, name, namelen,
-NULL, transid, readonly);
+NULL, transid, readonly, inherit);
} else {
struct inode 

[PATCH v1 14/15] Btrfs: add qgroup ioctls

2012-07-12 Thread Jan Schmidt
From: Arne Jansen sensi...@gmx.net

Ioctls to control the qgroup feature like adding and
removing qgroups and assigning qgroups.

Signed-off-by: Arne Jansen sensi...@gmx.net
---
 fs/btrfs/ioctl.c |  185 ++
 fs/btrfs/ioctl.h |   27 
 2 files changed, 212 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 0e92e57..55a7283 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3390,6 +3390,183 @@ out:
return ret;
 }
 
+static long btrfs_ioctl_quota_ctl(struct btrfs_root *root, void __user *arg)
+{
+   struct btrfs_ioctl_quota_ctl_args *sa;
+   struct btrfs_trans_handle *trans = NULL;
+   int ret;
+   int err;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   if (root->fs_info->sb->s_flags & MS_RDONLY)
+   return -EROFS;
+
+   sa = memdup_user(arg, sizeof(*sa));
+   if (IS_ERR(sa))
+   return PTR_ERR(sa);
+
+   if (sa->cmd != BTRFS_QUOTA_CTL_RESCAN) {
+   trans = btrfs_start_transaction(root, 2);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   goto out;
+   }
+   }
+
+   switch (sa->cmd) {
+   case BTRFS_QUOTA_CTL_ENABLE:
+   ret = btrfs_quota_enable(trans, root->fs_info);
+   break;
+   case BTRFS_QUOTA_CTL_DISABLE:
+   ret = btrfs_quota_disable(trans, root->fs_info);
+   break;
+   case BTRFS_QUOTA_CTL_RESCAN:
+   ret = btrfs_quota_rescan(root->fs_info);
+   break;
+   default:
+   ret = -EINVAL;
+   break;
+   }
+
+   if (copy_to_user(arg, sa, sizeof(*sa)))
+   ret = -EFAULT;
+
+   if (trans) {
+   err = btrfs_commit_transaction(trans, root);
+   if (err && !ret)
+   ret = err;
+   }
+
+out:
+   kfree(sa);
+   return ret;
+}
+
+static long btrfs_ioctl_qgroup_assign(struct btrfs_root *root, void __user *arg)
+{
+   struct btrfs_ioctl_qgroup_assign_args *sa;
+   struct btrfs_trans_handle *trans;
+   int ret;
+   int err;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   if (root->fs_info->sb->s_flags & MS_RDONLY)
+   return -EROFS;
+
+   sa = memdup_user(arg, sizeof(*sa));
+   if (IS_ERR(sa))
+   return PTR_ERR(sa);
+
+   trans = btrfs_join_transaction(root);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   goto out;
+   }
+
+   /* FIXME: check if the IDs really exist */
+   if (sa->assign) {
+   ret = btrfs_add_qgroup_relation(trans, root->fs_info,
+   sa->src, sa->dst);
+   } else {
+   ret = btrfs_del_qgroup_relation(trans, root->fs_info,
+   sa->src, sa->dst);
+   }
+
+   err = btrfs_end_transaction(trans, root);
+   if (err && !ret)
+   ret = err;
+
+out:
+   kfree(sa);
+   return ret;
+}
+
+static long btrfs_ioctl_qgroup_create(struct btrfs_root *root, void __user *arg)
+{
+   struct btrfs_ioctl_qgroup_create_args *sa;
+   struct btrfs_trans_handle *trans;
+   int ret;
+   int err;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   if (root->fs_info->sb->s_flags & MS_RDONLY)
+   return -EROFS;
+
+   sa = memdup_user(arg, sizeof(*sa));
+   if (IS_ERR(sa))
+   return PTR_ERR(sa);
+
+   trans = btrfs_join_transaction(root);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   goto out;
+   }
+
+   /* FIXME: check if the IDs really exist */
+   if (sa->create) {
+   ret = btrfs_create_qgroup(trans, root->fs_info, sa->qgroupid,
+   NULL);
+   } else {
+   ret = btrfs_remove_qgroup(trans, root->fs_info, sa->qgroupid);
+   }
+
+   err = btrfs_end_transaction(trans, root);
+   if (err && !ret)
+   ret = err;
+
+out:
+   kfree(sa);
+   return ret;
+}
+
+static long btrfs_ioctl_qgroup_limit(struct btrfs_root *root, void __user *arg)
+{
+   struct btrfs_ioctl_qgroup_limit_args *sa;
+   struct btrfs_trans_handle *trans;
+   int ret;
+   int err;
+   u64 qgroupid;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   if (root->fs_info->sb->s_flags & MS_RDONLY)
+   return -EROFS;
+
+   sa = memdup_user(arg, sizeof(*sa));
+   if (IS_ERR(sa))
+   return PTR_ERR(sa);
+
+   trans = btrfs_join_transaction(root);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   goto out;
+   }
+
+   qgroupid = sa->qgroupid;
+   if (!qgroupid) {
+   /* take the current subvol as qgroup */
+  

[PATCH v1 11/15] Btrfs: quota tree support and startup

2012-07-12 Thread Jan Schmidt
From: Arne Jansen sensi...@gmx.net

Init the quota tree along with the others on open_ctree
and close_ctree. Add the quota tree to the list of well
known trees in btrfs_read_fs_root_no_name.
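
Because the quota tree is optional, the lookup in the diff below returns an error pointer (`ERR_PTR(-ENOENT)`) rather than NULL when no quota root exists. The following is a minimal userspace sketch of that convention. The helpers `err_ptr`, `ptr_err` and `is_err` imitate the kernel's `ERR_PTR`, `PTR_ERR` and `IS_ERR`; the `fs_state` and `tree_root` types and `lookup_quota_root()` are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* The top 4095 address values are reserved to encode negative errnos. */
#define MAX_ERRNO 4095

static inline void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, simplified stand-ins for the filesystem structures. */
struct tree_root {
    int id;
};

struct fs_state {
    struct tree_root *quota_root; /* NULL when quota is not enabled */
};

/* Return the quota root, or an encoded -ENOENT if there is none. */
struct tree_root *lookup_quota_root(struct fs_state *fs)
{
    return fs->quota_root ? fs->quota_root : err_ptr(-ENOENT);
}
```

Callers test the result with `is_err()` and recover the errno with `ptr_err()`, so a single return value carries either a valid root or the reason it is missing.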

Signed-off-by: Arne Jansen sensi...@gmx.net
---
 fs/btrfs/ctree.h   |1 +
 fs/btrfs/disk-io.c |   47 +--
 2 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index ccba9b6..2ba03b9 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2967,6 +2967,7 @@ static inline void free_fs_info(struct btrfs_fs_info *fs_info)
kfree(fs_info->chunk_root);
kfree(fs_info->dev_root);
kfree(fs_info->csum_root);
+   kfree(fs_info->quota_root);
kfree(fs_info->super_copy);
kfree(fs_info->super_for_commit);
kfree(fs_info);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index eca0549..87d9391 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1472,6 +1472,9 @@ struct btrfs_root *btrfs_read_fs_root_no_name(struct btrfs_fs_info *fs_info,
return fs_info->dev_root;
if (location->objectid == BTRFS_CSUM_TREE_OBJECTID)
return fs_info->csum_root;
+   if (location->objectid == BTRFS_QUOTA_TREE_OBJECTID)
+   return fs_info->quota_root ? fs_info->quota_root :
+   ERR_PTR(-ENOENT);
 again:
spin_lock(&fs_info->fs_roots_radix_lock);
root = radix_tree_lookup(&fs_info->fs_roots_radix,
@@ -1899,6 +1902,10 @@ static void free_root_pointers(struct btrfs_fs_info *info, int chunk_root)
free_extent_buffer(info->extent_root->commit_root);
free_extent_buffer(info->csum_root->node);
free_extent_buffer(info->csum_root->commit_root);
+   if (info->quota_root) {
+   free_extent_buffer(info->quota_root->node);
+   free_extent_buffer(info->quota_root->commit_root);
+   }
 
info->tree_root->node = NULL;
info->tree_root->commit_root = NULL;
@@ -1908,6 +1915,10 @@ static void free_root_pointers(struct btrfs_fs_info *info, int chunk_root)
info->extent_root->commit_root = NULL;
info->csum_root->node = NULL;
info->csum_root->commit_root = NULL;
+   if (info->quota_root) {
+   info->quota_root->node = NULL;
+   info->quota_root->commit_root = NULL;
+   }
 
if (chunk_root) {
free_extent_buffer(info->chunk_root->node);
@@ -1938,6 +1949,7 @@ int open_ctree(struct super_block *sb,
struct btrfs_root *csum_root;
struct btrfs_root *chunk_root;
struct btrfs_root *dev_root;
+   struct btrfs_root *quota_root;
struct btrfs_root *log_tree_root;
int ret;
int err = -EINVAL;
@@ -1949,9 +1961,10 @@ int open_ctree(struct super_block *sb,
csum_root = fs_info->csum_root = btrfs_alloc_root(fs_info);
chunk_root = fs_info->chunk_root = btrfs_alloc_root(fs_info);
dev_root = fs_info->dev_root = btrfs_alloc_root(fs_info);
+   quota_root = fs_info->quota_root = btrfs_alloc_root(fs_info);
 
if (!tree_root || !extent_root || !csum_root ||
-   !chunk_root || !dev_root) {
+   !chunk_root || !dev_root || !quota_root) {
err = -ENOMEM;
goto fail;
}
@@ -2441,6 +2454,17 @@ retry_root_backup:
goto recovery_tree_root;
csum_root->track_dirty = 1;
 
+   ret = find_and_setup_root(tree_root, fs_info,
+ BTRFS_QUOTA_TREE_OBJECTID, quota_root);
+   if (ret) {
+   kfree(quota_root);
+   quota_root = fs_info->quota_root = NULL;
+   } else {
+   quota_root->track_dirty = 1;
+   fs_info->quota_enabled = 1;
+   fs_info->pending_quota_state = 1;
+   }
+
fs_info->generation = generation;
fs_info->last_trans_committed = generation;
 
@@ -2500,6 +2524,9 @@ retry_root_backup:
"integrity check module %s\n", sb->s_id);
}
 #endif
+   ret = btrfs_read_qgroup_config(fs_info);
+   if (ret)
+   goto fail_trans_kthread;
 
/* do not make disk changes in broken FS */
if (btrfs_super_log_root(disk_super) != 0 &&
@@ -2510,7 +2537,7 @@ retry_root_backup:
printk(KERN_WARNING "Btrfs log replay required "
   "on RO media\n");
err = -EIO;
-   goto fail_trans_kthread;
+   goto fail_qgroup;
}
blocksize =
 btrfs_level_size(tree_root,
@@ -2519,7 +2546,7 @@ retry_root_backup:
log_tree_root = btrfs_alloc_root(fs_info);
if (!log_tree_root) {
err = -ENOMEM;
-   goto fail_trans_kthread;
+   goto fail_qgroup;
}
 
__setup_root(nodesize, 

[PATCH v1 12/15] Btrfs: hooks for qgroup to record delayed refs

2012-07-12 Thread Jan Schmidt
Hooks into qgroup code to record refs and into transaction commit.
This is the main entry point for qgroup. Basically every change in
extent backrefs gets accounted to the appropriate qgroups.
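
The need_ref_seq() helper added in the delayed-ref.h hunk of this patch decides which delayed refs get a sequence number and are therefore recorded for qgroup accounting. A userspace sketch of the same predicate follows, assuming the comparison is `>=` as in is_fstree(). The objectid constants carry the values of BTRFS_FS_TREE_OBJECTID and BTRFS_FIRST_FREE_OBJECTID; the function name is hypothetical.

```c
#include <stdint.h>

#define FS_TREE_OBJECTID    5ULL   /* the default subvolume tree */
#define FIRST_FREE_OBJECTID 256ULL /* first id handed out to subvolumes */

/*
 * Return 1 if a delayed ref for this root needs a sequence number.
 * Refs created for COW never need one. Otherwise only fs trees qualify:
 * the default fs tree, or any root id at or above the first free id.
 */
int need_ref_seq_sketch(int for_cow, uint64_t rootid)
{
    if (for_cow)
        return 0;

    if (rootid == FS_TREE_OBJECTID)
        return 1;

    /*
     * Signed comparison on purpose: internal trees with ids close to
     * 2^64 (for example (uint64_t)-8) turn negative here and are
     * rejected. This relies on two's complement conversion, which gcc
     * and clang provide.
     */
    if ((int64_t)rootid >= (int64_t)FIRST_FREE_OBJECTID)
        return 1;

    return 0;
}
```

For example, root id 2 (the extent tree) yields 0, while root id 256 yields 1 unless the ref was created for COW.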

Signed-off-by: Arne Jansen sensi...@gmx.net
Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/delayed-ref.c |   16 ++--
 fs/btrfs/delayed-ref.h |   19 +++
 fs/btrfs/transaction.c |7 +++
 3 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 21a7577..da7419e 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -529,8 +529,8 @@ static noinline void add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
ref->is_head = 0;
ref->in_tree = 1;
 
-   if (is_fstree(ref_root))
-   seq = btrfs_inc_tree_mod_seq(fs_info);
+   if (need_ref_seq(for_cow, ref_root))
+   seq = btrfs_get_tree_mod_seq(fs_info, &trans->delayed_ref_elem);
ref->seq = seq;
 
full_ref = btrfs_delayed_node_to_tree_ref(ref);
@@ -588,8 +588,8 @@ static noinline void add_delayed_data_ref(struct btrfs_fs_info *fs_info,
ref->is_head = 0;
ref->in_tree = 1;
 
-   if (is_fstree(ref_root))
-   seq = btrfs_inc_tree_mod_seq(fs_info);
+   if (need_ref_seq(for_cow, ref_root))
+   seq = btrfs_get_tree_mod_seq(fs_info, &trans->delayed_ref_elem);
ref->seq = seq;
 
full_ref = btrfs_delayed_node_to_data_ref(ref);
@@ -662,10 +662,12 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
add_delayed_tree_ref(fs_info, trans, &ref->node, bytenr,
   num_bytes, parent, ref_root, level, action,
   for_cow);
-   if (!is_fstree(ref_root) &&
+   if (!need_ref_seq(for_cow, ref_root) &&
waitqueue_active(&fs_info->tree_mod_seq_wait))
wake_up(&fs_info->tree_mod_seq_wait);
spin_unlock(&delayed_refs->lock);
+   if (need_ref_seq(for_cow, ref_root))
+   btrfs_qgroup_record_ref(trans, &ref->node, extent_op);
 
return 0;
 }
@@ -711,10 +713,12 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
add_delayed_data_ref(fs_info, trans, &ref->node, bytenr,
   num_bytes, parent, ref_root, owner, offset,
   action, for_cow);
-   if (!is_fstree(ref_root) &&
+   if (!need_ref_seq(for_cow, ref_root) &&
waitqueue_active(&fs_info->tree_mod_seq_wait))
wake_up(&fs_info->tree_mod_seq_wait);
spin_unlock(&delayed_refs->lock);
+   if (need_ref_seq(for_cow, ref_root))
+   btrfs_qgroup_record_ref(trans, &ref->node, extent_op);
 
return 0;
 }
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index 2b5cb27..0d7c90c 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -180,6 +180,25 @@ int btrfs_check_delayed_seq(struct btrfs_fs_info *fs_info,
u64 seq);
 
 /*
+ * delayed refs with a ref_seq > 0 must be held back during backref walking.
+ * this only applies to items in one of the fs-trees. for_cow items never need
+ * to be held back, so they won't get a ref_seq number.
+ */
+static inline int need_ref_seq(int for_cow, u64 rootid)
+{
+   if (for_cow)
+   return 0;
+
+   if (rootid == BTRFS_FS_TREE_OBJECTID)
+   return 1;
+
+   if ((s64)rootid >= (s64)BTRFS_FIRST_FREE_OBJECTID)
+   return 1;
+
+   return 0;
+}
+
+/*
  * a node might live in a head or a regular ref, this lets you
  * test for the proper type to use.
  */
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index d20d2e2..21c768c 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -795,6 +795,13 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans,
ret = btrfs_run_dev_stats(trans, root->fs_info);
BUG_ON(ret);
 
+   ret = btrfs_run_qgroups(trans, root->fs_info);
+   BUG_ON(ret);
+
+   /* run_qgroups might have added some more refs */
+   ret = btrfs_run_delayed_refs(trans, root, (unsigned long)-1);
+   BUG_ON(ret);
+
while (!list_empty(&fs_info->dirty_cowonly_roots)) {
next = fs_info->dirty_cowonly_roots.next;
list_del_init(next);
-- 
1.7.3.4



[PATCH v1 09/15] Btrfs: qgroup implementation and prototypes

2012-07-12 Thread Jan Schmidt
From: Arne Jansen sensi...@gmx.net

Signed-off-by: Arne Jansen sensi...@gmx.net
Signed-off-by: Jan Schmidt list.bt...@jan-o-sch.net
---
 fs/btrfs/Makefile  |2 +-
 fs/btrfs/ctree.h   |   46 ++
 fs/btrfs/extent-tree.c |   34 +
 fs/btrfs/ioctl.h   |   24 +
 fs/btrfs/qgroup.c  | 1571 
 fs/btrfs/transaction.c |2 +
 fs/btrfs/transaction.h |3 +
 7 files changed, 1681 insertions(+), 1 deletions(-)
 create mode 100644 fs/btrfs/qgroup.c

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 0c4fa2b..0bc4d3a 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -8,7 +8,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
-  reada.o backref.o ulist.o
+  reada.o backref.o ulist.o qgroup.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a5269d4..ccba9b6 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2830,6 +2830,8 @@ int btrfs_force_chunk_alloc(struct btrfs_trans_handle *trans,
 int btrfs_trim_fs(struct btrfs_root *root, struct fstrim_range *range);
 
 int btrfs_init_space_info(struct btrfs_fs_info *fs_info);
+int btrfs_delayed_refs_qgroup_accounting(struct btrfs_trans_handle *trans,
+struct btrfs_fs_info *fs_info);
 /* ctree.c */
 int btrfs_bin_search(struct extent_buffer *eb, struct btrfs_key *key,
 int level, int *slot);
@@ -3339,6 +3341,50 @@ void btrfs_reada_detach(void *handle);
 int btree_readahead_hook(struct btrfs_root *root, struct extent_buffer *eb,
 u64 start, int err);
 
+/* qgroup.c */
+struct qgroup_update {
+   struct list_head list;
+   struct btrfs_delayed_ref_node *node;
+   struct btrfs_delayed_extent_op *extent_op;
+};
+
+int btrfs_quota_enable(struct btrfs_trans_handle *trans,
+  struct btrfs_fs_info *fs_info);
+int btrfs_quota_disable(struct btrfs_trans_handle *trans,
+   struct btrfs_fs_info *fs_info);
+int btrfs_quota_rescan(struct btrfs_fs_info *fs_info);
+int btrfs_add_qgroup_relation(struct btrfs_trans_handle *trans,
+ struct btrfs_fs_info *fs_info, u64 src, u64 dst);
+int btrfs_del_qgroup_relation(struct btrfs_trans_handle *trans,
+ struct btrfs_fs_info *fs_info, u64 src, u64 dst);
+int btrfs_create_qgroup(struct btrfs_trans_handle *trans,
+   struct btrfs_fs_info *fs_info, u64 qgroupid,
+   char *name);
+int btrfs_remove_qgroup(struct btrfs_trans_handle *trans,
+ struct btrfs_fs_info *fs_info, u64 qgroupid);
+int btrfs_limit_qgroup(struct btrfs_trans_handle *trans,
+  struct btrfs_fs_info *fs_info, u64 qgroupid,
+  struct btrfs_qgroup_limit *limit);
+int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info);
+void btrfs_free_qgroup_config(struct btrfs_fs_info *fs_info);
+struct btrfs_delayed_extent_op;
+int btrfs_qgroup_record_ref(struct btrfs_trans_handle *trans,
+   struct btrfs_delayed_ref_node *node,
+   struct btrfs_delayed_extent_op *extent_op);
+int btrfs_qgroup_account_ref(struct btrfs_trans_handle *trans,
+struct btrfs_fs_info *fs_info,
+struct btrfs_delayed_ref_node *node,
+struct btrfs_delayed_extent_op *extent_op);
+int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
+ struct btrfs_fs_info *fs_info);
+int btrfs_qgroup_inherit(struct btrfs_trans_handle *trans,
+struct btrfs_fs_info *fs_info, u64 srcid, u64 objectid,
+struct btrfs_qgroup_inherit *inherit);
+int btrfs_qgroup_reserve(struct btrfs_root *root, u64 num_bytes);
+void btrfs_qgroup_free(struct btrfs_root *root, u64 num_bytes);
+
+void assert_qgroups_uptodate(struct btrfs_trans_handle *trans);
+
 static inline int is_fstree(u64 rootid)
 {
if (rootid == BTRFS_FS_TREE_OBJECTID ||
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index b13f1fb..1a63b83 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2409,6 +2409,40 @@ static u64 find_middle(struct rb_root *root)
 }
 #endif
 
+int btrfs_delayed_refs_qgroup_accounting(struct btrfs_trans_handle *trans,
+struct btrfs_fs_info *fs_info)
+{
+   struct qgroup_update *qgroup_update;
+   int ret = 0;
+
+   if (list_empty(&trans->qgroup_ref_list) !=
+   !trans->delayed_ref_elem.seq) {
+   /* list without seq or 

[PATCH v1 08/15] Btrfs: Test code to change the order of delayed-ref processing

2012-07-12 Thread Jan Schmidt
From: Arne Jansen sensi...@gmx.net

Normally delayed refs get processed in ascending bytenr order. This
correlates in most cases to the order added. To expose dependencies
on this order, we start to process the tree in the middle instead of
the beginning.
This code is only effective when SCRAMBLE_DELAYED_REFS is defined.
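
The walk that the patch's find_middle() performs can be sketched in userspace as follows: descend from the root, alternating left and right at every level, and take the key of the last node visited as the starting point. The `node` type and the name `zigzag_pick` are hypothetical; the patch itself uses the kernel's rb-tree. Returning 0 for an empty tree is a choice of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an rb-tree node keyed by bytenr. */
struct node {
    uint64_t bytenr;
    struct node *left;
    struct node *right;
};

/*
 * Walk down from the root, going left first and then alternating
 * direction at each level. Return the key of the last node visited,
 * or 0 if the tree is empty.
 */
uint64_t zigzag_pick(const struct node *root)
{
    uint64_t picked = 0;
    int go_left = 1;
    const struct node *n = root;

    while (n != NULL) {
        picked = n->bytenr;
        n = go_left ? n->left : n->right;
        go_left = !go_left;
    }
    return picked;
}
```

Because the direction flips at each level, the walk ends at a key that is neither the smallest nor the largest whenever the tree has children on both sides, which is enough to disturb the usual ascending processing order.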

Signed-off-by: Arne Jansen sensi...@gmx.net
---
 fs/btrfs/extent-tree.c |   49 
 1 files changed, 49 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 94ce79f..b13f1fb 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -34,6 +34,8 @@
 #include "locking.h"
 #include "free-space-cache.h"
 
+#undef SCRAMBLE_DELAYED_REFS
+
 /*
  * control flags for do_chunk_alloc's force field
  * CHUNK_ALLOC_NO_FORCE means to only allocate a chunk
@@ -2364,6 +2366,49 @@ static void wait_for_more_refs(struct btrfs_fs_info *fs_info,
spin_lock(&delayed_refs->lock);
 }
 
+#ifdef SCRAMBLE_DELAYED_REFS
+/*
+ * Normally delayed refs get processed in ascending bytenr order. This
+ * correlates in most cases to the order added. To expose dependencies on this
+ * order, we start to process the tree in the middle instead of the beginning
+ */
+static u64 find_middle(struct rb_root *root)
+{
+   struct rb_node *n = root->rb_node;
+   struct btrfs_delayed_ref_node *entry;
+   int alt = 1;
+   u64 middle;
+   u64 first = 0, last = 0;
+
+   n = rb_first(root);
+   if (n) {
+   entry = rb_entry(n, struct btrfs_delayed_ref_node, rb_node);
+   first = entry->bytenr;
+   }
+   n = rb_last(root);
+   if (n) {
+   entry = rb_entry(n, struct btrfs_delayed_ref_node, rb_node);
+   last = entry->bytenr;
+   }
+   n = root->rb_node;
+
+   while (n) {
+   entry = rb_entry(n, struct btrfs_delayed_ref_node, rb_node);
+   WARN_ON(!entry->in_tree);
+
+   middle = entry->bytenr;
+
+   if (alt)
+   n = n->rb_left;
+   else
+   n = n->rb_right;
+
+   alt = 1 - alt;
+   }
+   return middle;
+}
+#endif
+
 /*
  * this starts processing the delayed reference count updates and
  * extent insertions we have queued up so far.  count can be
@@ -2406,6 +2451,10 @@ again:
consider_waiting = 0;
spin_lock(&delayed_refs->lock);
 
+#ifdef SCRAMBLE_DELAYED_REFS
+   delayed_refs->run_delayed_start = find_middle(&delayed_refs->root);
+#endif
+
if (count == 0) {
count = delayed_refs->num_entries * 2;
run_most = 1;
-- 
1.7.3.4



Re: btrfs benchmark

2012-07-12 Thread Liu Bo
On 07/12/2012 05:17 PM, Bernd Kohler wrote:

> Hi @ all,
>
> in the last edition of the german Linux-Magazin, there has been an
> article about Linux filesystem performance test - the article is titled
> Formel Storage - Linux-Dateisystem im Leistungstest.
>
> The author of this article, Mr Michael Kromer provides his benchmark
> script on link [1] and as btrfs is mentioned there, I felt free to
> publish this here ;)
>

Thank you very much, I'm playing with it now.

thanks,
liubo

> best
>
> Bernd Kohler
>
> [1]
> http://medozas.de/2012-lm-fs-benchmark.tar.gz



Re: [PATCH v1 04/15] Btrfs: add helper for tree enumeration

2012-07-12 Thread Alexander Block
On Thu, Jul 12, 2012 at 11:43 AM, Jan Schmidt list.bt...@jan-o-sch.net wrote:
> From: Arne Jansen sensi...@gmx.net
>
> Often no exact match is wanted but just the next lower or
> higher item. There's a lot of duplicated code throughout
> btrfs to deal with the corner cases. This patch adds a
> helper function that can facilitate searching.
>
> Signed-off-by: Arne Jansen sensi...@gmx.net

Hmm, I'm very sorry, but my btrfs send/receive patchset has created
some conflicts here. I had to fix the find_higher=0 case in this patch
as it was not decrementing the slot at the right time. I changed the
original patch from Arne and hoped that Jan and Arne would take that
fixed version. I should have waited for an ACK from both, but instead
I forgot to communicate it properly and just posted it. I'm not sure how
to proceed now... asking for advice.

The version of the patch that I posted:
http://article.gmane.org/gmane.comp.file-systems.btrfs/18451


Re: [PATCH] Btrfs: kill free_space pointer from inode structure

2012-07-12 Thread Josef Bacik
On Wed, Jul 11, 2012 at 07:38:11PM -0600, Li Zefan wrote:
> On 2012/7/11 3:29, Josef Bacik wrote:
>
> > On Mon, Jul 09, 2012 at 08:21:07PM -0600, Li Zefan wrote:
> > > Inodes always allocate free space with BTRFS_BLOCK_GROUP_DATA type,
> > > which means every inode has the same BTRFS_I(inode)->free_space pointer.
> > >
> > > This shrinks struct btrfs_inode by 4 bytes (or 8 bytes on 64 bits).
> > >
> > > Signed-off-by: Li Zefan lize...@huawei.com
> >
> > Li I can't apply any of your patches because they are all in base64 format and
> > I'm having a hell of a time pulling them out to apply them, can you resend with
> > git send-email or something so I can apply them properly?  Thanks,
> >
>
> Hmm.. I got no complaints from Tejun or Chris before, so I didn't realize all
> the emails I sent were in base64. It should be the email server that encoded
> my patches, so I don't think using git-send-email will make any difference.
> (not to mention I failed to make git-send-email work in my office :(
>
> Is it ok if I attach the patches in attachments? Otherwise I'll use gmail
> instead when I'm at home.
>

Sorry I'll just pull them out somewhere else, for some reason notmuch doesn't
want to decode them properly so I just get garbage.  I'll just pull it out of
thunderbird by hand, hopefully that will work properly.  Thanks,

Josef


Re: [PATCH] Btrfs: allow delayed refs to be merged

2012-07-12 Thread Josef Bacik
On Wed, Jul 11, 2012 at 07:19:29AM -0600, Jan Schmidt wrote:
> Hi Josef,
>
> I hit a warning with this patch on top of the current cmason/for-linus
> branch. Takes about 15 minutes to produce when running xfstest 278 in
> a loop and, in another shell, doing fsstress on the same volume to
> force metadata modifications.
>
> fs/btrfs/extent-tree.c
> ...
> 5032 } else if (ret == -ENOENT) {
> 5033 btrfs_print_leaf(extent_root, path->nodes[0]);
> 5034 WARN_ON(1);
> 5035 printk(KERN_ERR "btrfs unable to find ref byte nr %llu "
> 5036        "parent %llu root %llu  owner %llu offset %llu\n",
> 5037        (unsigned long long)bytenr,
> 5038        (unsigned long long)parent,
> 5039        (unsigned long long)root_objectid,
> 5040        (unsigned long long)owner_objectid,
> 5041        (unsigned long long)owner_offset);
> 5042 } else {

Which test, 278 is an xfs specific test.  Thanks,

Josef


Re: [PATCH v2] Btrfs: improve multi-thread buffer read

2012-07-12 Thread Chris Mason
On Wed, Jul 11, 2012 at 08:13:51PM -0600, Liu Bo wrote:
> While testing with my buffer read fio jobs[1], I find that btrfs does not
> perform well enough.
>
> Here is a scenario in fio jobs:
>
> We have 4 threads, t1 t2 t3 t4, starting to buffer read a same file,
> and all of them will race on add_to_page_cache_lru(), and if one thread
> successfully puts its page into the page cache, it takes the responsibility
> to read the page's data.
>
> And what's more, reading a page needs a period of time to finish, in which
> other threads can slide in and process rest pages:
>
>      t1          t2          t3          t4
>    add Page1
>    read Page1  add Page2
>      |         read Page2  add Page3
>      |            |        read Page3  add Page4
>      |            |           |        read Page4
> -----|------------|-----------|-----------|--------
>      v            v           v           v
>     bio          bio         bio         bio
>
> Now we have four bios, each of which holds only one page since we need to
> maintain consecutive pages in bio.  Thus, we can end up with far more bios
> than we need.
>
> Here we're going to
> a) delay the real read-page section and
> b) try to put more pages into page cache.
>
> With that said, we can make each bio hold more pages and reduce the number
> of bios we need.
>
> Here is some numbers taken from fio results:
>          w/o patch          w patch
>        -------------   ---------------
>  READ:    745MB/s    +32%    987MB/s
>
> [1]:
> [global]
> group_reporting
> thread
> numjobs=4
> bs=32k
> rw=read
> ioengine=sync
> directory=/mnt/btrfs/
>
> [READ]
> filename=foobar
> size=2000M
> invalidate=1
>
> Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
> ---
> v1->v2: if we fail to make a allocation, just fall back to the old way to
> read page.
>  fs/btrfs/extent_io.c |   41 +++--
>  1 files changed, 39 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 01c21b6..5c8ab6c 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3549,6 +3549,11 @@ int extent_writepages(struct extent_io_tree *tree,
>    return ret;
>  }
>
> +struct pagelst {
> + struct page *page;
> + struct list_head lst;
> +};
> +

I like this patch, it's a big improvement for just a little timing
change.  Instead of doing the kmalloc of this struct, can you please
change it to put a pagevec on the stack?

The model would be:

add a page to the pagevec array
if pagevec full
    launch all the readpages

This lets you avoid the kmalloc, and it is closer to how we solve
similar problems in other parts of the kernel.
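
A minimal userspace sketch of that model, not the kernel's pagevec API: a fixed-size array that lives on the caller's stack, filled one page at a time and flushed through a callback once it is full. All names here (`page_batch`, `batch_add`, `batch_flush`, `count_flush`) and the batch size of 14 are assumptions for illustration only.

```c
#include <stddef.h>

#define BATCH_SIZE 14 /* illustrative capacity, small enough for the stack */

/* Hypothetical fixed-size batch, a stand-in for a pagevec. */
struct page_batch {
    size_t nr;
    void *pages[BATCH_SIZE];
};

/* Called with every collected page when the batch is flushed. */
typedef void (*flush_fn)(void **pages, size_t nr, void *ctx);

void batch_init(struct page_batch *b)
{
    b->nr = 0;
}

/* Hand all collected pages to the callback and empty the batch. */
void batch_flush(struct page_batch *b, flush_fn flush, void *ctx)
{
    if (b->nr == 0)
        return;
    flush(b->pages, b->nr, ctx);
    b->nr = 0;
}

/* Add one page; launch the reads as soon as the batch is full. */
void batch_add(struct page_batch *b, void *page, flush_fn flush, void *ctx)
{
    b->pages[b->nr++] = page;
    if (b->nr == BATCH_SIZE)
        batch_flush(b, flush, ctx);
}

/* Example callback that only counts what it was given. */
struct flush_stats {
    size_t calls;
    size_t pages;
};

void count_flush(void **pages, size_t nr, void *ctx)
{
    struct flush_stats *stats = ctx;

    (void)pages;
    stats->calls++;
    stats->pages += nr;
}
```

The caller still has to call batch_flush() once after the loop so that a partially filled batch is not lost. No allocation is involved, so there is no failure path to fall back from.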

Thanks!

-chris


Re: [PATCH] Btrfs: allow delayed refs to be merged

2012-07-12 Thread Jan Schmidt
On 12.07.2012 19:05, Josef Bacik wrote:
> Which test, 278 is an xfs specific test.  Thanks,

Oops. The test finally made it into xfstests-dev as 276.

Sorry,
-Jan


Re: btrfsck crashes

2012-07-12 Thread Christian Volkmann

Anand Jain wrote:

>  If this is a deliberate corruption can you pls share the test-case ?
>  if not have you tried mount with recovery and the scrub. ? scrub
>  would be preferred choice over btrfsck.

Scrub does not fix the problem. I replaced the real host name with myhost.
What I find strange: the paths reported for the errors point to the same file
names; only part of the myhost component differs.

btrfsck still fails with the same crash after the scrub.


speedy:/home/cv # btrfs scrub status /backup.old
scrub status for fa7034c8-86d4-4aa3-9fde-ecd7051ff43c
scrub started at Thu Jul 12 20:21:08 2012 and finished after 1495 seconds
total bytes scrubbed: 115.49GiB with 9 errors
error details: verify=3 csum=6
corrected errors: 3, uncorrectable errors: 6, unverified errors: 0

Should I continue with any analysis for bug hunting or just reformat and forget?

Best regards,
Christian

[ 5059.168649] btrfs: checksum/header error at logical 1532956672 on dev /dev/md3, sector 7204744: metadata leaf (level 0) in tree 2
[ 5059.168656] btrfs: checksum/header error at logical 1532956672 on dev /dev/md3, sector 7204744: metadata leaf (level 0) in tree 2
[ 5065.581348] btrfs: fixed up error at logical 1532956672 on dev /dev/md3
[ 5065.587844] btrfs: checksum/header error at logical 1532960768 on dev /dev/md3, sector 7204752: metadata leaf (level 0) in tree 2
[ 5065.587851] btrfs: checksum/header error at logical 1532960768 on dev /dev/md3, sector 7204752: metadata leaf (level 0) in tree 2
[ 5065.599317] btrfs: fixed up error at logical 1532960768 on dev /dev/md3
[ 5065.599500] btrfs: checksum/header error at logical 1532964864 on dev /dev/md3, sector 7204760: metadata leaf (level 0) in tree 2
[ 5065.599506] btrfs: checksum/header error at logical 1532964864 on dev /dev/md3, sector 7204760: metadata leaf (level 0) in tree 2
[ 5065.607379] btrfs: fixed up error at logical 1532964864 on dev /dev/md3
[ 5074.964900] btrfs: checksum error at logical 2327654400 on dev /dev/md3, sector 8756888: metadata leaf (level 0) in tree 5
[ 5074.964907] btrfs: checksum error at logical 2327654400 on dev /dev/md3, sector 8756888: metadata leaf (level 0) in tree 5
[ 5075.977763] btrfs: unable to fixup (regular) error at logical 2327654400 on dev /dev/md3
[ 5085.133646] btrfs: checksum error at logical 2327654400 on dev /dev/md3, sector 10854040: metadata leaf (level 0) in tree 5
[ 5085.133653] btrfs: checksum error at logical 2327654400 on dev /dev/md3, sector 10854040: metadata leaf (level 0) in tree 5
[ 5086.148842] btrfs: unable to fixup (regular) error at logical 2327654400 on dev /dev/md3
[ 6436.036292] btrfs: checksum error at logical 139801403392 on dev /dev/md3, sector 331786256, root 5, inode 2960268, offset 1345069056, length 4096, links 1 (path: int-www-mail/int-www-2012-07-05-22_28_57/srv/www/vhosts/myhost.de/statistics/logs/access_ssl_log.processed)
[ 6436.036300] btrfs: unable to fixup (regular) error at logical 139801403392 on dev /dev/md3
[ 6454.615722] btrfs: checksum error at logical 141661282304 on dev /dev/md3, sector 335418832, root 5, inode 2968078, offset 104292352, length 4096, links 1 (path: int-www-mail/int-www-2012-07-05-22_28_57/srv/www/vhosts/myhost.no/statistics/logs/error_log)
[ 6454.615736] btrfs: unable to fixup (regular) error at logical 141661282304 on dev /dev/md3
[ 6455.523759] btrfs: checksum error at logical 140794101760 on dev /dev/md3, sector 333725120, root 5, inode 2964438, offset 87449600, length 4096, links 1 (path: int-www-mail/int-www-2012-07-05-22_28_57/srv/www/vhosts/myhost.fr/statistics/logs/access_log.processed)
[ 6455.523775] btrfs: unable to fixup (regular) error at logical 140794101760 on dev /dev/md3
[ 6475.865387] btrfs: checksum error at logical 143052115968 on dev /dev/md3, sector 338135304, root 5, inode 3000621, offset 1078595584, length 4096, links 1 (path: int-www-mail/int-www-2012-07-05-22_28_57/srv/www/vhosts/otherhost.com/statistics/logs/access_log.processed)
[ 6475.865403] btrfs: unable to fixup (regular) error at logical 143052115968 on dev /dev/md3
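
As a reading aid for the log above, and not a diagnosis, the sector numbers
can be turned into byte distances. The helper sector_gap_bytes is
hypothetical, and the 512-byte sector size is an assumption about the units
the kernel prints here:

```c
#include <assert.h>

/* Assumption: sector numbers in the log are in 512-byte units. */
#define SECTOR_SIZE 512ULL

/* Distance in bytes between two sector numbers taken from the log. */
static unsigned long long sector_gap_bytes(unsigned long long a,
					   unsigned long long b)
{
	assert(SECTOR_SIZE > 0);
	return (b > a ? b - a : a - b) * SECTOR_SIZE;
}
```

sector_gap_bytes(7204744, 7204752) gives 4096, which matches the 4096-byte
spacing of the logical addresses of the three fixed-up leaves.
sector_gap_bytes(8756888, 10854040) gives 1073741824, so the two reports for
logical 2327654400 are exactly 1 GiB apart on disk. That would be consistent
with two stored copies of the same metadata block both failing their
checksum, but this is only a guess from the numbers.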

speedy:/tmp/btrfs/btrfs-progs # ./btrfsck /dev/md3
checking extents
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
checksum verify failed on 2327654400 wanted 73CDE79C found 72
Csum didn't match
owner ref check failed [2327654400 4096]
ref mismatch on [101138354176 98304] extent item 1, found 0
Incorrect local backref count on 101138354176 root 5 owner 1867898 offset 0 found 0 wanted 1 back 0x787c260
backpointer mismatch on [101138354176 98304]
owner ref check failed [101138354176 98304]
ref mismatch on [101138452480 106496] extent item 1, found 0
Incorrect local backref count on 101138452480 root 5 owner 1867899 offset 0 found 0 wanted 1 back 0x787c2a0
backpointer mismatch on [101138452480 106496]
owner ref check failed [101138452480 

Re: [PATCH v2] Btrfs: improve multi-thread buffer read

2012-07-12 Thread Liu Bo
On 07/12/2012 02:04 PM, Chris Mason wrote:

 On Wed, Jul 11, 2012 at 08:13:51PM -0600, Liu Bo wrote:
 While testing with my buffer read fio jobs[1], I find that btrfs does not
 perform well enough.

 Here is a scenario in fio jobs:

 We have 4 threads, t1 t2 t3 t4, starting to buffer read the same file,
 and all of them will race on add_to_page_cache_lru(). If one thread
 successfully puts its page into the page cache, it takes the responsibility
 for reading the page's data.

 What's more, reading a page takes some time to finish, during which
 other threads can slide in and process the remaining pages:

       t1          t2          t3          t4
     add Page1
     read Page1  add Page2
       |         read Page2  add Page3
       |            |        read Page3  add Page4
       |            |           |        read Page4
  -----|------------|-----------|-----------|--------
       v            v           v           v
      bio          bio         bio         bio

 Now we have four bios, each of which holds only one page since we need to
 maintain consecutive pages in bio.  Thus, we can end up with far more bios
 than we need.

 Here we're going to
 a) delay the real read-page section and
 b) try to put more pages into page cache.

 With that said, we can make each bio hold more pages and reduce the number
 of bios we need.

 Here are some numbers taken from fio results:
          w/o patch             w patch
          ---------             -------
  READ:   745MB/s      +32%     987MB/s

 [1]:
 [global]
 group_reporting
 thread
 numjobs=4
 bs=32k
 rw=read
 ioengine=sync
 directory=/mnt/btrfs/

 [READ]
 filename=foobar
 size=2000M
 invalidate=1

 Signed-off-by: Liu Bo liubo2...@cn.fujitsu.com
 ---
 v1->v2: if we fail to make an allocation, just fall back to the old way of
 reading the page.
  fs/btrfs/extent_io.c |   41 +++--
  1 files changed, 39 insertions(+), 2 deletions(-)

 diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
 index 01c21b6..5c8ab6c 100644
 --- a/fs/btrfs/extent_io.c
 +++ b/fs/btrfs/extent_io.c
 @@ -3549,6 +3549,11 @@ int extent_writepages(struct extent_io_tree *tree,
  return ret;
  }
  
 +struct pagelst {
 +struct page *page;
 +struct list_head lst;
 +};
 +
 
 I like this patch, it's a big improvement for just a little timing
 change.  Instead of doing the kmalloc of this struct, can you please
 change it to put a pagevec on the stack?
 
 The model would be:
 
 add a page to the pagevec array
 if pagevec full
   launch all the readpages
 
 This lets you avoid the kmalloc, and it is closer to how we solve
 similar problems in other parts of the kernel.
 


Yeah, but there is a difference.

Actually, my first attempt did this with struct pagevec, but a pagevec holds
at most PAGEVEC_SIZE pages, which is limited to 14.

That means that in the worst case we batch only 14 pages into a bio to submit.

However, a bio can contain up to 128 pages on my devices, which is the
reason why I turned to kmalloc'ing another struct.
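
To put rough numbers on that limit, here is a small sketch of the best-case
bio count for the fio job above. The helpers pages_in and min_bios are
hypothetical, the 4 KiB page size is an assumption, and so is reading
size=2000M as 2000 MiB. Real counts will be higher whenever a bio is
submitted before it is full:

```c
#include <assert.h>

/* Assumption: 4 KiB pages. */
#define PAGE_SIZE_BYTES 4096ULL

/* Number of pages needed to cover the given number of bytes (rounded up). */
static unsigned long long pages_in(unsigned long long bytes)
{
	return (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/* Lower bound on the bio count if every bio carries pages_per_bio pages. */
static unsigned long long min_bios(unsigned long long nr_pages,
				   unsigned long long pages_per_bio)
{
	assert(pages_per_bio > 0);
	return (nr_pages + pages_per_bio - 1) / pages_per_bio;
}
```

Under those assumptions the file is 512000 pages. min_bios(512000, 14) is
36572, min_bios(512000, 128) is 4000, and one page per bio would need 512000
bios.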

Here are some performance numbers:
          w/o patch    w pvec patch    w kmalloc patch
          ---------    ------------    ---------------
READ:     745MB/s      880MB/s         987MB/s


So what do you think about it?  I'm ok with both.

thanks,
liubo