Re: empty disk reports full

2016-04-25 Thread Chris Murphy
On Mon, Apr 25, 2016 at 8:03 AM, Alejandro Vargas  wrote:
> On Friday, 1 April 2016 10:05:07 Hugo Mills wrote:
>> On Fri, Apr 01, 2016 at 11:50:50AM +0200, Alejandro Vargas wrote:
>> > I am using a 2TB disk for incremental backups.
>> >
>> > I use rsync for backing up to a subvolume, and each day I create a
>> > snapshot of the latest snapshot and run rsync on it.
>> >
>> > When the disk becomes nearly full (100GB or less available) I delete the
>> > oldest subvolume (with btrfs subvolume delete).
>> >
>> > My problem is that *even removing ALL the subvolumes*, the free space does
>> > not change. It continues reporting the same size (disk is nearly full).
>> >
>> > I tried "btrfs balance start /mnt/backup" but it takes hours and hours.
>> >
>> > I'm using linux 4.1.15
>> > btrfs-progs v4.1.2
>>
>> Can you show us the output of both "sudo btrfs fi show" and "btrfs
>> fi df /mnt/backup", please?
>
> Before deleting subvolumes:
>
> [root@backups ~]# df /mnt/backup
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sdb1       1,9T  1,9T  5,0M 100% /mnt/backup
>
>
> [root@backups ~]# ls -l /mnt/backup
> total 0
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160318/
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160328/
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160330/
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160401/
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160404/
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160406/
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160408/
>
>
> [root@backups ~]# btrfs fi show
> Label: 'disco_backup'  uuid: cbfe8735-9f53-46f5-be7e-40f6a61a5506
> Total devices 1 FS bytes used 1.80TiB
> devid    1 size 1.82TiB used 1.82TiB path /dev/sdb1
>
> btrfs-progs v4.1.2
>
> [root@backups ~]# btrfs fi df /mnt/backup
> Data, single: total=1.79TiB, used=1.79TiB
> System, DUP: total=32.00MiB, used=240.00KiB
> Metadata, DUP: total=17.00GiB, used=15.83GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
>
> Now I remove the oldest subvolume:
>
>
> [root@backups ~]# btrfs subvolume delete /mnt/backup/back20160318/
> Delete subvolume (no-commit): '/mnt/backup/back20160318'
>
> [root@backups ~]# df  /mnt/backup
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sdb1       1,9T  1,9T   22M 100% /mnt/backup
>
> [root@backups ~]# btrfs fi show
> Label: 'disco_backup'  uuid: cbfe8735-9f53-46f5-be7e-40f6a61a5506
> Total devices 1 FS bytes used 1.80TiB
> devid    1 size 1.82TiB used 1.82TiB path /dev/sdb1
>
> btrfs-progs v4.1.2
> [root@backups ~]# btrfs fi df /mnt/backup
> Data, single: total=1.79TiB, used=1.79TiB
> System, DUP: total=32.00MiB, used=240.00KiB
> Metadata, DUP: total=17.00GiB, used=15.83GiB
> GlobalReserve, single: total=512.00MiB, used=102.53MiB
>
>
>
> Now I remove 2 more subvolumes:
>
> [root@backups ~]# btrfs subvolume delete /mnt/backup/back20160328/
> Delete subvolume (no-commit): '/mnt/backup/back20160328'
> [root@backups ~]# btrfs subvolume delete /mnt/backup/back20160330/
> Delete subvolume (no-commit): '/mnt/backup/back20160330'
>
> [root@backups ~]# df /mnt/backup/
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sdb1       1,9T  1,9T  348M 100% /mnt/backup
>
> [root@backups ~]# btrfs fi show
> Label: 'disco_backup'  uuid: cbfe8735-9f53-46f5-be7e-40f6a61a5506
> Total devices 1 FS bytes used 1.80TiB
> devid    1 size 1.82TiB used 1.82TiB path /dev/sdb1
>
> btrfs-progs v4.1.2
>
> Data, single: total=1.79TiB, used=1.79TiB
> System, DUP: total=32.00MiB, used=240.00KiB
> Metadata, DUP: total=17.00GiB, used=15.83GiB
> GlobalReserve, single: total=512.00MiB, used=98.94MiB
>
>
> [root@backups ~]# ls -l /mnt/backup/
> total 0
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160401/
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160404/
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160406/
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160408/
>
>
> Now I will remove the remaining subvolumes:
>
> [root@backups ~]# btrfs subvolume delete /mnt/backup/back20160401/
> Delete subvolume (no-commit): '/mnt/backup/back20160401'
> [root@backups ~]# btrfs subvolume delete /mnt/backup/back20160404/
> Delete subvolume (no-commit): '/mnt/backup/back20160404'
> [root@backups ~]# btrfs subvolume delete /mnt/backup/back20160406/
> Delete subvolume (no-commit): '/mnt/backup/back20160406'
> [root@backups ~]# btrfs subvolume delete /mnt/backup/back20160408/
> Delete subvolume (no-commit): '/mnt/backup/back20160408'
>
> [root@backups ~]# ls -l /mnt/backup/
> total 0
>
> [root@backups ~]# df /mnt/backup/
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sdb1       1,9T  1,9T  4,6G 100% /mnt/backup
> [root@backups ~]# btrfs fi show
> Label: 'disco_backup'  uuid: cbfe8735-

[PATCH RFC 06/16] btrfs-progs: fsck: Introduce function to check referencer for data backref

2016-04-25 Thread Qu Wenruo
From: Lu Fengqi 

Introduce new function check_extent_data_backref() to search referencer
for a given data backref.

Signed-off-by: Lu Fengqi 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 96 
 1 file changed, 96 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 1d1b198..8f971b9 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -8873,6 +8873,102 @@ out:
return 0;
 }
 
+/*
+ * Check referencer for normal(inlined) data ref
+ * If len == 0, it will be resolved by searching in extent tree
+ */
+static int check_extent_data_backref(struct btrfs_fs_info *fs_info,
+u64 root_id, u64 objectid, u64 offset,
+u64 bytenr, u64 len)
+{
+   struct btrfs_root *root;
+   struct btrfs_root *extent_root = fs_info->extent_root;
+   struct btrfs_key key;
+   struct btrfs_path path;
+   struct extent_buffer *leaf;
+   struct btrfs_file_extent_item *fi;
+   int slot;
+   int found_referencer = 0;
+   int ret = 0;
+
+   if (!len) {
+   key.objectid = bytenr;
+   key.type = BTRFS_EXTENT_ITEM_KEY;
+   key.offset = (u64)-1;
+
+   btrfs_init_path(&path);
+   ret = btrfs_search_slot(NULL, extent_root, &key, &path, 0, 0);
+   if (ret < 0)
+   goto out;
+   ret = btrfs_previous_extent_item(extent_root, &path, bytenr);
+   if (ret)
+   goto out;
+   btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+   if (key.objectid != bytenr ||
+   key.type != BTRFS_EXTENT_ITEM_KEY)
+   goto out;
+   len = key.offset;
+   btrfs_release_path(&path);
+   }
+   key.objectid = root_id;
+   btrfs_set_key_type(&key, BTRFS_ROOT_ITEM_KEY);
+   key.offset = (u64)-1;
+
+   root = btrfs_read_fs_root(fs_info, &key);
+   if (IS_ERR(root))
+   goto out;
+
+   btrfs_init_path(&path);
+   key.objectid = objectid;
+   key.type = BTRFS_EXTENT_DATA_KEY;
+   /* 
+* It can be nasty as data backref offset is
+* file offset - file extent offset, which is smaller or
+* equal to original backref offset.
+* The only special case is overflow.
+* So we need a special check here and a further search.
+*/
+   key.offset = offset & (1ULL << 63) ? 0 : offset;
+
+   ret = btrfs_search_slot(NULL, root, &key, &path, 0, 0);
+   if (ret < 0)
+   goto out;
+
+   /* Search afterwards to get correct one */
+   while (1) {
+   leaf = path.nodes[0];
+   slot = path.slots[0];
+
+   btrfs_item_key_to_cpu(leaf, &key, slot);
+   if (key.objectid != objectid || key.type != BTRFS_EXTENT_DATA_KEY)
+   break;
+   fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
+   /*
+* Except normal disk bytenr and disk num bytes, we still
+* need to do extra check on dbackref offset as
+* dbackref offset = file_offset - file_extent_offset
+*/
+   if (btrfs_file_extent_disk_bytenr(leaf, fi) == bytenr &&
+   btrfs_file_extent_disk_num_bytes(leaf, fi) == len &&
+   (u64)(key.offset - btrfs_file_extent_offset(leaf, fi)) ==
+   offset) {
+   found_referencer = 1;
+   break;
+   }
+   ret = btrfs_next_item(root, &path);
+   if (ret)
+   break;
+   }
+out:
+   btrfs_release_path(&path);
+   if (!found_referencer) {
+   error("Extent[%llu, %llu] lost referencer(root: %llu, owner: %llu, offset: %llu)",
+ bytenr, len, root_id, objectid, offset);
+   return -MISSING_REFERENCER;
+   }
+   return 0;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
   struct btrfs_root *root, int overwrite)
 {
-- 
2.8.0



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
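
The offset handling in check_extent_data_backref() above can be illustrated
with a small standalone sketch. It is not part of the patch: the names
sk_backref_offset(), sk_search_start() and sk_backref_demo() are hypothetical,
and plain uint64_t stands in for the btrfs u64 type. It only shows the
unsigned arithmetic, assuming the backref offset is "file offset - file extent
offset" as the patch comment says: the subtraction wraps around when the
extent offset is larger, and a wrapped value (top bit set) makes the search
start from offset 0.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): data backref offset computed as
 * file offset - file extent offset. Unsigned subtraction wraps modulo 2^64
 * when extent_offset is larger than file_offset.
 */
uint64_t sk_backref_offset(uint64_t file_offset, uint64_t extent_offset)
{
	return file_offset - extent_offset;
}

/*
 * Mirrors "key.offset = offset & (1ULL << 63) ? 0 : offset;" from the patch:
 * a value with the top bit set is treated as wrapped, so the search starts
 * from file offset 0 instead.
 */
uint64_t sk_search_start(uint64_t backref_offset)
{
	return (backref_offset & (1ULL << 63)) ? 0 : backref_offset;
}

/* Small self-check covering the normal and the wrapped case */
void sk_backref_demo(void)
{
	uint64_t normal = sk_backref_offset(8192, 4096);
	uint64_t wrapped = sk_backref_offset(0, 4096);

	assert(normal == 4096);
	assert(sk_search_start(normal) == 4096);
	assert(wrapped == UINT64_MAX - 4095);
	assert(sk_search_start(wrapped) == 0);
}
```

Starting from 0 in the wrapped case is why the patch then walks forward with
btrfs_next_item() and re-checks "key.offset - file extent offset" against the
backref offset for every EXTENT_DATA item.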


[PATCH RFC 03/16] btrfs-progs: fsck: Introduce function to query tree block level

2016-04-25 Thread Qu Wenruo
From: Lu Fengqi 

Introduce function query_tree_block_level() to resolve tree block level
by reading out the tree block.

Signed-off-by: Lu Fengqi 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index d097edd..6633b6e 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -8716,6 +8716,26 @@ error:
return err;
 }
 
+/*
+ * Get real tree block level for case like shared block
+ * Return >= 0 as tree level
+ * Return <0 for error
+ */
+static int query_tree_block_level(struct btrfs_fs_info *fs_info, u64 bytenr)
+{
+   struct extent_buffer *eb;
+   u32 nodesize = btrfs_super_nodesize(fs_info->super_copy);
+   int ret = -EIO;
+
+   eb = read_tree_block_fs_info(fs_info, bytenr, nodesize, 0);
+   if (!extent_buffer_uptodate(eb))
+   goto out;
+   ret = btrfs_header_level(eb);
+out:
+   free_extent_buffer(eb);
+   return ret;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
   struct btrfs_root *root, int overwrite)
 {
-- 
2.8.0
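
The return convention of query_tree_block_level() (>= 0 is the level, < 0 is
an error) and its single-exit cleanup can be sketched without btrfs-progs.
Everything here is an assumption made for illustration: struct sk_block is a
made-up stand-in for an extent buffer, and sk_read_block() replaces the real
read with a lookup in an array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent buffer, for illustration only */
struct sk_block {
	uint64_t bytenr;
	int uptodate;	/* non-zero if the block was read successfully */
	int level;	/* 0 for a leaf, > 0 for a node */
};

/* Hypothetical lookup that replaces reading the tree block from disk */
static const struct sk_block *sk_read_block(const struct sk_block *blocks,
					    size_t nr, uint64_t bytenr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (blocks[i].bytenr == bytenr)
			return &blocks[i];
	return NULL;
}

/* Return >= 0 as tree level, -EIO if the block cannot be read */
int sk_query_level(const struct sk_block *blocks, size_t nr, uint64_t bytenr)
{
	const struct sk_block *b = sk_read_block(blocks, nr, bytenr);
	int ret = -EIO;

	if (!b || !b->uptodate)
		goto out;
	ret = b->level;
out:
	/* the function in the patch frees the extent buffer at this point */
	return ret;
}

void sk_query_level_demo(void)
{
	const struct sk_block blocks[] = {
		{ 4096, 1, 0 },		/* readable leaf */
		{ 8192, 1, 2 },		/* readable node at level 2 */
		{ 12288, 0, 1 },	/* read failed */
	};

	assert(sk_query_level(blocks, 3, 4096) == 0);
	assert(sk_query_level(blocks, 3, 8192) == 2);
	assert(sk_query_level(blocks, 3, 12288) == -EIO);
	assert(sk_query_level(blocks, 3, 999) == -EIO);
}
```

Because level 0 is a valid result, callers have to test for "< 0" rather than
for a non-zero return, which is what check_tree_block_backref() in patch 04
does.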





[PATCH RFC 09/16] btrfs-progs: fsck: Introduce function to check dev extent item

2016-04-25 Thread Qu Wenruo
From: Lu Fengqi 

Introduce function check_dev_extent_item() to find its referencer chunk.

Signed-off-by: Lu Fengqi 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 57 +
 1 file changed, 57 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 7f9f848..92c254f 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -9126,6 +9126,63 @@ out:
return -err;
 }
 
+/*
+ * Check if a dev extent item is referred correctly by its chunk
+ */
+static int check_dev_extent_item(struct btrfs_fs_info *fs_info,
+struct extent_buffer *eb, int slot)
+{
+   struct btrfs_root *chunk_root = fs_info->chunk_root;
+   struct btrfs_dev_extent *ptr;
+   struct btrfs_path path;
+   struct btrfs_key key;
+   struct btrfs_key found_key;
+   struct btrfs_chunk *chunk;
+   struct extent_buffer *l;
+   int num_stripes;
+   u64 length;
+   int i;
+   int found_chunk = 0;
+   int ret;
+
+   btrfs_item_key_to_cpu(eb, &found_key, slot);
+   ptr = btrfs_item_ptr(eb, slot, struct btrfs_dev_extent);
+   length = btrfs_dev_extent_length(eb, ptr);
+
+   key.objectid = btrfs_dev_extent_chunk_objectid(eb, ptr);
+   key.type = BTRFS_CHUNK_ITEM_KEY;
+   key.offset = btrfs_dev_extent_chunk_offset(eb, ptr);
+
+   btrfs_init_path(&path);
+   ret = btrfs_search_slot(NULL, chunk_root, &key, &path, 0, 0);
+   if (ret)
+   goto out;
+
+   l = path.nodes[0];
+   chunk = btrfs_item_ptr(l, path.slots[0], struct btrfs_chunk);
+   if (btrfs_chunk_length(l, chunk) != length)
+   goto out;
+
+   num_stripes = btrfs_chunk_num_stripes(l, chunk);
+   for (i = 0; i < num_stripes; i++) {
+   u64 devid = btrfs_stripe_devid_nr(l, chunk, i);
+   u64 offset = btrfs_stripe_offset_nr(l, chunk, i);
+
+   if (devid == found_key.objectid && offset == found_key.offset) {
+   found_chunk = 1;
+   break;
+   }
+   }
+out:
+   btrfs_release_path(&path);
+   if (!found_chunk) {
+   error("Device extent[%llu, %llu, %llu] didn't find the relative chunk",
+  found_key.objectid, found_key.offset, length);
+   return -MISSING_REFERENCER;
+   }
+   return 0;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
   struct btrfs_root *root, int overwrite)
 {
-- 
2.8.0
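
The matching rule in check_dev_extent_item() can be shown in isolation: a dev
extent is considered referenced if the chunk has the same length and one of
its stripes points at the same (devid, offset). The sketch below is only an
illustration under that reading of the patch; struct sk_stripe, struct
sk_chunk and the function names are hypothetical and replace the on-disk
accessors with plain structs.

```c
#include <assert.h>
#include <stdint.h>

#define SK_MAX_STRIPES 4

/* Hypothetical in-memory stand-ins for a chunk item and its stripes */
struct sk_stripe {
	uint64_t devid;
	uint64_t offset;
};

struct sk_chunk {
	uint64_t length;
	int num_stripes;
	struct sk_stripe stripes[SK_MAX_STRIPES];
};

/*
 * Return 1 if the chunk refers back to the dev extent at
 * (devid, dev_offset) with the given length, 0 otherwise.
 * Like the patch, this compares the lengths directly.
 */
int sk_chunk_refers_dev_extent(const struct sk_chunk *chunk, uint64_t devid,
			       uint64_t dev_offset, uint64_t length)
{
	int i;

	if (chunk->length != length)
		return 0;

	for (i = 0; i < chunk->num_stripes; i++) {
		if (chunk->stripes[i].devid == devid &&
		    chunk->stripes[i].offset == dev_offset)
			return 1;
	}
	return 0;
}

void sk_dev_extent_demo(void)
{
	/* Two stripes on the same device, as a DUP-style chunk would have */
	const struct sk_chunk chunk = {
		1048576, 2, { { 1, 4194304 }, { 1, 5242880 } }
	};

	assert(sk_chunk_refers_dev_extent(&chunk, 1, 4194304, 1048576) == 1);
	assert(sk_chunk_refers_dev_extent(&chunk, 1, 5242880, 1048576) == 1);
	/* wrong device, wrong offset, wrong length */
	assert(sk_chunk_refers_dev_extent(&chunk, 2, 4194304, 1048576) == 0);
	assert(sk_chunk_refers_dev_extent(&chunk, 1, 6291456, 1048576) == 0);
	assert(sk_chunk_refers_dev_extent(&chunk, 1, 4194304, 2097152) == 0);
}
```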





[PATCH RFC 02/16] btrfs-progs: fsck: Introduce function to check data backref in extent tree

2016-04-25 Thread Qu Wenruo
From: Lu Fengqi 

Introduce a new function check_extent_data_item() to check if the
corresponding data backref exists in extent tree.

Signed-off-by: Lu Fengqi 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c  | 151 ++
 ctree.h   |   2 +
 extent-tree.c |   2 +-
 3 files changed, 154 insertions(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index 27fc26f..d097edd 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -322,6 +322,7 @@ struct root_item_info {
  */
 #define MISSING_BACKREF (1 << 0) /* Completely no backref in extent tree */
 #define BAD_BACKREF     (1 << 1) /* Backref mismatch */
+#define UNALIGNED_BYTES (1 << 2) /* Some bytes are not aligned */
 
 static void *print_status_check(void *p)
 {
@@ -8565,6 +8566,156 @@ out:
return -err;
 }
 
+/*
+ * Check EXTENT_DATA item, mainly for its dbackref in extent tree
+ *
+ * Return <0 any error found and output error message
+ * Return 0 for no error found
+ */
+static int check_extent_data_item(struct btrfs_root *root,
+ struct extent_buffer *eb, int slot)
+{
+   struct btrfs_file_extent_item *fi;
+   struct btrfs_path path;
+   struct btrfs_root *extent_root = root->fs_info->extent_root;
+   struct btrfs_key key;
+   struct btrfs_key found_key;
+   struct extent_buffer *leaf;
+   struct btrfs_extent_item *ei;
+   struct btrfs_extent_inline_ref *iref;
+   struct btrfs_extent_data_ref *dref;
+   u64 owner;
+   u64 file_extent_gen;
+   u64 disk_bytenr;
+   u64 disk_num_bytes;
+   u64 extent_num_bytes;
+   u64 extent_flags;
+   u64 extent_gen;
+   u32 item_size;
+   unsigned long end;
+   unsigned long ptr;
+   int type;
+   u64 ref_root;
+   int found_dbackref = 0;
+   int err = 0;
+   int ret;
+
+   btrfs_item_key_to_cpu(eb, &key, slot);
+   fi = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item);
+   file_extent_gen = btrfs_file_extent_generation(eb, fi);
+
+   /* Nothing to check for hole and inline data extents */
+   if (btrfs_file_extent_type(eb, fi) == BTRFS_FILE_EXTENT_INLINE ||
+   btrfs_file_extent_disk_bytenr(eb, fi) == 0)
+   return 0;
+
+   disk_bytenr = btrfs_file_extent_disk_bytenr(eb, fi);
+   disk_num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi);
+   extent_num_bytes = btrfs_file_extent_num_bytes(eb, fi);
+   /* Check unaligned disk_num_bytes and num_bytes */
+   if (!IS_ALIGNED(disk_num_bytes, root->sectorsize)) {
+   error("File extent [%llu, %llu] has unaligned disk num bytes: %llu, should be aligned to %u",
+ key.objectid, key.offset, disk_num_bytes,
+ root->sectorsize);
+   err |= UNALIGNED_BYTES;
+   } else
+   data_bytes_allocated += disk_num_bytes;
+   if (!IS_ALIGNED(extent_num_bytes, root->sectorsize)) {
+   error("File extent [%llu, %llu] has unaligned num bytes: %llu, should be aligned to %u",
+ key.objectid, key.offset, extent_num_bytes,
+ root->sectorsize);
+   err |= UNALIGNED_BYTES;
+   } else
+   data_bytes_referenced += extent_num_bytes;
+   owner = btrfs_header_owner(eb);
+
+   /* Check the data backref in extent tree */
+   btrfs_init_path(&path);
+   key.objectid = btrfs_file_extent_disk_bytenr(eb, fi);
+   key.type = BTRFS_EXTENT_ITEM_KEY;
+   key.offset = btrfs_file_extent_disk_num_bytes(eb, fi);
+
+   ret = btrfs_search_slot(NULL, extent_root, &key, &path, 0, 0);
+   if (ret) {
+   err |= MISSING_BACKREF;
+   goto error;
+   }
+
+   leaf = path.nodes[0];
+   slot = path.slots[0];
+   btrfs_item_key_to_cpu(leaf, &found_key, slot);
+   ei = btrfs_item_ptr(leaf, slot, struct btrfs_extent_item);
+
+   extent_flags = btrfs_extent_flags(leaf, ei);
+   extent_gen = btrfs_extent_generation(leaf, ei);
+
+   btrfs_item_key_to_cpu(eb, &key, slot);
+   if (!(extent_flags & BTRFS_EXTENT_FLAG_DATA)) {
+   error("Extent[%llu %llu] backref type mismatch, wanted bit: %llx",
+ disk_bytenr, disk_num_bytes,
+ BTRFS_EXTENT_FLAG_DATA);
+   err |= BAD_BACKREF;
+   }
+
+   if (file_extent_gen != extent_gen) {
+   error("Extent[%llu %llu] backref generation mismatch, wanted: %llu, have: %llu",
+ disk_bytenr, disk_num_bytes, file_extent_gen,
+ extent_gen);
+   err |= BAD_BACKREF;
+   }
+
+   /* Check data backref */
+   item_size = btrfs_item_size_nr(leaf, path.slots[0]);
+   iref = (struct btrfs_extent_inline_ref *)(ei + 1);
+   ptr = (unsigned long)iref;
+   end = (unsigned long)ei + item_size;
+   while (ptr < end) {
+   iref = (struc

[PATCH RFC 15/16] btrfs-progs: fsck: Introduce traversal function for fsck

2016-04-25 Thread Qu Wenruo
From: Lu Fengqi 

Introduce a new function traversal_tree_block() to do pre-order
traversal, to co-operate with new fs/subvolume tree skip function.

Signed-off-by: Lu Fengqi 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 77 
 1 file changed, 77 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 92f8aa1..85d6cf4 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -9644,6 +9644,83 @@ need_check:
return 1;
 }
 
+/*
+ * Traversal function for tree block.
+ * We will do
+ * 1) Skip shared fs/subvolume tree blocks
+ * 2) Update related bytes accounting
+ * 3) Pre-order traversal
+ */
+static int traversal_tree_block(struct btrfs_root *root,
+   struct extent_buffer *node)
+{
+   struct extent_buffer *eb;
+   int level;
+   u64 nr;
+   int i;
+   int err = 0;
+   int ret;
+
+   /*
+* Skip shared fs/subvolume tree block, in that case they will
+* be checked by referencer with lowest rootid
+*/
+   if (is_fstree(root->objectid) && !should_check(root, node))
+   return 0;
+
+   /* Update bytes accounting */
+   total_btree_bytes += node->len;
+   if (fs_root_objectid(btrfs_header_owner(node)))
+   total_fs_tree_bytes += node->len;
+   if (btrfs_header_owner(node) == BTRFS_EXTENT_TREE_OBJECTID)
+   total_extent_tree_bytes += node->len;
+   if (!found_old_backref &&
+   btrfs_header_owner(node) == BTRFS_TREE_RELOC_OBJECTID &&
+   btrfs_header_backref_rev(node) == BTRFS_MIXED_BACKREF_REV &&
+   !btrfs_header_flag(node, BTRFS_HEADER_FLAG_RELOC))
+   found_old_backref = 1;
+
+   /* pre-order traversal, check itself first */
+   level = btrfs_header_level(node);
+   ret = check_tree_block_ref(root, node, btrfs_header_bytenr(node),
+  btrfs_header_level(node),
+  btrfs_header_owner(node));
+   err |= -ret;
+   if (err)
+   error("check %s failed root %llu bytenr %llu level %d, force continue check",
+ level ? "node":"leaf", root->objectid,
+ btrfs_header_bytenr(node), btrfs_header_level(node));
+
+   if (!level) {
+   btree_space_waste += btrfs_leaf_free_space(root, node);
+   ret = check_leaf_items(root, node);
+   err |= -ret;
+   return -err;
+   }
+
+   nr = btrfs_header_nritems(node);
+   btree_space_waste += (BTRFS_NODEPTRS_PER_BLOCK(root) - nr) *
+   sizeof(struct btrfs_key_ptr);
+
+   /* Then check all its children */
+   for (i = 0; i < nr; i++) {
+   u64 blocknr = btrfs_node_blockptr(node, i);
+
+   /*
+* As a btrfs tree has at most 8 levels (0~7), it's quite
+* safe to call the function itself.
+*/
+   eb = read_tree_block(root, blocknr, root->nodesize, 0);
+   if (extent_buffer_uptodate(eb)) {
+   ret = traversal_tree_block(root, eb);
+   err |= -ret;
+   }
+   free_extent_buffer(eb);
+   }
+
+   return -err;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
   struct btrfs_root *root, int overwrite)
 {
-- 
2.8.0
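
The control flow of traversal_tree_block() (check the block itself, then
recurse into each child, OR-ing error bits and returning them negated) can be
sketched on a toy tree. All names here are hypothetical: struct sk_node
carries a pretend check result in err_bits instead of calling
check_tree_block_ref(), and struct sk_trace only records the visit order so
the pre-order property is visible.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_CHILDREN 4
#define SK_MAX_TRACE 16

/* Hypothetical error bits, in the style of the patch series */
#define SK_ERR_A (1 << 0)
#define SK_ERR_B (1 << 1)

struct sk_node {
	int id;
	int err_bits;	/* pretend result of checking this block, 0 if clean */
	int nr;		/* number of children, 0 for a leaf */
	const struct sk_node *child[SK_MAX_CHILDREN];
};

struct sk_trace {
	int order[SK_MAX_TRACE];
	int count;
};

/* Return -(OR of all error bits found in the subtree), 0 if clean */
int sk_traverse(const struct sk_node *node, struct sk_trace *trace)
{
	int err = 0;
	int ret;
	int i;

	/* pre-order: handle the block itself before its children */
	if (trace->count < SK_MAX_TRACE)
		trace->order[trace->count++] = node->id;
	err |= node->err_bits;

	/* keep going after an error so that every child is still checked */
	for (i = 0; i < node->nr; i++) {
		ret = sk_traverse(node->child[i], trace);
		err |= -ret;
	}
	return -err;
}

void sk_traverse_demo(void)
{
	static const struct sk_node leaf2 = { 2, SK_ERR_B, 0, { NULL } };
	static const struct sk_node leaf4 = { 4, 0, 0, { NULL } };
	static const struct sk_node node3 = { 3, SK_ERR_A, 1, { &leaf4 } };
	static const struct sk_node root = { 1, 0, 2, { &leaf2, &node3 } };
	struct sk_trace trace = { { 0 }, 0 };
	int ret = sk_traverse(&root, &trace);

	assert(ret == -(SK_ERR_A | SK_ERR_B));
	assert(trace.count == 4);
	assert(trace.order[0] == 1);
	assert(trace.order[1] == 2);
	assert(trace.order[2] == 3);
	assert(trace.order[3] == 4);
}
```

Negating the bitmask on return keeps the "< 0 means error" convention while
still letting the caller recover the individual bits with "err |= -ret".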





[PATCH RFC 01/16] btrfs-progs: fsck: Introduce function to check tree block backref in extent tree

2016-04-25 Thread Qu Wenruo
From: Lu Fengqi 

Introduce function check_tree_block_ref() to check whether a tree block
has correct backref in extent tree.

Unlike the old extent tree check method, we only use search_slot() to search
for references; no extra structure is allocated on the heap to record what
we have checked.

This method may cause a little more IO, but should work for super large
fs without triggering OOM.

Signed-off-by: Lu Fengqi 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 163 +++
 1 file changed, 163 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index d59968b..27fc26f 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -313,6 +313,16 @@ struct root_item_info {
struct cache_extent cache_extent;
 };
 
+/*
+ * Error bit for low memory mode check.
+ * Return value should be - (ERR_BIT1 | ERR_BIT2 | ...)
+ *
+ * Currently no caller cares about it yet.
+ * Just as an internal use error classification
+ */
+#define MISSING_BACKREF (1 << 0) /* Completely no backref in extent tree */
+#define BAD_BACKREF     (1 << 1) /* Backref mismatch */
+
 static void *print_status_check(void *p)
 {
struct task_ctx *priv = p;
@@ -8402,6 +8412,159 @@ loop:
goto again;
 }
 
+/*
+ * Check backrefs of a tree block given by @bytenr or @eb.
+ *
+ * @root:  the root containing the @bytenr or @eb
+ * @eb:tree block extent buffer, can be NULL
+ * @bytenr:bytenr of the tree block to search
+ * @level: tree level of the tree block
+ * @owner: owner of the tree block
+ *
+ * Return < 0 for any error found and output error message
+ * Return 0 for no error found
+ */
+static int check_tree_block_ref(struct btrfs_root *root,
+   struct extent_buffer *eb, u64 bytenr,
+   int level, u64 owner)
+{
+   struct btrfs_key key;
+   struct btrfs_key found_key;
+   struct btrfs_root *extent_root = root->fs_info->extent_root;
+   struct btrfs_path path;
+   struct btrfs_extent_item *ei;
+   struct btrfs_extent_inline_ref *iref;
+   struct extent_buffer *leaf;
+   unsigned long end;
+   unsigned long ptr;
+   int slot;
+   int skinny_level;
+   int type;
+   u32 nodesize = root->nodesize;
+   u32 item_size;
+   u64 offset;
+   int found_ref = 0;
+   int err = 0;
+   int ret;
+
+   btrfs_init_path(&path);
+   key.objectid = bytenr;
+   if (btrfs_fs_incompat(root->fs_info,
+ BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA))
+   key.type = BTRFS_METADATA_ITEM_KEY;
+   else
+   key.type = BTRFS_EXTENT_ITEM_KEY;
+   key.offset = (u64)-1;
+
+   /* Search for the backref in extent tree */
+   ret = btrfs_search_slot(NULL, extent_root, &key, &path, 0, 0);
+   if (ret < 0) {
+   err = MISSING_BACKREF;
+   goto out;
+   }
+   ret = btrfs_previous_extent_item(extent_root, &path, bytenr);
+   if (ret) {
+   err = MISSING_BACKREF;
+   goto out;
+   }
+
+   leaf = path.nodes[0];
+   slot = path.slots[0];
+   btrfs_item_key_to_cpu(leaf, &found_key, slot);
+
+   ei = btrfs_item_ptr(leaf, slot, struct btrfs_extent_item);
+
+   if (btrfs_key_type(&found_key) == BTRFS_METADATA_ITEM_KEY) {
+   skinny_level = (int)found_key.offset;
+   iref = (struct btrfs_extent_inline_ref *)(ei + 1);
+   } else {
+   struct btrfs_tree_block_info *info;
+
+   info = (struct btrfs_tree_block_info *)(ei + 1);
+   skinny_level = btrfs_tree_block_level(leaf, info);
+   iref = (struct btrfs_extent_inline_ref *)(info + 1);
+   }
+
+   if (eb) {
+   u64 header_gen;
+   u64 extent_gen;
+
+   if (!(btrfs_extent_flags(leaf, ei) &
+ BTRFS_EXTENT_FLAG_TREE_BLOCK)) {
+   error("Extent[%llu %u] backref type mismatch, missing bit: %llx",
+ found_key.objectid, nodesize,
+ BTRFS_EXTENT_FLAG_TREE_BLOCK);
+   err = BAD_BACKREF;
+   }
+   header_gen = btrfs_header_generation(eb);
+   extent_gen = btrfs_extent_generation(leaf, ei);
+   if (header_gen != extent_gen) {
+   error("Extent[%llu %u] backref generation mismatch, wanted: %llu, have: %llu",
+ found_key.objectid, nodesize, header_gen,
+ extent_gen);
+   err = BAD_BACKREF;
+   }
+   if (level != skinny_level) {
+   error("Extent[%llu %u] level mismatch, wanted: %u, have: %u",
+ found_key.objectid, nodesize, level, skinny_level);
+   err = BAD_BACKREF;
+   }
+   if (!is_fstree(own

[PATCH RFC 14/16] btrfs-progs: fsck: Introduce function to speed up fs tree check

2016-04-25 Thread Qu Wenruo
From: Lu Fengqi 

Introduce function should_check() to reduce duplicated tree block checks
for fs/subvolume trees.

The idea is, we only check the fs/subvolume tree block if we are the
referencer with the lowest rootid, according to the extent tree.

That way, we can skip a lot of fs/subvolume tree block checks if
there are a lot of snapshots, at the cost of a lot of extent tree
searches.

Signed-off-by: Lu Fengqi 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 90 
 1 file changed, 90 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index db6fc8e..92f8aa1 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -9554,6 +9554,96 @@ next:
return err;
 }
 
+/*
+ * Helper function for later fs/subvol tree check.
+ * To determine if a tree block should be checked.
+ * This function ensures that only the direct referencer with the lowest
+ * rootid checks a fs/subvolume tree block.
+ *
+ * Backref check at extent tree would detect error like missing subvolume
+ * tree, so we can do aggressive judgement to reduce duplicated check.
+ */
+static int should_check(struct btrfs_root *root, struct extent_buffer *eb)
+{
+   struct btrfs_root *extent_root = root->fs_info->extent_root;
+   struct btrfs_key key;
+   struct btrfs_path path;
+   struct extent_buffer *leaf;
+   int slot;
+   struct btrfs_extent_item *ei;
+   unsigned long ptr;
+   unsigned long end;
+   int type;
+   u32 item_size;
+   u64 offset;
+   struct btrfs_extent_inline_ref *iref;
+   int ret;
+
+   btrfs_init_path(&path);
+   key.objectid = btrfs_header_bytenr(eb);
+   key.type = BTRFS_METADATA_ITEM_KEY;
+   key.offset = (u64)-1;
+
+   /*
+* Any failure in backref resolving means we can't determine
+* who the tree block belongs to.
+* So in that case, we need to check that tree block
+*/
+   ret = btrfs_search_slot(NULL, extent_root, &key, &path, 0, 0);
+   if (ret < 0)
+   goto need_check;
+
+   ret = btrfs_previous_extent_item(extent_root, &path,
+btrfs_header_bytenr(eb));
+   if (ret)
+   goto need_check;
+
+   leaf = path.nodes[0];
+   slot = path.slots[0];
+   btrfs_item_key_to_cpu(leaf, &key, slot);
+   ei = btrfs_item_ptr(leaf, slot, struct btrfs_extent_item);
+
+   if (key.type == BTRFS_METADATA_ITEM_KEY) {
+   iref = (struct btrfs_extent_inline_ref *)(ei + 1);
+   } else {
+   struct btrfs_tree_block_info *info;
+
+   info = (struct btrfs_tree_block_info *)(ei + 1);
+   iref = (struct btrfs_extent_inline_ref *)(info + 1);
+   }
+
+   item_size = btrfs_item_size_nr(leaf, slot);
+   ptr = (unsigned long)iref;
+   end = (unsigned long)ei + item_size;
+   while (ptr < end) {
+   iref = (struct btrfs_extent_inline_ref *)ptr;
+   type = btrfs_extent_inline_ref_type(leaf, iref);
+   offset = btrfs_extent_inline_ref_offset(leaf, iref);
+
+   /*
+* We only check the tree block if current root is
+* the lowest referencer of it.
+*/
+   if (type == BTRFS_TREE_BLOCK_REF_KEY &&
+   offset < root->objectid) {
+   btrfs_release_path(&path);
+   return 0;
+   }
+
+   ptr += btrfs_extent_inline_ref_size(type);
+   }
+   /*
+* Normally we should also check keyed tree block ref,
+* but that may be very time consuming.
+* Inlined ref should already make us skip a lot of refs now.
+* So skip search keyed tree block ref.
+*/
+
+need_check:
+   btrfs_release_path(&path);
+   return 1;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
   struct btrfs_root *root, int overwrite)
 {
-- 
2.8.0
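
The decision made by should_check() boils down to one comparison per inline
tree block ref. The sketch below shows that rule on a plain array of root ids;
sk_should_check() and the array of referencing roots are assumptions made for
illustration, standing in for the walk over inline refs in the extent item.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reduction of should_check(): ref_roots holds the root ids
 * found in the inline tree block refs of one tree block.
 * Return 1 if root_id should check the block, 0 if a lower root will.
 */
int sk_should_check(uint64_t root_id, const uint64_t *ref_roots, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		/* some root with a lower id also refers to this block */
		if (ref_roots[i] < root_id)
			return 0;
	}
	/* no lower referencer found (or no refs at all): check it */
	return 1;
}

void sk_should_check_demo(void)
{
	/* a block shared by a subvolume (256) and two snapshots of it */
	const uint64_t refs[] = { 256, 257, 258 };
	int checkers = 0;
	size_t i;

	for (i = 0; i < 3; i++)
		checkers += sk_should_check(refs[i], refs, 3);

	/* exactly one of the three roots ends up checking the block */
	assert(checkers == 1);
	assert(sk_should_check(256, refs, 3) == 1);
	assert(sk_should_check(258, refs, 3) == 0);
}
```

With N snapshots sharing a block, the block is checked once instead of N
times. As in the patch, an empty or unreadable ref list falls back to
"check it".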





[PATCH RFC 10/16] btrfs-progs: fsck: Introduce function to check dev used space

2016-04-25 Thread Qu Wenruo
From: Lu Fengqi 

Introduce function check_dev_item() to check used space with dev extent
items.

Signed-off-by: Lu Fengqi 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 64 
 1 file changed, 64 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 92c254f..e2d1ebf 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -328,6 +328,7 @@ struct root_item_info {
 #define CROSSING_STRIPE_BOUNDARY (1 << 4) /* For kernel scrub workaround */
 #define BAD_ITEM_SIZE  (1 << 5) /* Bad item size */
 #define UNKNOWN_TYPE   (1 << 6) /* Unknown type */
+#define ACCOUNTING_MISMATCH (1 << 7) /* Used space accounting error */
 
 static void *print_status_check(void *p)
 {
@@ -9183,6 +9184,69 @@ out:
return 0;
 }
 
+/*
+ * Check that the used space in the dev item matches its dev extents
+ */
+static int check_dev_item(struct btrfs_fs_info *fs_info,
+ struct extent_buffer *eb, int slot)
+{
+   struct btrfs_root *dev_root = fs_info->dev_root;
+   struct btrfs_dev_item *dev_item;
+   struct btrfs_path path;
+   struct btrfs_key key;
+   struct btrfs_dev_extent *ptr;
+   u64 dev_id;
+   u64 used;
+   u64 total = 0;
+   int ret;
+
+   dev_item = btrfs_item_ptr(eb, slot, struct btrfs_dev_item);
+   dev_id = btrfs_device_id(eb, dev_item);
+   used = btrfs_device_bytes_used(eb, dev_item);
+
+   key.objectid = dev_id;
+   key.type = BTRFS_DEV_EXTENT_KEY;
+   key.offset = 0;
+
+   btrfs_init_path(&path);
+   ret = btrfs_search_slot(NULL, dev_root, &key, &path, 0, 0);
+   if (ret < 0) {
+   btrfs_item_key_to_cpu(eb, &key, slot);
+   error("Couldn't find any relative dev extent for dev[%llu, %u, %llu]",
+ key.objectid, key.type, key.offset);
+   btrfs_release_path(&path);
+   return -MISSING_REFERENCER;
+   }
+
+   /* Iterate dev_extents to calculate the used space of a device */
+   while (1) {
+   btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+
+   if (key.objectid > dev_id)
+   break;
+   if (key.type != BTRFS_DEV_EXTENT_KEY || key.objectid != dev_id)
+   goto next;
+
+   ptr = btrfs_item_ptr(path.nodes[0], path.slots[0],
+struct btrfs_dev_extent);
+   total += btrfs_dev_extent_length(path.nodes[0], ptr);
+next:
+   ret = btrfs_next_item(dev_root, &path);
+   if (ret)
+   break;
+   }
+   btrfs_release_path(&path);
+
+   if (used != total) {
+   btrfs_item_key_to_cpu(eb, &key, slot);
+   error("Dev extent's total-byte(%llu) is not equal to byte-used(%llu) in dev[%llu, %u, %llu]",
+ total, used, BTRFS_ROOT_TREE_OBJECTID,
+ BTRFS_DEV_EXTENT_KEY, dev_id);
+   return -ACCOUNTING_MISMATCH;
+   }
+   return 0;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
   struct btrfs_root *root, int overwrite)
 {
-- 
2.8.0
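
The accounting check in check_dev_item() is a sum-and-compare: add up the
lengths of all dev extents that belong to one device and compare the total
with the bytes_used recorded in the dev item. A minimal sketch, assuming the
dev extents are available as an array sorted by devid (the struct and
function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of DEV_EXTENT items, sorted by devid */
struct sk_dev_extent {
	uint64_t devid;
	uint64_t offset;
	uint64_t length;
};

/* Sum the lengths of all dev extents that belong to devid */
uint64_t sk_dev_extent_total(const struct sk_dev_extent *ext, size_t nr,
			     uint64_t devid)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (ext[i].devid > devid)
			break;		/* sorted: nothing more for devid */
		if (ext[i].devid != devid)
			continue;	/* belongs to an earlier device */
		total += ext[i].length;
	}
	return total;
}

/* Return 1 if bytes_used from the dev item matches the dev extents */
int sk_dev_used_matches(const struct sk_dev_extent *ext, size_t nr,
			uint64_t devid, uint64_t bytes_used)
{
	return sk_dev_extent_total(ext, nr, devid) == bytes_used;
}

void sk_dev_item_demo(void)
{
	const struct sk_dev_extent ext[] = {
		{ 1, 1048576, 1048576 },
		{ 1, 2097152, 8388608 },
		{ 2, 1048576, 4194304 },
	};

	assert(sk_dev_extent_total(ext, 3, 1) == 9437184);
	assert(sk_dev_extent_total(ext, 3, 2) == 4194304);
	assert(sk_dev_used_matches(ext, 3, 1, 9437184) == 1);
	/* a stale bytes_used value is reported as a mismatch */
	assert(sk_dev_used_matches(ext, 3, 1, 8388608) == 0);
}
```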





[PATCH RFC 04/16] btrfs-progs: fsck: Introduce function to check referencer of a backref

2016-04-25 Thread Qu Wenruo
From: Lu Fengqi 

Introduce a new function check_tree_block_backref() to check if a
backref points to correct referencer.

Signed-off-by: Lu Fengqi 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 95 
 1 file changed, 95 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 6633b6e..81dd4f3 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -323,6 +323,8 @@ struct root_item_info {
 #define MISSING_BACKREF (1 << 0) /* Completely no backref in extent tree */
 #define BAD_BACKREF     (1 << 1) /* Backref mismatch */
 #define UNALIGNED_BYTES (1 << 2) /* Some bytes are not aligned */
+#define MISSING_REFERENCER (1 << 3) /* Referencer not found */
+#define BAD_REFERENCER (1 << 4) /* Referencer found, but does not match */
 
 static void *print_status_check(void *p)
 {
@@ -8736,6 +8738,99 @@ out:
return ret;
 }
 
+/*
+ * Check if a tree block backref is valid (points to valid tree block)
+ * if level == -1, level will be resolved
+ */
+static int check_tree_block_backref(struct btrfs_fs_info *fs_info, u64 root_id,
+   u64 bytenr, int level)
+{
+   struct btrfs_root *root;
+   struct btrfs_key key;
+   struct btrfs_path path;
+   struct extent_buffer *eb;
+   struct extent_buffer *node;
+   u32 nodesize = btrfs_super_nodesize(fs_info->super_copy);
+   int err = 0;
+   int ret;
+
+   /* Query level for level == -1 special case */
+   if (level == -1)
+   level = query_tree_block_level(fs_info, bytenr);
+   if (level < 0) {
+   err = MISSING_REFERENCER;
+   goto out;
+   }
+
+   key.objectid = root_id;
+   key.type = BTRFS_ROOT_ITEM_KEY;
+   key.offset = (u64)-1;
+
+   root = btrfs_read_fs_root(fs_info, &key);
+   if (IS_ERR(root)) {
+   err |= MISSING_REFERENCER;
+   goto out;
+   }
+
+   /* Read out the tree block to get item/node key */
+   eb = read_tree_block(root, bytenr, root->nodesize, 0);
+   /* Impossible, as tree block query has read out the tree block */
+   if (!extent_buffer_uptodate(eb)) {
+   err |= MISSING_REFERENCER;
+   free_extent_buffer(eb);
+   goto out;
+   }
+
+   /* Empty tree, no need to check key */
+   if (!btrfs_header_nritems(eb) && !level) {
+   free_extent_buffer(eb);
+   goto out;
+   }
+
+   if (level)
+   btrfs_node_key_to_cpu(eb, &key, 0);
+   else
+   btrfs_item_key_to_cpu(eb, &key, 0);
+
+   free_extent_buffer(eb);
+
+   btrfs_init_path(&path);
+   /* Search with the first key, to ensure we can reach it */
+   ret = btrfs_search_slot(NULL, root, &key, &path, 0, 0);
+   if (ret) {
+   err |= MISSING_REFERENCER;
+   goto release_out;
+   }
+
+   node = path.nodes[level];
+   if (btrfs_header_bytenr(node) != bytenr) {
+   error("Extent [%llu %d] referencer bytenr mismatch, wanted: %llu, have: %llu",
+ bytenr, nodesize, bytenr,
+ btrfs_header_bytenr(node));
+   err |= BAD_REFERENCER;
+   }
+   if (btrfs_header_level(node) != level) {
+   error("Extent [%llu %d] referencer level mismatch, wanted: %d, have: %d",
+ bytenr, nodesize, level,
+ btrfs_header_level(node));
+   err |= BAD_REFERENCER;
+   }
+
+release_out:
+   btrfs_release_path(&path);
+out:
+   if (err & MISSING_REFERENCER) {
+   if (level < 0)
+   error("Extent [%llu %d] lost referencer(owner: %llu)",
+  bytenr, nodesize, root_id);
+   else
+   error("Extent [%llu %d] lost referencer(owner: %llu, level: %u)",
+  bytenr, nodesize, root_id, level);
+   }
+
+   return -err;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
   struct btrfs_root *root, int overwrite)
 {
-- 
2.8.0





[PATCH RFC 08/16] btrfs-progs: fsck: Introduce function to check an extent

2016-04-25 Thread Qu Wenruo
From: Lu Fengqi 

Introduce the function check_extent_item(), built on the previously
introduced functions.

With the earlier functions that check referencers and backrefs in place,
this function is fairly simple.
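
The return convention shared by the check functions in this series can be
sketched in isolation: each helper returns 0 on success or the negated OR
of its error bits, and the caller folds every result in with
"err |= -ret". The flag values below mirror the #defines in this series;
the toy_* functions are hypothetical stand-ins, not the real helpers.

```c
/* Error bits, mirroring the #defines added to cmds-check.c in this series */
#define MISSING_BACKREF     (1 << 0)
#define BAD_BACKREF         (1 << 1)
#define UNALIGNED_BYTES     (1 << 2)
#define MISSING_REFERENCER  (1 << 3)

/* Hypothetical stub: pretend the referencer was not found */
static int toy_check_referencer(void)
{
	int err = 0;

	err |= MISSING_REFERENCER;
	return -err;	/* 0 on success, negated bitmask on failure */
}

/* Hypothetical stub: pretend a length was not aligned */
static int toy_check_alignment(void)
{
	return -UNALIGNED_BYTES;
}

/* Hypothetical stub: a check that finds nothing wrong */
static int toy_check_clean(void)
{
	return 0;
}

/* Caller side: fold every result into one bitmask, then negate it again */
int toy_check_all(void)
{
	int err = 0;
	int ret;

	ret = toy_check_referencer();
	err |= -ret;
	ret = toy_check_alignment();
	err |= -ret;
	ret = toy_check_clean();
	err |= -ret;

	return -err;
}
```

The negation has to be applied on both sides: OR-ing a negative return
value straight into err would set nearly every bit of the mask.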

Signed-off-by: Lu Fengqi 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 113 +++
 1 file changed, 113 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 5588898..7f9f848 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -325,6 +325,9 @@ struct root_item_info {
 #define UNALIGNED_BYTES (1 << 2) /* Some bytes are not aligned */
 #define MISSING_REFERENCER (1 << 3) /* Referencer not found */
 #define BAD_REFERENCER (1 << 4) /* Referencer found, but does not match */
+#define CROSSING_STRIPE_BOUNDARY (1 << 4) /* For kernel scrub workaround */
+#define BAD_ITEM_SIZE  (1 << 5) /* Bad item size */
+#define UNKNOWN_TYPE   (1 << 6) /* Unknown type */
 
 static void *print_status_check(void *p)
 {
@@ -9013,6 +9016,116 @@ out:
return 0;
 }
 
+/*
+ * This function will check a given extent item, including its backref and
+ * itself (like crossing stripe boundary and type)
+ *
+ * Since we don't use extent_record anymore, introduce new error bit
+ */
+static int check_extent_item(struct btrfs_fs_info *fs_info,
+struct extent_buffer *eb, int slot, int metadata)
+{
+   struct btrfs_extent_item *ei;
+   struct btrfs_extent_inline_ref *iref;
+   struct btrfs_extent_data_ref *dref;
+   unsigned long end;
+   unsigned long ptr;
+   int type;
+   u32 nodesize = btrfs_super_nodesize(fs_info->super_copy);
+   u32 item_size = btrfs_item_size_nr(eb, slot);
+   u64 flags;
+   u64 offset;
+   int level;
+   struct btrfs_key key;
+   int ret;
+   int err = 0;
+
+   btrfs_item_key_to_cpu(eb, &key, slot);
+
+   /*
+* XXX: Do we really need to handle such historic
+* extent structure?
+*/
+   if (item_size < sizeof(*ei)) {
+#ifdef BTRFS_COMPAT_EXTENT_TREE_V0
+   struct btrfs_extent_item_v0 *ei0;
+
+   BUG_ON(item_size != sizeof(*ei0));
+   return 1;
+#else
+   BUG();
+#endif
+   }
+
+   if (metadata && check_crossing_stripes(key.objectid, eb->len)) {
+   error("bad metadata [%llu, %llu) crossing stripe boundary",
+ key.objectid, key.objectid + nodesize);
+   err |= CROSSING_STRIPE_BOUNDARY;
+   }
+
+   ei = btrfs_item_ptr(eb, slot, struct btrfs_extent_item);
+   flags = btrfs_extent_flags(eb, ei);
+
+   ptr = (unsigned long)(ei + 1);
+   if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK && !metadata) {
+   struct btrfs_tree_block_info *info;
+
+   info = (struct btrfs_tree_block_info *)ptr;
+   level = btrfs_tree_block_level(eb, info);
+   ptr += sizeof(struct btrfs_tree_block_info);
+   } else
+   level = key.offset;
+   end = (unsigned long)ei + item_size;
+
+   if (ptr >= end) {
+   err |= BAD_ITEM_SIZE;
+   goto out;
+   }
+
+   /* Now check every backref in this extent item */
+next:
+   iref = (struct btrfs_extent_inline_ref *)ptr;
+   type = btrfs_extent_inline_ref_type(eb, iref);
+   offset = btrfs_extent_inline_ref_offset(eb, iref);
+   switch (type) {
+   case BTRFS_TREE_BLOCK_REF_KEY:
+   ret = check_tree_block_backref(fs_info, offset, key.objectid,
+  level);
+   err |= -ret;
+   break;
+   case BTRFS_SHARED_BLOCK_REF_KEY:
+   ret = check_shared_block_backref(fs_info, offset, key.objectid,
+level);
+   err |= -ret;
+   break;
+   case BTRFS_EXTENT_DATA_REF_KEY:
+   dref = (struct btrfs_extent_data_ref *)(&iref->offset);
+   ret = check_extent_data_backref(fs_info,
+   btrfs_extent_data_ref_root(eb, dref),
+   btrfs_extent_data_ref_objectid(eb, dref),
+   btrfs_extent_data_ref_offset(eb, dref),
+   key.objectid, key.offset);
+   err |= -ret;
+   break;
+   case BTRFS_SHARED_DATA_REF_KEY:
+   ret = check_shared_data_backref(fs_info, offset, key.objectid);
+   err |= -ret;
+   break;
+   default:
+   error("Extent[%llu %d %llu] has unknown ref type: %d",
+ key.objectid, key.type, key.offset, type);
+   err |= UNKNOWN_TYPE;
+   goto out;
+   }
+
+   ptr += btrfs_extent_inline_ref_size(type);
+   if (ptr < end)
+   goto next;
+
+out:
+   return -err;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
   struct btrfs_root *root

[PATCH RFC 11/16] btrfs-progs: fsck: Introduce function to check block group item

2016-04-25 Thread Qu Wenruo
From: Lu Fengqi 

Introduce the function check_block_group_item() to check a block group
item. It checks the referencing chunk and verifies the used-space
accounting against the extent tree.
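
The accounting part can be illustrated with a toy model. The sketch below
walks a sorted array instead of the extent tree: every item that starts
inside the block group adds nodesize (metadata item) or its key offset
(extent item) to a running total. The toy_* names and sample numbers are
hypothetical, and the final comparison against the block group's used
bytes is an assumption based on the commit message.

```c
#include <stddef.h>
#include <stdint.h>

enum toy_item_type { TOY_EXTENT_ITEM, TOY_METADATA_ITEM };

struct toy_extent {
	uint64_t bytenr;	/* key.objectid: start of the extent */
	uint64_t offset;	/* key.offset: length for extent items */
	enum toy_item_type type;
};

/* Sum the bytes of all extents starting in [bg_start, bg_start + bg_len) */
uint64_t toy_block_group_total(const struct toy_extent *items, size_t n,
			       uint64_t bg_start, uint64_t bg_len,
			       uint32_t nodesize)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Items are sorted by bytenr, so stop at the block group end */
		if (items[i].bytenr >= bg_start + bg_len)
			break;
		if (items[i].bytenr < bg_start)
			continue;
		if (items[i].type == TOY_METADATA_ITEM)
			total += nodesize;	/* one tree block */
		else
			total += items[i].offset;	/* data extent length */
	}
	return total;
}

/* Assumed final step: the total must equal the block group's used bytes */
int toy_accounting_matches(uint64_t total, uint64_t used)
{
	return total == used;
}

/* Sample data: a 1 MiB block group starting at 1 MiB, nodesize 16 KiB */
uint64_t toy_demo_total(void)
{
	static const struct toy_extent items[] = {
		{ 4096,    4096, TOY_EXTENT_ITEM },	/* before the group */
		{ 1048576, 8192, TOY_EXTENT_ITEM },
		{ 1064960, 0,    TOY_METADATA_ITEM },
		{ 2097152, 4096, TOY_EXTENT_ITEM },	/* past the group */
	};

	return toy_block_group_total(items, 4, 1048576, 1048576, 16384);
}
```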

Signed-off-by: Lu Fengqi 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 116 +++
 1 file changed, 116 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index e2d1ebf..b9fbb02 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -329,6 +329,7 @@ struct root_item_info {
 #define BAD_ITEM_SIZE  (1 << 5) /* Bad item size */
 #define UNKNOWN_TYPE   (1 << 6) /* Unknown type */
 #define ACCOUNTING_MISMATCH (1 << 7) /* Used space accounting error */
+#define MISMATCH_TYPE  (1 << 8)
 
 static void *print_status_check(void *p)
 {
@@ -9247,6 +9248,121 @@ next:
return 0;
 }
 
+/*
+ * Check a block group item with its referencer (chunk) and its used space
+ * with extent/metadata item
+ */
+static int check_block_group_item(struct btrfs_fs_info *fs_info,
+ struct extent_buffer *eb, int slot)
+{
+   struct btrfs_root *extent_root = fs_info->extent_root;
+   struct btrfs_root *chunk_root = fs_info->chunk_root;
+   struct btrfs_block_group_item *bi;
+   struct btrfs_block_group_item bg_item;
+   struct btrfs_path path;
+   struct btrfs_key key;
+   struct btrfs_key found_key;
+   struct btrfs_chunk *chunk;
+   struct extent_buffer *leaf;
+   struct btrfs_extent_item *ei;
+   u32 nodesize = btrfs_super_nodesize(fs_info->super_copy);
+   u64 flags;
+   u64 bg_flags;
+   u64 used;
+   u64 total = 0;
+   int ret;
+   int err = 0;
+
+   btrfs_item_key_to_cpu(eb, &found_key, slot);
+   bi = btrfs_item_ptr(eb, slot, struct btrfs_block_group_item);
+   read_extent_buffer(eb, &bg_item, (unsigned long)bi, sizeof(bg_item));
+   used = btrfs_block_group_used(&bg_item);
+   bg_flags = btrfs_block_group_flags(&bg_item);
+
+   key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
+   key.type = BTRFS_CHUNK_ITEM_KEY;
+   key.offset = found_key.objectid;
+
+   btrfs_init_path(&path);
+   /* Search for the referencer chunk */
+   ret = btrfs_search_slot(NULL, chunk_root, &key, &path, 0, 0);
+   if (ret) {
+   error("Block group[%llu %llu] didn't find the relative chunk item",
+ found_key.objectid, found_key.offset);
+   err |= MISSING_REFERENCER;
+   } else {
+   chunk = btrfs_item_ptr(path.nodes[0], path.slots[0],
+   struct btrfs_chunk);
+   if (btrfs_chunk_length(path.nodes[0], chunk) !=
+   found_key.offset) {
+   error("Block group[%llu %llu] relative chunk item length doesn't match",
+ found_key.objectid, found_key.offset);
+   err |= BAD_REFERENCER;
+   }
+   }
+   btrfs_release_path(&path);
+
+   key.objectid = 0;
+   key.type = BTRFS_METADATA_ITEM_KEY;
+   key.offset = found_key.objectid;
+
+   btrfs_init_path(&path);
+   ret = btrfs_search_slot(NULL, extent_root, &key, &path, 0, 0);
+   if (ret < 0)
+   goto out;
+
+   /* Iterate extent tree to account used space */
+   while (1) {
+   leaf = path.nodes[0];
+   btrfs_item_key_to_cpu(leaf, &key, path.slots[0]);
+   if (key.objectid >= found_key.objectid + found_key.offset)
+   break;
+
+   if (key.type != BTRFS_METADATA_ITEM_KEY &&
+   key.type != BTRFS_EXTENT_ITEM_KEY)
+   goto next;
+   if (key.objectid < found_key.objectid)
+   goto next;
+
+   if (key.type == BTRFS_METADATA_ITEM_KEY)
+   total += nodesize;
+   else
+   total += key.offset;
+
+   ei = btrfs_item_ptr(leaf, path.slots[0],
+   struct btrfs_extent_item);
+   flags = btrfs_extent_flags(leaf, ei);
+   if (flags & BTRFS_EXTENT_FLAG_DATA) {
+   if (!(bg_flags & BTRFS_BLOCK_GROUP_DATA)) {
+   error("bad extent[%llu, %llu) type mismatch with chunk",
+ key.objectid, key.objectid + key.offset);
+   err |= MISMATCH_TYPE;
+   }
+   } else if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) {
+   if (!(bg_flags & (BTRFS_BLOCK_GROUP_SYSTEM |
+   BTRFS_BLOCK_GROUP_METADATA))) {
+   error("bad extent[%llu, %llu) type mismatch with chunk",
+ key.objectid, key.objectid + nodesize);
+   err |= MISMATCH_TYPE;
+   }
+

[PATCH RFC 12/16] btrfs-progs: fsck: Introduce function to check chunk item

2016-04-25 Thread Qu Wenruo
From: Lu Fengqi 

Introduce the function check_chunk_item() to check a chunk item.
It checks every chunk stripe against its dev extent, and checks the
corresponding block group item.
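
One of the sanity checks in this function rejects a chunk whose type has
more than one profile bit set, using the "x & (x - 1)" idiom. A minimal
sketch of that test follows; the TOY_PROFILE_* values are hypothetical
placeholders, not the real BTRFS_BLOCK_GROUP_* flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical profile bits, for illustration only */
#define TOY_PROFILE_A	(1ULL << 3)
#define TOY_PROFILE_B	(1ULL << 4)

/*
 * x & (x - 1) clears the lowest set bit of x, so the result is nonzero
 * only when at least two bits were set. Zero (no profile bit at all,
 * i.e. "single") is accepted as valid.
 */
bool toy_has_multiple_profiles(uint64_t profile)
{
	return profile && (profile & (profile - 1));
}
```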

Signed-off-by: Lu Fengqi 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 109 +++
 1 file changed, 109 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index b9fbb02..a02db07 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -9363,6 +9363,115 @@ out:
return -err;
 }
 
+/*
+ * Check a chunk item.
+ * Including checking all referred dev_extents and block group
+ */
+static int check_chunk_item(struct btrfs_fs_info *fs_info,
+   struct extent_buffer *eb, int slot)
+{
+   struct btrfs_root *extent_root = fs_info->extent_root;
+   struct btrfs_root *dev_root = fs_info->dev_root;
+   struct btrfs_path path;
+   struct btrfs_key key;
+   struct btrfs_key found_key;
+   struct btrfs_chunk *chunk;
+   struct extent_buffer *leaf;
+   struct btrfs_block_group_item *bi;
+   struct btrfs_block_group_item bg_item;
+   struct btrfs_dev_extent *ptr;
+   u32 sectorsize = btrfs_super_sectorsize(fs_info->super_copy);
+   u64 length;
+   u64 type;
+   u64 profile;
+   int num_stripes;
+   u64 offset;
+   u64 objectid;
+   int i;
+   int ret;
+   int err = 0;
+
+   btrfs_item_key_to_cpu(eb, &found_key, slot);
+   chunk = btrfs_item_ptr(eb, slot, struct btrfs_chunk);
+   length = btrfs_chunk_length(eb, chunk);
+   if (!IS_ALIGNED(length, sectorsize)) {
+   error("Chunk[%llu %llu] length %llu not aligned to %u",
+ found_key.objectid, found_key.offset,
+ length, sectorsize);
+   err |= UNALIGNED_BYTES;
+   goto out;
+   }
+
+   type = btrfs_chunk_type(eb, chunk);
+   profile = type & BTRFS_BLOCK_GROUP_PROFILE_MASK;
+   if (!(type & BTRFS_BLOCK_GROUP_TYPE_MASK)) {
+   error("Chunk[%llu %llu] has no chunk type",
+ found_key.objectid, found_key.offset);
+   err |= UNKNOWN_TYPE;
+   }
+   if (profile && (profile & (profile - 1))) {
+   error("Chunk[%llu %llu] multiple profiles detected",
+ found_key.objectid, found_key.offset);
+   err |= UNKNOWN_TYPE;
+   }
+
+   key.objectid = found_key.offset;
+   btrfs_set_key_type(&key, BTRFS_BLOCK_GROUP_ITEM_KEY);
+   key.offset = length;
+
+   btrfs_init_path(&path);
+   ret = btrfs_search_slot(NULL, extent_root, &key, &path, 0, 0);
+   if (ret) {
+   error("Chunk[%llu %llu] didn't find the relative block group item",
+ found_key.objectid, found_key.offset);
+   err |= MISSING_REFERENCER;
+   } else {
+   leaf = path.nodes[0];
+   bi = btrfs_item_ptr(leaf, path.slots[0],
+   struct btrfs_block_group_item);
+   read_extent_buffer(leaf, &bg_item, (unsigned long)bi,
+  sizeof(bg_item));
+   if (btrfs_block_group_flags(&bg_item) != type) {
+   error("Chunk[%llu %llu] relative block group item flags mismatch, wanted: %llu, have: %llu",
+ found_key.objectid, found_key.offset, type,
+ btrfs_block_group_flags(&bg_item));
+   err |= MISSING_REFERENCER;
+   }
+   }
+
+   num_stripes = btrfs_chunk_num_stripes(eb, chunk);
+   for (i = 0; i < num_stripes; i++) {
+   btrfs_release_path(&path);
+   btrfs_init_path(&path);
+   key.objectid = btrfs_stripe_devid_nr(eb, chunk, i);
+   btrfs_set_key_type(&key, BTRFS_DEV_EXTENT_KEY);
+   key.offset = btrfs_stripe_offset_nr(eb, chunk, i);
+
+   ret = btrfs_search_slot(NULL, dev_root, &key, &path, 0, 0);
+   if (ret)
+   goto not_match_dev;
+
+   leaf = path.nodes[0];
+   ptr = btrfs_item_ptr(leaf, path.slots[0],
+struct btrfs_dev_extent);
+   objectid = btrfs_dev_extent_chunk_objectid(leaf, ptr);
+   offset = btrfs_dev_extent_chunk_offset(leaf, ptr);
+   if (objectid != found_key.objectid ||
+   offset != found_key.offset ||
+   btrfs_dev_extent_length(leaf, ptr) != length)
+   goto not_match_dev;
+   continue;
+not_match_dev:
+   err |= MISSING_BACKREF;
+   error("Chunk[%llu %llu] stripe %d didn't find the relative dev extent",
+ found_key.objectid, found_key.offset, i);
+   continue;
+   }
+   btrfs_release_path(&path);
+out:
+   return -err;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_tr

[PATCH RFC 00/16] Introduce low memory usage btrfsck mode

2016-04-25 Thread Qu Wenruo
The branch can be fetched from my github:
https://github.com/adam900710/btrfs-progs.git low_mem_fsck_rebasing

The original btrfsck checks the extent tree very efficiently: it records
every checked extent in an extent record tree, which ensures that each
extent is iterated at most twice.

However, the extent records are all stored in heap memory. Considering
how large a btrfs file system can be, they can easily eat up all memory
and cause OOM on a file system with TB-sized metadata.

To avoid that heap memory usage, we introduce a low memory usage fsck
mode.

In this mode, we use only btrfs_search_slot() and avoid any heap
memory allocation.

The workflow is:
1) Iterate the extent tree (backref check)
   Check whether the referencer of every backref exists.

2) Iterate the other trees (forward ref check)
   Check whether the backref of every tree block/data extent exists in
   the extent tree.

So in theory, every extent is iterated twice, just as in the original
mode. But since we keep no extent records and call btrfs_search_slot()
for every check, this causes extra IO.

I assume the extra IO is acceptable, and that it should let btrfsck
handle very large file systems.
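
The two passes can be sketched with a toy model that uses no heap
allocation. Below, two sorted arrays stand in for the extent tree and for
the blocks reachable from the other trees, and bsearch() stands in for
btrfs_search_slot(). All toy_* names and the sample block numbers are
hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/* Count the entries of `from` that have no match in the sorted array `in` */
static size_t count_missing(const uint64_t *from, size_t nfrom,
			    const uint64_t *in, size_t nin)
{
	size_t missing = 0;
	size_t i;

	for (i = 0; i < nfrom; i++)
		if (!bsearch(&from[i], in, nin, sizeof(*in), cmp_u64))
			missing++;
	return missing;
}

struct toy_result {
	size_t missing_referencer;	/* backref with no block behind it */
	size_t missing_backref;		/* block with no backref */
};

/* Pass 1 looks up every backref, pass 2 looks up every block */
struct toy_result toy_cross_check(const uint64_t *backrefs, size_t nbackrefs,
				  const uint64_t *blocks, size_t nblocks)
{
	struct toy_result r;

	r.missing_referencer = count_missing(backrefs, nbackrefs,
					     blocks, nblocks);
	r.missing_backref = count_missing(blocks, nblocks,
					  backrefs, nbackrefs);
	return r;
}

/* Sample data: 8192 has no referencer, 32768 has no backref */
struct toy_result toy_demo_cross_check(void)
{
	static const uint64_t backrefs[] = { 4096, 8192, 16384 };
	static const uint64_t blocks[] = { 4096, 16384, 32768 };

	return toy_cross_check(backrefs, 3, blocks, 3);
}
```

Every lookup repeats a search instead of consulting a cached record,
which is the trade of extra IO for memory described above.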

TODO features:
1) Repair
   Repair should work the same as in the old btrfsck, but we still need
   to settle on a repair principle.
   The current repair code sometimes uses the backref to repair the data
   extent, and sometimes uses the data extent to fix the backref.
   We need a consistent principle, or we will screw things up.

2) Replace current fsck code
   We expect the low memory mode to have fewer lines of code, and to be
   easier to review and extend.

   If the low memory mode proves stable enough, we will consider
   replacing the current extent and chunk tree check code with it, which
   would remove a lot of lines.

3) Further code refining
   Reduce duplicated code

4) Unify output
   Make the output of low-memory mode the same as that of the normal one.

Lu Fengqi (16):
  btrfs-progs: fsck: Introduce function to check tree block backref in extent tree
  btrfs-progs: fsck: Introduce function to check data backref in extent tree
  btrfs-progs: fsck: Introduce function to query tree block level
  btrfs-progs: fsck: Introduce function to check referencer of a backref
  btrfs-progs: fsck: Introduce function to check shared block ref
  btrfs-progs: fsck: Introduce function to check referencer for data backref
  btrfs-progs: fsck: Introduce function to check shared data backref
  btrfs-progs: fsck: Introduce function to check an extent
  btrfs-progs: fsck: Introduce function to check dev extent item
  btrfs-progs: fsck: Introduce function to check dev used space
  btrfs-progs: fsck: Introduce function to check block group item
  btrfs-progs: fsck: Introduce function to check chunk item
  btrfs-progs: fsck: Introduce hub function for later fsck
  btrfs-progs: fsck: Introduce function to speed up fs tree check
  btrfs-progs: fsck: Introduce traversal function for fsck
  btrfs-progs: fsck: Introduce low memory mode

 Documentation/btrfs-check.asciidoc |2 +
 cmds-check.c   | 1667 +---
 ctree.h|2 +
 extent-tree.c  |2 +-
 4 files changed, 1536 insertions(+), 137 deletions(-)

-- 
2.8.0





[PATCH RFC 07/16] btrfs-progs: fsck: Introduce function to check shared data backref

2016-04-25 Thread Qu Wenruo
From: Lu Fengqi 

Introduce the function check_shared_data_backref() to check the
referencer of a given shared data backref.

Signed-off-by: Lu Fengqi 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 44 
 1 file changed, 44 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 8f971b9..5588898 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -8969,6 +8969,50 @@ out:
return 0;
 }
 
+/*
+ * Check if the referencer of a shared data backref exists
+ */
+static int check_shared_data_backref(struct btrfs_fs_info *fs_info,
+u64 parent, u64 bytenr)
+{
+   struct extent_buffer *eb;
+   struct btrfs_key key;
+   struct btrfs_file_extent_item *fi;
+   u32 nodesize = btrfs_super_nodesize(fs_info->super_copy);
+   u32 nr;
+   int found_parent = 0;
+   int i;
+
+   eb = read_tree_block_fs_info(fs_info, parent, nodesize, 0);
+   if (!extent_buffer_uptodate(eb))
+   goto out;
+
+   nr = btrfs_header_nritems(eb);
+   for (i = 0; i < nr; i++) {
+   btrfs_item_key_to_cpu(eb, &key, i);
+   if (key.type != BTRFS_EXTENT_DATA_KEY)
+   continue;
+
+   fi = btrfs_item_ptr(eb, i, struct btrfs_file_extent_item);
+   if (btrfs_file_extent_type(eb, fi) == BTRFS_FILE_EXTENT_INLINE)
+   continue;
+
+   if (btrfs_file_extent_disk_bytenr(eb, fi) == bytenr) {
+   found_parent = 1;
+   break;
+   }
+   }
+
+out:
+   free_extent_buffer(eb);
+   if (!found_parent) {
+   error("Shared extent %llu referencer lost(parent: %llu)",
+ bytenr, parent);
+   return -MISSING_REFERENCER;
+   }
+   return 0;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
   struct btrfs_root *root, int overwrite)
 {
-- 
2.8.0





[PATCH RFC 05/16] btrfs-progs: fsck: Introduce function to check shared block ref

2016-04-25 Thread Qu Wenruo
From: Lu Fengqi 

Introduce function check_shared_block_backref() to check shared block
ref.

Signed-off-by: Lu Fengqi 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 81dd4f3..1d1b198 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -8831,6 +8831,48 @@ out:
return -err;
 }
 
+/*
+ * Check referencer for shared block backref
+ * If level == -1, this function will resolve the level.
+ */
+static int check_shared_block_backref(struct btrfs_fs_info *fs_info,
+u64 parent, u64 bytenr, int level)
+{
+   struct extent_buffer *eb;
+   u32 nodesize = btrfs_super_nodesize(fs_info->super_copy);
+   u32 nr;
+   int found_parent = 0;
+   int i;
+
+   eb = read_tree_block_fs_info(fs_info, parent, nodesize, 0);
+   if (!extent_buffer_uptodate(eb))
+   goto out;
+
+   if (level == -1)
+   level = query_tree_block_level(fs_info, bytenr);
+   if (level < 0)
+   goto out;
+
+   if (level + 1 != btrfs_header_level(eb))
+   goto out;
+
+   nr = btrfs_header_nritems(eb);
+   for (i = 0; i < nr; i++) {
+   if (bytenr == btrfs_node_blockptr(eb, i)) {
+   found_parent = 1;
+   break;
+   }
+   }
+out:
+   free_extent_buffer(eb);
+   if (!found_parent) {
+   error("Shared extent[%llu %u] lost its parent(parent: %llu, level: %u)",
+ bytenr, nodesize, parent, level);
+   return -MISSING_REFERENCER;
+   }
+   return 0;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
   struct btrfs_root *root, int overwrite)
 {
-- 
2.8.0





[PATCH RFC 16/16] btrfs-progs: fsck: Introduce low memory mode

2016-04-25 Thread Qu Wenruo
From: Lu Fengqi 

Introduce a new fsck mode: low memory mode.

The old btrfsck check is quite efficient, but it uses some memory for
each extent item.
The old method ensures that extents are iterated only once during the
extent/chunk tree check.

But since it uses a little memory for each extent item, on a large fs
with several TB of metadata this can easily eat up memory and cause OOM.

To address this limitation and improve scalability, the new low-memory
mode does not use any heap memory to record which extents have been
checked.
Instead, it uses extent backrefs to avoid most of the unneeded checks on
shared fs/subvolume tree blocks.
And by cross-checking forward and backward references, we can also
ensure that every tree block is checked at least once.
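
On the command-line side, the patch adds a long-only option whose getopt
value comes from an enum starting above the range of single-character
options. A minimal sketch of that pattern follows. The toy_* names are
hypothetical, and the long_options entry for "low-memory" is an
assumption, since the corresponding hunk is cut off in this message.

```c
#include <getopt.h>
#include <stddef.h>

/* Values above 255 cannot collide with any single-character option */
enum { TOY_OPT_REPAIR = 257, TOY_OPT_LOW_MEMORY };

/* Returns 1 if --low-memory was given, 0 if not, -1 on an unknown option */
int toy_parse_low_memory(int argc, char **argv)
{
	static const struct option long_options[] = {
		{ "repair", no_argument, NULL, TOY_OPT_REPAIR },
		{ "low-memory", no_argument, NULL, TOY_OPT_LOW_MEMORY },
		{ NULL, 0, NULL, 0 }
	};
	int low_memory = 0;
	int c;

	optind = 1;	/* restart scanning so repeated calls work */
	opterr = 0;	/* keep getopt quiet in this sketch */
	while ((c = getopt_long(argc, argv, "", long_options, NULL)) != -1) {
		switch (c) {
		case TOY_OPT_LOW_MEMORY:
			low_memory = 1;
			break;
		case TOY_OPT_REPAIR:
			break;
		default:
			return -1;
		}
	}
	return low_memory;
}

/* Sample invocations: "check --low-memory DEV" or "check --repair DEV" */
int toy_demo_parse(int with_flag)
{
	char *argv[] = { "check", with_flag ? "--low-memory" : "--repair",
			 "/dev/sdX", NULL };

	return toy_parse_low_memory(3, argv);
}
```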

Signed-off-by: Lu Fengqi 
Signed-off-by: Qu Wenruo 
---
 Documentation/btrfs-check.asciidoc |  2 +
 cmds-check.c   | 80 +-
 2 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/Documentation/btrfs-check.asciidoc b/Documentation/btrfs-check.asciidoc
index 7371a23..96eadc8 100644
--- a/Documentation/btrfs-check.asciidoc
+++ b/Documentation/btrfs-check.asciidoc
@@ -35,6 +35,8 @@ run in read-only mode (default)
 create a new CRC tree and recalculate all checksums
 --init-extent-tree::
 create a new extent tree
+--low-memory::
+check fs in low memory usage mode (experimental)
 --check-data-csum::
 verify checksums of data blocks
 -p|--progress::
diff --git a/cmds-check.c b/cmds-check.c
index 85d6cf4..e9d68dd 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -71,6 +71,7 @@ static int repair = 0;
 static int no_holes = 0;
 static int init_extent_tree = 0;
 static int check_data_csum = 0;
+static int low_memory = 0;
 static struct btrfs_fs_info *global_info;
 static struct task_ctx ctx = { 0 };
 static struct cache_tree *roots_info_cache = NULL;
@@ -9721,6 +9722,63 @@ static int traversal_tree_block(struct btrfs_root *root,
return -err;
 }
 
+/*
+ * Low memory usage version check_chunks_and_extents.
+ */
+static int check_chunks_and_extents_v2(struct btrfs_root *root)
+{
+   struct btrfs_path path;
+   struct btrfs_key key;
+   struct btrfs_root *root1;
+   struct btrfs_root *cur_root;
+   int err = 0;
+   int ret;
+
+   root1 = root->fs_info->chunk_root;
+   ret = traversal_tree_block(root1, root1->node);
+   err |= -ret;
+
+   root1 = root->fs_info->tree_root;
+   ret = traversal_tree_block(root1, root1->node);
+   err |= -ret;
+
+   btrfs_init_path(&path);
+   key.objectid = BTRFS_EXTENT_TREE_OBJECTID;
+   key.offset = 0;
+   key.type = BTRFS_ROOT_ITEM_KEY;
+
+   ret = btrfs_search_slot(NULL, root1, &key, &path, 0, 0);
+   if (ret) {
+   error("couldn't find extent_tree_root from tree_root");
+   goto out;
+   }
+
+   while (1) {
+   btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+   if (key.type != BTRFS_ROOT_ITEM_KEY)
+   goto next;
+   key.offset = (u64)-1;
+
+   cur_root = btrfs_read_fs_root(root->fs_info, &key);
+   if (IS_ERR(cur_root) || !cur_root) {
+   error("Failed to read tree: %llu", key.objectid);
+   goto next;
+   }
+
+   ret = traversal_tree_block(cur_root, cur_root->node);
+   err |= -ret;
+
+next:
+   ret = btrfs_next_item(root1, &path);
+   if (ret)
+   goto out;
+   }
+
+out:
+   btrfs_release_path(&path);
+   return err;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
   struct btrfs_root *root, int overwrite)
 {
@@ -10837,6 +10895,7 @@ const char * const cmd_check_usage[] = {
"--readonly  run in read-only mode (default)",
"--init-csum-treecreate a new CRC tree",
"--init-extent-tree  create a new extent tree",
+   "--low-memory        check in low memory usage mode (experimental)",
"--check-data-csum   verify checkums of data blocks",
"-Q|--qgroup-report   print a report on qgroup consistency",
"-E|--subvol-extents ",
@@ -10868,7 +10927,8 @@ int cmd_check(int argc, char **argv)
int c;
enum { GETOPT_VAL_REPAIR = 257, GETOPT_VAL_INIT_CSUM,
GETOPT_VAL_INIT_EXTENT, GETOPT_VAL_CHECK_CSUM,
-   GETOPT_VAL_READONLY, GETOPT_VAL_CHUNK_TREE };
+   GETOPT_VAL_READONLY, GETOPT_VAL_CHUNK_TREE,
+   GETOPT_VAL_LOW_MEMORY };
static const struct option long_options[] = {
{ "super", required_argument, NULL, 's' },
{ "repair", no_argument, NULL, GETOPT_VAL_REPAIR },
@@ -10886,6 +10946,8 @@ int cmd_check(int argc, char **argv)
{ "chun

[PATCH RFC 13/16] btrfs-progs: fsck: Introduce hub function for later fsck

2016-04-25 Thread Qu Wenruo
From: Lu Fengqi 

Introduce a hub function, check_leaf_items(), to check all known/valuable
items and update related accounting like total_bytes and csum_bytes.

Signed-off-by: Lu Fengqi 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 82 
 1 file changed, 82 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index a02db07..db6fc8e 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -9472,6 +9472,88 @@ out:
return -err;
 }
 
+/*
+ * Hub function to check known items and update related accounting info
+ */
+static int check_leaf_items(struct btrfs_root *root, struct extent_buffer *eb)
+{
+   struct btrfs_fs_info *fs_info = root->fs_info;
+   struct btrfs_key key;
+   int slot = 0;
+   int type;
+   int metadata;
+   struct btrfs_extent_data_ref *dref;
+   int ret;
+   int err = 0;
+
+next:
+   btrfs_item_key_to_cpu(eb, &key, slot);
+   type = btrfs_key_type(&key);
+
+   switch (type) {
+   case BTRFS_EXTENT_DATA_KEY:
+   ret = check_extent_data_item(root, eb, slot);
+   err |= -ret;
+   break;
+   case BTRFS_BLOCK_GROUP_ITEM_KEY:
+   ret = check_block_group_item(fs_info, eb, slot);
+   err |= -ret;
+   break;
+   case BTRFS_DEV_ITEM_KEY:
+   ret = check_dev_item(fs_info, eb, slot);
+   err |= -ret;
+   break;
+   case BTRFS_CHUNK_ITEM_KEY:
+   ret = check_chunk_item(fs_info, eb, slot);
+   err |= -ret;
+   break;
+   case BTRFS_DEV_EXTENT_KEY:
+   ret = check_dev_extent_item(fs_info, eb, slot);
+   err |= -ret;
+   break;
+   case BTRFS_EXTENT_ITEM_KEY:
+   case BTRFS_METADATA_ITEM_KEY:
+   metadata = type == BTRFS_METADATA_ITEM_KEY;
+   ret = check_extent_item(fs_info, eb, slot, metadata);
+   err |= -ret;
+   break;
+   case BTRFS_EXTENT_CSUM_KEY:
+   total_csum_bytes += btrfs_item_size_nr(eb, slot);
+   break;
+   case BTRFS_TREE_BLOCK_REF_KEY:
+   ret = check_tree_block_backref(fs_info, key.offset,
+  key.objectid, -1);
+   err |= -ret;
+   break;
+   case BTRFS_EXTENT_DATA_REF_KEY:
+   dref = btrfs_item_ptr(eb, slot, struct btrfs_extent_data_ref);
+   ret = check_extent_data_backref(fs_info,
+   btrfs_extent_data_ref_root(eb, dref),
+   btrfs_extent_data_ref_objectid(eb, dref),
+   btrfs_extent_data_ref_offset(eb, dref),
+   key.objectid, 0);
+   err |= -ret;
+   break;
+   case BTRFS_SHARED_BLOCK_REF_KEY:
+   ret = check_shared_block_backref(fs_info, key.offset,
+key.objectid, -1);
+   err |= -ret;
+   break;
+   case BTRFS_SHARED_DATA_REF_KEY:
+   ret = check_shared_data_backref(fs_info, key.offset,
+   key.objectid);
+   err |= -ret;
+   break;
+   default:
+   break;
+   }
+
+   if (++slot < btrfs_header_nritems(eb))
+   goto next;
+
+   return err;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
   struct btrfs_root *root, int overwrite)
 {
-- 
2.8.0





Re: Possible Double Freeing of dentry in check_parent_dirs_for_sync

2016-04-25 Thread Duncan
Paulo Dias posted on Mon, 25 Apr 2016 22:40:59 -0300 as excerpted:

> hi/2 all..
> 
> we are in 4.6 rc5 and im still seeing a LOT of this with my SSD:
> 
> Abr 25 22:38:01 hydra kernel: [ cut here ]
> Abr 25 22:38:01 hydra kernel: WARNING: CPU: 1 PID: 6236 at
> /home/kernel/COD/linux/fs/btrfs/inode.c:9261
> btrfs_destroy_inode+0x247/0x2c0 [btrfs]

I, OTOH, am not seeing any of them here, also SSD, after upgrading to 
pre-4.6-git shortly after 4.6-rc4. 

But my use-case is apparently less stress on the filesystem than many.  
Multiple small (largest is 24 GiB usable) btrfs raid1 on a pair of 
parallel-partitioned ssds, save for /boot, which is tiny (256 MiB)
mixed-bg dup mode, with the first backup on the other device and the grub 
install for each device pointing at its /boot, so I can bios-select the 
backup when needed.  The only serious problems I had were when one of the 
two ssds was going bad, forcing a replacement, after which I've not had 
major problems of any sort.

Also, as I'm using multiple independent btrfs, including identically 
sized fallbacks as first backup on the same pair of physical devices, I 
don't use subvolumes and don't do snapshots.  Also, no active quotas and 
I mount with autodefrag, ssd is automatically detected, and I don't use 
the discard mount option.

So with your ssd showing the problem and mine not, it's not directly ssd 
related, but if you do snapshotting and/or subvolumes, it could be 
related to that, or quotas, or trim/discard, or filesystem size.


Meanwhile, see the "btrfs_destroy_inode WARN_ON" thread, which 
interestingly enough, had a followup posted apparently the exact same 
minute as yours was, to this thread.

Based on that, it's not just you.  Going by that reply, the poster saw
lots of the warn-ons and got scared back to an earlier kernel as a
result, but observed no data loss.  Without a pin-down it's tough to say
it /can't/ happen, but the warn-ons there were apparently firing about
every 10 minutes even with light use, with no data loss to date.  So
while data loss /might/ still be possible, thankfully it doesn't seem to
actually trigger very often, even under heavy destroy-inode warn-on
triggering.

So they're obviously aware of the problem and presumably working on it, 
but it's equally obviously not fixed yet.

Were I seeing the problem frequently (again, I've not seen it at all),
I'd likely drop back to 4.5 until there's a fix.  If that takes long
enough that 4.5 goes out of support, 4.4-LTS is of course another
option.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: fsck: Fix found bytes accounting error

2016-04-25 Thread Qu Wenruo
In the new add_extent_rec_nolookup() function, we add to bytes_used to
update the found bytes accounting.

However, there is a typo: we used tmpl->nr where it should be rec->nr.
This makes us add 1 for a data backref instead of the correct size.

Reported-by: Lu Fengqi 
Signed-off-by: Qu Wenruo 
---
 cmds-check.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index d59968b..b207f8e 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -4550,7 +4550,7 @@ static int add_extent_rec_nolookup(struct cache_tree *extent_cache,
 	rec->cache.size = tmpl->nr;
 	ret = insert_cache_extent(extent_cache, &rec->cache);
 	BUG_ON(ret);
-	bytes_used += tmpl->nr;
+	bytes_used += rec->nr;
 
 	if (tmpl->metadata)
 		rec->crossing_stripes = check_crossing_stripes(rec->start,
-- 
2.8.0
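
For readers following along, the accounting difference can be shown with
a small user-space sketch.  Everything here is a toy model: the struct
layouts, the helper name account_extent(), and the rule that the record
keeps the larger of the template's nr and max_size are assumptions made
for illustration, not code copied from cmds-check.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the template and the record. */
struct extent_tmpl {
	uint64_t nr;        /* 1 for a data backref in this toy model */
	uint64_t max_size;  /* real extent size in bytes */
};

struct extent_rec {
	uint64_t nr;
};

static uint64_t bytes_used;

/* Assumption: the record keeps the larger of nr and max_size. */
static void account_extent(const struct extent_tmpl *tmpl,
			   struct extent_rec *rec)
{
	assert(tmpl && rec);
	rec->nr = tmpl->nr > tmpl->max_size ? tmpl->nr : tmpl->max_size;
	bytes_used += rec->nr;   /* adding tmpl->nr here would add only 1 */
}
```

With a template of nr = 1 and max_size = 4096, the fixed line adds 4096
to bytes_used, where the typo would have added 1.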





Re: btrfs_destroy_inode WARN_ON.

2016-04-25 Thread Adam Borowski
On Mon, Mar 28, 2016 at 04:14:46PM +0200, Markus Trippelsdorf wrote:
> On 2016.03.28 at 10:05 -0400, Josef Bacik wrote:
> > >Mar 24 10:37:27 x4 kernel: WARNING: CPU: 3 PID: 11838 at 
> > >fs/btrfs/inode.c:9261 btrfs_destroy_inode+0x22b/0x2a0
> > 
> > I saw this running some xfstests on our internal kernels but haven't been
> > able to reproduce it on my latest enospc work (which is obviously perfect).
> > What were you doing when you tripped this?  I'd like to see if I actually
> > did fix it or if I still need to run it down.  Thanks,
> 
> I cannot really tell. Looking at the backtrace, both Dave and I were
> running rm. 
> This warning happened just once on my machine, so the issue is obviously
> very hard to trigger.

On the other hand, it seems to be triggering really often (on the order of
~10 mins of light use) on my box.  I understandably ran away from 4.6-rc to
stable kernels (no one likes to risk data loss), but even in that little
time it triggered 328 times (over ~20ish boots).
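
A count like that can be reproduced from saved kernel log text with a
small helper.  This is only a sketch: count_occurrences() is a
hypothetical name, and it assumes the log has already been read into one
NUL-terminated string and that the source location (for example
"fs/btrfs/inode.c:9261") is a good enough key for the warning.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of needle in log. */
static size_t count_occurrences(const char *log, const char *needle)
{
	size_t count = 0;
	size_t len;

	assert(log && needle);
	len = strlen(needle);
	if (len == 0)
		return 0;

	for (const char *p = strstr(log, needle); p; p = strstr(p + len, needle))
		count++;
	return count;
}
```

Calling count_occurrences(log_text, "fs/btrfs/inode.c:9261") then gives
the number of times that location shows up in the saved text.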

Despite all of these WARNs, there's no data loss yet on the disk in
question, and the filesystem appears consistent.

Call stacks show a variety of callers of btrfs_destroy_inode, originating
from do_unlinkat, SyS_rename, btrfs_ioctl_snap_destroy, shrink_zone, or
task_work_run, direct callers being:

do_unlinkat

__dentry_kill
dput

__dentry_kill
shrink_dentry_list

dispose_list
prune_icache_sb


Just tried 4.6-rc5, it's still there.

Any way I could help debug this?

-- 
A tit a day keeps the vet away.


Re: Possible Double Freeing of dentry in check_parent_dirs_for_sync

2016-04-25 Thread Paulo Dias
Hi all,

We are at 4.6-rc5 and I'm still seeing a LOT of this with my SSD:

Abr 25 22:38:01 hydra kernel: ------------[ cut here ]------------
Abr 25 22:38:01 hydra kernel: WARNING: CPU: 1 PID: 6236 at
/home/kernel/COD/linux/fs/btrfs/inode.c:9261
btrfs_destroy_inode+0x247/0x2c0 [btrfs]
Abr 25 22:38:01 hydra kernel: Modules linked in: drbg ansi_cprng ctr
ccm rfcomm hid_generic usbhid hid rtsx_usb_ms memstick pci_stub bnep
vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) binfmt_misc
nls_iso8859_1 dell_wmi sparse_keymap ath3k intel_rapl btusb
x86_pkg_temp_thermal intel_powerclamp btrtl dell_laptop btbcm btintel
coretemp bluetooth dell_smm_hwmon kvm_intel kvm irqbypass
crct10dif_pclmul crc32_pclmul uvcvideo dell_led dell_smbios
ghash_clmulni_intel videobuf2_vmalloc dcdbas videobuf2_memops
videobuf2_v4l2 aesni_intel videobuf2_core snd_hda_codec_realtek
aes_x86_64 snd_hda_codec_generic videodev lrw gf128mul arc4 media
glue_helper ablk_helper cryptd snd_hda_intel snd_hda_codec ath9k
snd_hda_core input_leds ath9k_common joydev snd_hwdep serio_raw
snd_pcm ath9k_hw ath snd_seq_midi mac80211 snd_seq_midi_event
Abr 25 22:38:01 hydra kernel:  snd_rawmidi lpc_ich snd_seq cfg80211
snd_seq_device snd_timer snd mei_me soundcore mei shpchp
soc_button_array mac_hid dell_rbtn parport_pc ppdev lp parport autofs4
btrfs xor raid6_pq rtsx_usb_sdmmc rtsx_usb amdkfd amd_iommu_v2 radeon
i915 ttm i2c_algo_bit drm_kms_helper syscopyarea psmouse sysfillrect
sysimgblt fb_sys_fops ahci libahci r8169 drm mii wmi video fjes
Abr 25 22:38:01 hydra kernel: CPU: 1 PID: 6236 Comm: apt Tainted: G
W  OE   4.6.0-040600rc5-generic #201604242031
Abr 25 22:38:01 hydra kernel: Hardware name: Dell Inc. Latitude
3540/02R0J9, BIOS A10 01/28/2015
Abr 25 22:38:01 hydra kernel:  0286 c84e716a
8801288bfd18 813eee83
Abr 25 22:38:01 hydra kernel:   
8801288bfd58 810827cb
Abr 25 22:38:01 hydra kernel:  242d3bead680 8800acddbe40
8800acddbe40 8800354f9000
Abr 25 22:38:01 hydra kernel: Call Trace:
Abr 25 22:38:01 hydra kernel:  [] dump_stack+0x63/0x90
Abr 25 22:38:01 hydra kernel:  [] __warn+0xcb/0xf0
Abr 25 22:38:01 hydra kernel:  [] warn_slowpath_null+0x1d/0x20
Abr 25 22:38:01 hydra kernel:  []
btrfs_destroy_inode+0x247/0x2c0 [btrfs]
Abr 25 22:38:01 hydra kernel:  [] destroy_inode+0x3b/0x60
Abr 25 22:38:01 hydra kernel:  [] evict+0x136/0x1a0
Abr 25 22:38:01 hydra kernel:  [] iput+0x1ba/0x240
Abr 25 22:38:01 hydra kernel:  [] __dentry_kill+0x18d/0x1e0
Abr 25 22:38:01 hydra kernel:  [] dput+0x12b/0x220
Abr 25 22:38:01 hydra kernel:  [] __fput+0x18b/0x230
Abr 25 22:38:01 hydra kernel:  [] fput+0xe/0x10
Abr 25 22:38:01 hydra kernel:  [] task_work_run+0x73/0x90
Abr 25 22:38:01 hydra kernel:  []
exit_to_usermode_loop+0xc2/0xd0
Abr 25 22:38:01 hydra kernel:  []
syscall_return_slowpath+0x4e/0x60
Abr 25 22:38:01 hydra kernel:  []
entry_SYSCALL_64_fastpath+0xa6/0xa8
Abr 25 22:38:01 hydra kernel: ---[ end trace 7071159cbaf5ff25 ]---

Two questions:

1 - Is this harmless? I mean, is it just a warning, or can I get some data loss?
2 - Is anyone looking at this yet?

best
| Paulo Dias
| paulo.miguel.d...@gmail.com

Tempora mutantur, nos et mutamur in illis.


On Wed, Apr 6, 2016 at 9:26 AM, Filipe Manana  wrote:
> On Wed, Apr 6, 2016 at 4:46 AM, Bastien Philbert
>  wrote:
>> Greetings All,
>> After some tracing I am not certain whether this is correct, as I am newer
>> to the btrfs codebase. However, if someone more experienced can show me
>> whether I am missing something in my traces, please let me know :)
>> Firstly, here is the bug trace, or the part that matters:
>> [ 7195.792492] ------------[ cut here ]------------
>> [ 7195.792532] WARNING: CPU: 0 PID: 5352 at 
>> /home/kernel/COD/linux/fs/btrfs/inode.c:9261 btrfs_destroy_inode+0x247/0x2c0 
>> [btrfs]
>> [ 7195.792535] Modules linked in: bnep binfmt_misc intel_rapl 
>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel samsung_laptop kvm 
>> irqbypass crct10dif_pclmul crc32_pclmul btusb ghash_clmulni_intel btrtl 
>> btbcm btintel cryptd snd_hda_codec_hdmi uvcvideo bluetooth 
>> snd_hda_codec_realtek videobuf2_vmalloc snd_hda_codec_generic 
>> videobuf2_memops arc4 videobuf2_v4l2 snd_hda_intel input_leds videobuf2_core 
>> snd_hda_codec joydev snd_hda_core iwldvm serio_raw snd_hwdep videodev 
>> snd_pcm mac80211 media snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq 
>> snd_seq_device iwlwifi snd_timer cfg80211 snd lpc_ich mei_me soundcore 
>> shpchp mei dell_smo8800 mac_hid parport_pc ppdev lp parport autofs4 btrfs 
>> xor raid6_pq hid_generic usbhid hid i915 i2c_algo_bit drm_kms_helper 
>> syscopyarea sysfillrect psmouse sysimgblt fb_sys_fops
>> [ 7195.792593]  drm r8169 ahci libahci mii wmi video fjes
>> [ 7195.792602] CPU: 0 PID: 5352 Comm: aptitude Not tainted 
>> 4.6.0-040600rc1-generic #201603261930
>> [ 7195.792604] Hardware name: SAMSUNG ELECTRONICS CO., LTD. 
>> 530U3C/530U4C/SAMSUNG_NP1234567890, BIO

Re: [PATCH v8 19/27] btrfs: try more times to alloc metadata reserve space

2016-04-25 Thread Qu Wenruo



Josef Bacik wrote on 2016/04/25 10:05 -0400:

On 04/24/2016 08:54 PM, Qu Wenruo wrote:



Josef Bacik wrote on 2016/04/22 14:06 -0400:

On 03/21/2016 09:35 PM, Qu Wenruo wrote:

From: Wang Xiaoguang 

In btrfs_delalloc_reserve_metadata(), the number of metadata bytes we try
to reserve is calculated from the difference between outstanding_extents
and reserved_extents.

When reserve_metadata_bytes() fails to reserve the desired metadata space,
it has already done some reclaim work, such as writing ordered extents.

In that case, outstanding_extents and reserved_extents may have already
changed, and we may be able to reserve enough metadata space then.

So this patch retries reserve_metadata_bytes() up to 3 times to ensure
we have really run out of space.

Such false ENOSPC is mainly caused by small file extents and time-consuming
delalloc functions, and mainly affects in-band de-duplication. (Compression
should also be affected, but LZO/zlib is faster than SHA256, so it is
harder to trigger than with dedupe.)

Signed-off-by: Wang Xiaoguang 
---
  fs/btrfs/extent-tree.c | 25 ++---
  1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index dabd721..016d2ec 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2421,7 +2421,7 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
   * a new extent is revered, then deleted
   * in one tran, and inc/dec get merged to 0.
   *
- * In this case, we need to remove its dedup
+ * In this case, we need to remove its dedupe
   * hash.
   */
  btrfs_dedupe_del(trans, fs_info, node->bytenr);
@@ -5675,6 +5675,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
  bool delalloc_lock = true;
  u64 to_free = 0;
  unsigned dropped;
+int loops = 0;

  /* If we are a free space inode we need to not flush since we
will be in
   * the middle of a transaction commit.  We also don't need the
delalloc
@@ -5690,11 +5691,12 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
  btrfs_transaction_in_commit(root->fs_info))
  schedule_timeout(1);

+num_bytes = ALIGN(num_bytes, root->sectorsize);
+
+again:
  if (delalloc_lock)
  mutex_lock(&BTRFS_I(inode)->delalloc_mutex);

-num_bytes = ALIGN(num_bytes, root->sectorsize);
-
  spin_lock(&BTRFS_I(inode)->lock);
  nr_extents = (unsigned)div64_u64(num_bytes +
   BTRFS_MAX_EXTENT_SIZE - 1,
@@ -5815,6 +5817,23 @@ out_fail:
  }
  if (delalloc_lock)
  mutex_unlock(&BTRFS_I(inode)->delalloc_mutex);
+/*
+ * The number of metadata bytes is calculated from the difference
+ * between outstanding_extents and reserved_extents. Sometimes, even
+ * though reserve_metadata_bytes() fails to reserve the wanted metadata
+ * bytes, it has already done some work to reclaim metadata space, so
+ * both outstanding_extents and reserved_extents may have changed and
+ * the number of bytes we try to reserve may have changed too (it may
+ * be smaller). So here we try to reserve again. This is very useful
+ * for online dedupe, which can easily eat almost all metadata space.
+ *
+ * XXX: The limit of 3 is chosen arbitrarily. It is a good workaround
+ * for online dedupe; later we should find a better method to avoid
+ * the dedupe ENOSPC issue.
+ */
+if (unlikely(ret == -ENOSPC && loops++ < 3))
+goto again;
+
  return ret;
  }




NAK, we aren't going to just arbitrarily retry to make our metadata
reservation.  Dropping reserved metadata space by completing ordered
extents should free enough to make our current reservation, and in fact
this only accounts for the disparity, so should be an accurate count
most of the time.  I can see a case for detecting that the disparity no
longer exists and retrying in that case (we free enough ordered extents
that we are no longer trying to reserve ours + overflow but now only
ours) and retry in _that specific case_, but we need to limit it to this
case only.  Thanks,
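
The narrower retry described in that paragraph could look roughly like
the following user-space sketch.  It is a toy model, not kernel code:
struct space_pool, try_reserve(), reserve_metadata() and the
reclaim_clears_overflow knob are all hypothetical names, and the
"disparity" is reduced to a plain byte count called overflow.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of a metadata space pool. */
struct space_pool {
	uint64_t free_bytes;
	uint64_t overflow;             /* bytes owed beyond our own request */
	bool reclaim_clears_overflow;  /* toy knob: does flushing help? */
};

static int try_reserve(struct space_pool *pool, uint64_t bytes)
{
	if (bytes <= pool->free_bytes) {
		pool->free_bytes -= bytes;
		return 0;
	}
	/* A failed attempt has already flushed; model its side effect. */
	if (pool->reclaim_clears_overflow)
		pool->overflow = 0;
	return -ENOSPC;
}

/*
 * Reserve ours plus the current overflow.  Retry exactly once, and only
 * if the failed attempt included overflow that has since disappeared.
 */
static int reserve_metadata(struct space_pool *pool, uint64_t ours)
{
	uint64_t extra;
	int ret;

	assert(pool);
	extra = pool->overflow;
	ret = try_reserve(pool, ours + extra);
	if (ret == -ENOSPC && extra > 0 && pool->overflow == 0)
		ret = try_reserve(pool, ours);
	return ret;
}
```

Unlike a fixed loop count, the retry here is tied to one observable
condition, so a pool that is genuinely out of space fails on the first
attempt.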


Would it be OK to retry only in the dedupe-enabled case?

Currently it's only a workaround and we are still digging for the root
cause, but as a workaround I assume it is good enough for the
dedupe-enabled case.



No we're not going to leave things in a known broken state to come back
to later, that just makes it so we forget stuff and it sits there
forever.  Thanks,

Josef


OK, we'll investigate it and find the best fix.

BTW, we also found that extent-tree.c already uses the same 3-loop code
(and that's why we chose the same method):
--
loops = 0;
while (delalloc_bytes && loops < 3) {
max_reclaim = min(delalloc_bytes, to_reclaim);
nr_pages = max_reclaim >> PAGE_CACHE_SHIFT;
btrfs_writeback_inod

Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-04-25 Thread Qu Wenruo



Josef Bacik wrote on 2016/04/25 10:24 -0400:

On 04/24/2016 08:56 PM, Qu Wenruo wrote:



Josef Bacik wrote on 2016/04/22 14:23 -0400:

On 04/22/2016 02:21 PM, Mark Fasheh wrote:

On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote:

On 04/15/2016 05:08 AM, Qu Wenruo wrote:

+/*
+ * Force parent root to be updated, as we recorded it before so its
+ * last_trans == cur_transid.
+ * Or it won't be committed again onto disk after later
+ * insert_dir_item()
+ */
+if (!ret)
+record_root_in_trans(trans, parent, 1);
+return ret;
+}


NACK, holy shit we aren't adding a special transaction commit only
for qgroup snapshots.  Figure out a different way.  Thanks,


Yeah I saw that. To be fair, we run a whole lot of the transaction stuff
multiple times (at least from my reading) so I'm really unclear on what the
performance impact is.

Do you have any suggestion though? We've been banging our heads against this
for a while now and as slow as this patch might be, it actually works where
nothing else has so far.


I'm less concerned about committing another transaction and more
concerned about the fact that it is a special variant of the
transaction commit.  If this goes wrong, or at some point in the future
we fail to update it along with btrfs_commit_transaction() we suddenly are
corrupting metadata.  If we have to commit a transaction then call
btrfs_commit_transaction(), don't open code a stripped down version,
here be dragons.  Thanks,

Josef




Yes, I also don't like the dirty hack.

The problem is, though, that we have no other good choice.

If we could call commit_transaction(), that would be the best case, but the
problem is that in create_pending_snapshots() we are already inside
commit_transaction().

Or can commit_transaction() be called inside commit_transaction()?



No, figure out a different way.  IIRC I dealt with this with the
no_quota flag for inc_ref/dec_ref since the copy root stuff does strange
things with the reference counts, but all this code is gone now.  I
looked around to see if I could figure out how the refs are ending up
this way but it doesn't make sense to me and there isn't enough
information in your changelog for me to be able to figure it out. You've
created this mess, clean it up without making it messier.  Thanks,

Josef


Unfortunately, your original no_quota flag just hid the bug, and hid 
it in a bad way.


Originally, the no_quota flag was used for cases like this: to skip quota 
accounting at snapshot creation and use quota_inherit() to hack the 
quota accounting.
That seems to work, but in fact, if the DIR_ITEM insert needs to create a 
new cousin leaf, the quota numbers get messed up.


Your quota rework doesn't really help either, as it doesn't even account 
things correctly; just check fstest/btrfs/091 on a 4.1 kernel.


The only perfect fix for this already nasty subvolume creation is to do 
a full subtree rescan.

Otherwise no one knows when higher-level qgroups will break.



If you think splitting commit_transaction into two variants can cause 
problems, I can merge the two variants into one.


After all, the commit process in btrfs_commit_transaction() is much the 
same as the one used in create_pending_snapshot().


If there is only one __commit_roots() doing such commits, then there is 
nothing special just for quota.
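
As a rough illustration of that "one helper, two callers" idea, here is
a user-space toy.  The names and the counters are hypothetical; the real
commit path does far more than this, and the sketch only shows that no
second, stripped-down commit routine needs to exist.

```c
#include <assert.h>

/* Hypothetical toy transaction: just counts dirty and written roots. */
struct toy_trans {
	unsigned int roots_dirty;      /* roots waiting to be written */
	unsigned int roots_committed;  /* roots written so far */
};

/* The single place that writes out dirty roots. */
static void toy_commit_roots(struct toy_trans *t)
{
	assert(t);
	t->roots_committed += t->roots_dirty;
	t->roots_dirty = 0;
}

/* Snapshot creation dirties the parent root, then reuses the helper. */
static void toy_create_pending_snapshot(struct toy_trans *t)
{
	t->roots_dirty++;          /* parent root changed by the new dir item */
	toy_commit_roots(t);
}

/* The full commit path calls the very same helper. */
static void toy_commit_transaction(struct toy_trans *t)
{
	toy_create_pending_snapshot(t);
	t->roots_dirty++;          /* further changes later in the commit */
	toy_commit_roots(t);
}
```

If the helper ever changes, both paths pick up the change together,
which is the property being asked for in this thread.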


Thanks,
Qu




Re: [PATCH v2] btrfs: use dynamic allocation for root item in create_subvol

2016-04-25 Thread Tsutomu Itoh
On 2016/04/25 20:18, David Sterba wrote:
> The size of root item is more than 400 bytes, which is quite a lot of
> stack space. As we do IO from inside the subvolume ioctls, we should
> keep the stack usage low in case the filesystem is on top of other
> layers (NFS, device mapper, iscsi, etc).
> 
> Signed-off-by: David Sterba 

Looks good to me.

Reviewed-by: Tsutomu Itoh 
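
For anyone skimming the diff below, the shape of the change is the usual
"allocate on the heap, leave through one cleanup label" pattern.  A
user-space sketch under stated assumptions: calloc()/free() stand in for
kzalloc()/kfree(), and struct big_item, struct steps and create_thing()
are hypothetical names, not the ioctl code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a large item; the size is arbitrary. */
struct big_item {
	unsigned char payload[512];
};

/* Hypothetical knobs standing in for the real fallible steps. */
struct steps {
	int find_objectid_ret;
	int reserve_ret;
};

static int create_thing(const struct steps *s)
{
	struct big_item *item;
	int ret;

	assert(s);
	item = calloc(1, sizeof(*item));   /* zeroed, so no memset needed */
	if (!item)
		return -ENOMEM;

	ret = s->find_objectid_ret;
	if (ret)
		goto fail_free;

	ret = s->reserve_ret;
	if (ret)
		goto fail_free;

	item->payload[0] = 1;              /* fill in and use the item */
	ret = 0;

fail_free:
	free(item);                        /* every exit path frees the item */
	return ret;
}
```

The trade-off is one extra allocation (and a new -ENOMEM failure mode)
in exchange for keeping the large structure off the stack.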

> ---
>   fs/btrfs/ioctl.c | 65 
> 
>   1 file changed, 37 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 053e677839fe..9a63fe07bc2e 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -439,7 +439,7 @@ static noinline int create_subvol(struct inode *dir,
>   {
>   struct btrfs_trans_handle *trans;
>   struct btrfs_key key;
> - struct btrfs_root_item root_item;
> + struct btrfs_root_item *root_item;
>   struct btrfs_inode_item *inode_item;
>   struct extent_buffer *leaf;
>   struct btrfs_root *root = BTRFS_I(dir)->root;
> @@ -455,16 +455,22 @@ static noinline int create_subvol(struct inode *dir,
>   u64 qgroup_reserved;
>   uuid_le new_uuid;
>   
> + root_item = kzalloc(sizeof(*root_item), GFP_KERNEL);
> + if (!root_item)
> + return -ENOMEM;
> +
>   ret = btrfs_find_free_objectid(root->fs_info->tree_root, &objectid);
>   if (ret)
> - return ret;
> + goto fail_free;
>   
>   /*
>* Don't create subvolume whose level is not zero. Or qgroup will be
>* screwed up since it assume subvolme qgroup's level to be 0.
>*/
> - if (btrfs_qgroup_level(objectid))
> - return -ENOSPC;
> + if (btrfs_qgroup_level(objectid)) {
> + ret = -ENOSPC;
> + goto fail_free;
> + }
>   
>   btrfs_init_block_rsv(&block_rsv, BTRFS_BLOCK_RSV_TEMP);
>   /*
> @@ -474,14 +480,14 @@ static noinline int create_subvol(struct inode *dir,
>   ret = btrfs_subvolume_reserve_metadata(root, &block_rsv,
>  8, &qgroup_reserved, false);
>   if (ret)
> - return ret;
> + goto fail_free;
>   
>   trans = btrfs_start_transaction(root, 0);
>   if (IS_ERR(trans)) {
>   ret = PTR_ERR(trans);
>   btrfs_subvolume_release_metadata(root, &block_rsv,
>qgroup_reserved);
> - return ret;
> + goto fail_free;
>   }
>   trans->block_rsv = &block_rsv;
>   trans->bytes_reserved = block_rsv.size;
> @@ -509,47 +515,45 @@ static noinline int create_subvol(struct inode *dir,
>   BTRFS_UUID_SIZE);
>   btrfs_mark_buffer_dirty(leaf);
>   
> - memset(&root_item, 0, sizeof(root_item));
> -
> - inode_item = &root_item.inode;
> + inode_item = &root_item->inode;
>   btrfs_set_stack_inode_generation(inode_item, 1);
>   btrfs_set_stack_inode_size(inode_item, 3);
>   btrfs_set_stack_inode_nlink(inode_item, 1);
>   btrfs_set_stack_inode_nbytes(inode_item, root->nodesize);
>   btrfs_set_stack_inode_mode(inode_item, S_IFDIR | 0755);
>   
> - btrfs_set_root_flags(&root_item, 0);
> - btrfs_set_root_limit(&root_item, 0);
> + btrfs_set_root_flags(root_item, 0);
> + btrfs_set_root_limit(root_item, 0);
>   btrfs_set_stack_inode_flags(inode_item, BTRFS_INODE_ROOT_ITEM_INIT);
>   
> - btrfs_set_root_bytenr(&root_item, leaf->start);
> - btrfs_set_root_generation(&root_item, trans->transid);
> - btrfs_set_root_level(&root_item, 0);
> - btrfs_set_root_refs(&root_item, 1);
> - btrfs_set_root_used(&root_item, leaf->len);
> - btrfs_set_root_last_snapshot(&root_item, 0);
> + btrfs_set_root_bytenr(root_item, leaf->start);
> + btrfs_set_root_generation(root_item, trans->transid);
> + btrfs_set_root_level(root_item, 0);
> + btrfs_set_root_refs(root_item, 1);
> + btrfs_set_root_used(root_item, leaf->len);
> + btrfs_set_root_last_snapshot(root_item, 0);
>   
> - btrfs_set_root_generation_v2(&root_item,
> - btrfs_root_generation(&root_item));
> + btrfs_set_root_generation_v2(root_item,
> + btrfs_root_generation(root_item));
>   uuid_le_gen(&new_uuid);
> - memcpy(root_item.uuid, new_uuid.b, BTRFS_UUID_SIZE);
> - btrfs_set_stack_timespec_sec(&root_item.otime, cur_time.tv_sec);
> - btrfs_set_stack_timespec_nsec(&root_item.otime, cur_time.tv_nsec);
> - root_item.ctime = root_item.otime;
> - btrfs_set_root_ctransid(&root_item, trans->transid);
> - btrfs_set_root_otransid(&root_item, trans->transid);
> + memcpy(root_item->uuid, new_uuid.b, BTRFS_UUID_SIZE);
> + btrfs_set_stack_timespec_sec(&root_item->otime, cur_time.tv_sec);
> + btrfs_set_stack_timespec_nsec(&root_item->otime, cur_time.tv_nsec);
> + root_item->ctime = root_item->otime;
> + btrfs_set_root_ctransi

Re: Install to or Recover RAID Array Subvolume Root?

2016-04-25 Thread Nicholas D Steeves
On 22 April 2016 at 06:44, David Alcorn  wrote:
>
> First, I verified that while the Debian Installer will install to a
> pre set default BTRFS RAID6 subvolume, the Grub install step fails.
> The alternative to restore installation to a RAID6 subvolume requires
> installation to a non RAID6 subvolume and then send|receive the
> snapshotted installation to the array.  To prepare for this attempt, I
> reinstalled BTRFS (Debian stable) to a flash drive using separate
> partitions for efi, /boot/ and / (in a subvolume).  The default
> subvolume was set to 5 for both the flash / partition and also the
> RAID6 array.  I used a separate /boot partition to reduce complexity.
> Both the kernel and btrfs tools were upgraded to 4.4.  I soon
> thereafter got lost.

1. Have you partially filled your RAID6 array?  If so, do you have
current backups for everything you care about?
2. Please indicate whether you prefer to mount by LABEL, UUID, or /dev
3. If it's by /dev, please send the output of: parted -l
4. If it's by LABEL or UUID, please also send the output of: blkid

Sincerely,
Nicholas


Re: [PATCH 3/3] btrfs: refactor btrfs_dev_replace_start for reuse

2016-04-25 Thread David Sterba
On Thu, Mar 24, 2016 at 06:48:14PM +0800, Anand Jain wrote:
> A refactor patch that avoids user input verification in
> btrfs_dev_replace_start(), so that this function can be reused.
> 
> Signed-off-by: Anand Jain 

Added on top of the delete-by-id patchset as there's a dependency, plus
the 1/3 patch "btrfs: use fs_info directly".


Re: [PATCH 2/3] btrfs: keep sysfs target add in the last

2016-04-25 Thread David Sterba
On Thu, Mar 24, 2016 at 06:48:13PM +0800, Anand Jain wrote:
> The sysfs create step should come last, so that we
> don't have to undo the sysfs operation because some
> other operation has failed.

Moving the sysfs call will make a visible change: in the old code, the
sysfs node exists during the whole replace process, while in the new
code it appears only after it finishes. While this is not necessarily a
problem, I'd like to check that this is an intended change, as it's not
mentioned in the changelog.

Besides, the sysfs node seems to be added unconditionally, so if the
scrub is running in parallel (checked a few lines above the new code),
we'll happily add the target device although no replace happened.
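
To make both points concrete, here is a user-space toy of "publish last,
and only on success".  All names are hypothetical, and -EINPROGRESS is
just a placeholder error for the scrub case, not what the kernel
returns there.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy state for one replace attempt. */
struct toy_replace {
	bool scrub_running;   /* another operation blocks the replace */
	bool copy_fails;      /* toy knob: the replace itself fails */
	bool sysfs_visible;   /* has the target node been published? */
};

static int toy_replace_start(struct toy_replace *r)
{
	assert(r);
	if (r->scrub_running)
		return -EINPROGRESS;   /* nothing published, nothing to undo */

	if (r->copy_fails)
		return -EIO;           /* still nothing to undo */

	r->sysfs_visible = true;       /* last step, only on success */
	return 0;
}
```

Note the visible difference this ordering implies: the node is absent
while the work is running and appears only once it has succeeded.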


Re: [PATCH v5 00/13] Introduce device state 'failed', spare device and auto replace

2016-04-25 Thread Yauhen Kharuzhy
On Mon, Apr 18, 2016 at 07:31:31PM +0800, Anand Jain wrote:
> Thanks for various comments, tests and feedback.

Seems to be working well for me.

-- 
Yauhen Kharuzhy


Re: [PATCH] btrfs: cleanup assigning next active device with a check

2016-04-25 Thread David Sterba
On Mon, Apr 18, 2016 at 07:25:52PM +0800, Anand Jain wrote:
> Create a helper function as needed by the device delete
> and replace operations. It now also checks whether the next
> device being assigned is an active device.
> 
> Signed-off-by: Anand Jain 
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1684,10 +1684,40 @@ out:
>   return ret;
>  }
>  
> +struct btrfs_device *btrfs_find_next_active_device(struct btrfs_fs_devices *fs_devs,
> + struct btrfs_device *device)

> +
> +void btrfs_assign_next_active_device(struct btrfs_fs_info *fs_info,
> + struct btrfs_device *device, struct btrfs_device *this_dev)

Please add comments describing what the functions do, so that one does
not need to read the whole function to figure it out.
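
Something along these lines would do.  This is a user-space toy with
hypothetical names, and the meaning of "active" (usable, not missing) is
my assumption; the point is only the style of the comment above the
function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy device on a singly linked list. */
struct toy_device {
	bool active;               /* assumption: usable, not missing */
	struct toy_device *next;
};

/*
 * toy_find_next_active_device - pick a replacement for @skip
 * @head: first device in the list (may be NULL)
 * @skip: device being removed or replaced; never returned
 *
 * Returns the first active device other than @skip, or NULL if there
 * is none.
 */
static struct toy_device *toy_find_next_active_device(struct toy_device *head,
						      const struct toy_device *skip)
{
	assert(skip != NULL);
	for (struct toy_device *d = head; d; d = d->next)
		if (d != skip && d->active)
			return d;
	return NULL;
}
```

A reader then learns the contract (what is skipped, what NULL means)
from the comment alone.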


Re: [PATCH] btrfs: fix lock dep warning, move scratch dev out of device_list_mutex and uuid_mutex

2016-04-25 Thread David Sterba
On Mon, Apr 18, 2016 at 04:51:23PM +0800, Anand Jain wrote:
> When the replace target fails, the target device will be taken
> out of fs device list, scratch + update_dev_time and freed. However
> we could do the scratch  + update_dev_time and free part after the
> device has been taken out of device list, so that we don't have to
> hold the device_list_mutex and uuid_mutex locks.
> 
> Reported issue:
[...]
> 
> Signed-off-by: Anand Jain 
> Reported-by: Yauhen Kharuzhy 

Reviewed-by: David Sterba 


Re: [PATCH] btrfs-progs: Restrict e2fsprogs version for new convert

2016-04-25 Thread David Sterba
On Mon, Apr 18, 2016 at 09:20:18AM +0800, Qu Wenruo wrote:
> 
> 
> David Sterba wrote on 2016/04/15 13:17 +0200:
> > On Thu, Apr 14, 2016 at 02:24:34PM +0800, Qu Wenruo wrote:
> >> New btrfs-convert is using a lot of new macros in e2fsprogs 1.42.
> >> Unfortunately the new compatibility layer for older e2fsprogs is still
> >> under development.
> >
> > It hasn't been released yet so it's not really a big problem, although
> > it makes testing on my side a bit harder. The configure-time check
> > should be 1.41 and until it's fixed we can print a warning.
> >
> >
> Did I miss something?
> 
> I checked out 1.41.14 and it shows no cluster support in ext2fs.h.
> 
> Also git describe shows it's v1.41.14-36-g1da5ef7, after the last v1.41 
> version.
> 
> So I think the check should be 1.42, just as the patch.

The idea is to keep 1.41 as the lowest supported version, because this version
can be commonly found on enterprise distros. The lack of cluster support is
expected and needs to be dealt with at both build time and run time.
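
For the run-time side, one possible building block is a dotted-version
comparison.  This is only a sketch: version_cmp() is a hypothetical
helper, it assumes well-formed numeric components, and how the library
version string is obtained is left out on purpose.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Compare two dotted version strings such as "1.41.14" and "1.42".
 * Returns <0, 0 or >0 like strcmp().  Missing components count as 0.
 */
static int version_cmp(const char *a, const char *b)
{
	assert(a && b);
	while (*a || *b) {
		char *end_a, *end_b;
		long va = strtol(a, &end_a, 10);
		long vb = strtol(b, &end_b, 10);

		if (va != vb)
			return va < vb ? -1 : 1;
		if (end_a == a && end_b == b)
			break;     /* no digits on either side: stop */
		a = (*end_a == '.') ? end_a + 1 : end_a;
		b = (*end_b == '.') ? end_b + 1 : end_b;
	}
	return 0;
}
```

A caller could then enable the cluster-aware path only when
version_cmp(found, "1.42") >= 0 and fall back otherwise.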


Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot

2016-04-25 Thread Josef Bacik

On 04/24/2016 08:56 PM, Qu Wenruo wrote:



Josef Bacik wrote on 2016/04/22 14:23 -0400:

On 04/22/2016 02:21 PM, Mark Fasheh wrote:

On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote:

On 04/15/2016 05:08 AM, Qu Wenruo wrote:

+/*
+ * Force parent root to be updated, as we recorded it before so its
+ * last_trans == cur_transid.
+ * Or it won't be committed again onto disk after later
+ * insert_dir_item()
+ */
+if (!ret)
+record_root_in_trans(trans, parent, 1);
+return ret;
+}


NACK, holy shit we aren't adding a special transaction commit only
for qgroup snapshots.  Figure out a different way.  Thanks,


Yeah I saw that. To be fair, we run a whole lot of the transaction stuff
multiple times (at least from my reading) so I'm really unclear on what the
performance impact is.

Do you have any suggestion though? We've been banging our heads against this
for a while now and as slow as this patch might be, it actually works where
nothing else has so far.


I'm less concerned about committing another transaction and more
concerned about the fact that it is a special variant of the
transaction commit.  If this goes wrong, or at some point in the future
we fail to update it along with btrfs_commit_transaction() we suddenly are
corrupting metadata.  If we have to commit a transaction then call
btrfs_commit_transaction(), don't open code a stripped down version,
here be dragons.  Thanks,

Josef




Yes, I also don't like the dirty hack.

The problem is, though, that we have no other good choice.

If we could call commit_transaction(), that would be the best case, but the
problem is that in create_pending_snapshots() we are already inside
commit_transaction().

Or can commit_transaction() be called inside commit_transaction()?



No, figure out a different way.  IIRC I dealt with this with the 
no_quota flag for inc_ref/dec_ref since the copy root stuff does strange 
things with the reference counts, but all this code is gone now.  I 
looked around to see if I could figure out how the refs are ending up 
this way but it doesn't make sense to me and there isn't enough 
information in your changelog for me to be able to figure it out. 
You've created this mess, clean it up without making it messier.  Thanks,


Josef


Re: [PATCH v8 19/27] btrfs: try more times to alloc metadata reserve space

2016-04-25 Thread Josef Bacik

On 04/24/2016 08:54 PM, Qu Wenruo wrote:



Josef Bacik wrote on 2016/04/22 14:06 -0400:

On 03/21/2016 09:35 PM, Qu Wenruo wrote:

From: Wang Xiaoguang 

In btrfs_delalloc_reserve_metadata(), the number of metadata bytes we try
to reserve is calculated from the difference between outstanding_extents
and reserved_extents.

When reserve_metadata_bytes() fails to reserve the desired metadata space,
it has already done some reclaim work, such as writing ordered extents.

In that case, outstanding_extents and reserved_extents may have already
changed, and we may be able to reserve enough metadata space then.

So this patch retries reserve_metadata_bytes() up to 3 times to ensure
we have really run out of space.

Such false ENOSPC is mainly caused by small file extents and time-consuming
delalloc functions, and mainly affects in-band de-duplication. (Compression
should also be affected, but LZO/zlib is faster than SHA256, so it is
harder to trigger than with dedupe.)

Signed-off-by: Wang Xiaoguang 
---
  fs/btrfs/extent-tree.c | 25 ++---
  1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index dabd721..016d2ec 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2421,7 +2421,7 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
   * a new extent is revered, then deleted
   * in one tran, and inc/dec get merged to 0.
   *
- * In this case, we need to remove its dedup
+ * In this case, we need to remove its dedupe
   * hash.
   */
  btrfs_dedupe_del(trans, fs_info, node->bytenr);
@@ -5675,6 +5675,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes)
  bool delalloc_lock = true;
  u64 to_free = 0;
  unsigned dropped;
+int loops = 0;

  /* If we are a free space inode we need to not flush since we
will be in
   * the middle of a transaction commit.  We also don't need the
delalloc
@@ -5690,11 +5691,12 @@ int btrfs_delalloc_reserve_metadata(struct
inode *inode, u64 num_bytes)
  btrfs_transaction_in_commit(root->fs_info))
  schedule_timeout(1);

+num_bytes = ALIGN(num_bytes, root->sectorsize);
+
+again:
  if (delalloc_lock)
  mutex_lock(&BTRFS_I(inode)->delalloc_mutex);

-num_bytes = ALIGN(num_bytes, root->sectorsize);
-
  spin_lock(&BTRFS_I(inode)->lock);
  nr_extents = (unsigned)div64_u64(num_bytes +
   BTRFS_MAX_EXTENT_SIZE - 1,
@@ -5815,6 +5817,23 @@ out_fail:
  }
  if (delalloc_lock)
  mutex_unlock(&BTRFS_I(inode)->delalloc_mutex);
+/*
+ * The number of metadata bytes is calculated by the difference
+ * between outstanding_extents and reserved_extents. Sometimes
though
+ * reserve_metadata_bytes() fails to reserve the wanted metadata
bytes,
+ * indeed it has already done some work to reclaim metadata
space, hence
+ * both outstanding_extents and reserved_extents would have
changed and
+ * the bytes we try to reserve would also has changed(may be
smaller).
+ * So here we try to reserve again. This is much useful for online
+ * dedupe, which will easily eat almost all meta space.
+ *
+ * XXX: Indeed here 3 is arbitrarily choosed, it's a good
workaround for
+ * online dedupe, later we should find a better method to avoid
dedupe
+ * enospc issue.
+ */
+if (unlikely(ret == -ENOSPC && loops++ < 3))
+goto again;
+
  return ret;
  }




>> NAK, we aren't going to just arbitrarily retry to make our metadata
>> reservation.  Dropping reserved metadata space by completing ordered
>> extents should free enough to make our current reservation, and in fact
>> this only accounts for the disparity, so should be an accurate count
>> most of the time.  I can see a case for detecting that the disparity no
>> longer exists and retrying in that case (we free enough ordered extents
>> that we are no longer trying to reserve ours + overflow but now only
>> ours) and retry in _that specific case_, but we need to limit it to this
>> case only.  Thanks,
>
> Would it be OK to retry only for dedupe enabled case?
>
> Currently it's only a workaround and we are still digging the root
> cause, but for a workaround, I assume it is good enough though for
> dedupe enabled case.



No we're not going to leave things in a known broken state to come back 
to later, that just makes it so we forget stuff and it sits there 
forever.  Thanks,


Josef
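
For readers following the thread, here is a minimal user-space sketch of the retry pattern being debated. It is not kernel code: `demo_pool`, `demo_try_reserve()` and `demo_reserve_with_retry()` are hypothetical names, and the "reclaim on failure" behaviour is only a rough model of the flushing the commit message describes.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of a space pool; this is not the kernel's space_info. */
struct demo_pool {
    uint64_t free_bytes;       /* bytes that can be reserved right now */
    uint64_t reclaim_per_fail; /* bytes a failed attempt frees as a side effect */
};

/* One reservation attempt. On failure some space is reclaimed, loosely
 * mirroring the reclaim work mentioned in the commit message. */
static int demo_try_reserve(struct demo_pool *p, uint64_t bytes)
{
    if (p->free_bytes >= bytes) {
        p->free_bytes -= bytes;
        return 0;
    }
    p->free_bytes += p->reclaim_per_fail;
    return -ENOSPC;
}

/* Same loop shape as the patch's "goto again": one initial attempt plus up
 * to max_retries further attempts while the result is -ENOSPC. */
static int demo_reserve_with_retry(struct demo_pool *p, uint64_t bytes,
                                   int max_retries)
{
    int loops = 0;
    int ret;

    assert(max_retries >= 0);
    do {
        ret = demo_try_reserve(p, bytes);
    } while (ret == -ENOSPC && loops++ < max_retries);
    return ret;
}
```

Note that with a limit of 3 this loop shape makes up to four attempts in total. The objection in the reply above is that such a limit is arbitrary; retrying only when the amount to reserve has actually shrunk would need a different condition than a plain counter.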



Re: empty disk reports full

2016-04-25 Thread Alejandro Vargas
On Friday, 1 April 2016 10:05:07 Hugo Mills wrote:
> On Fri, Apr 01, 2016 at 11:50:50AM +0200, Alejandro Vargas wrote:
> > I am using a 2 TB disk for incremental backups.
> > 
> > I use rsync for backing up to a subvolume, and each day I create a
> > snapshot of the latest snapshot and run rsync into it.
> > 
> > When the disk becomes nearly full (100 GB or less available) I delete the
> > oldest subvolume (with btrfs subvolume delete).
> > 
> > My problem is that *even after removing ALL the subvolumes*, the free space
> > does not change. It continues reporting the same size (disk is nearly full).
> > 
> > I tried "btrfs balance start /mnt/backup" but it takes hours and hours.
> > 
> > I'm using linux 4.1.15
> > btrfs-progs v4.1.2
> 
>Can you show us the output of both "sudo btrfs fi show" and "btrfs
> fi df /mnt/backup", please?

Before deleting subvolumes:

[root@backups ~]# df /mnt/backup
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       1,9T  1,9T  5,0M 100% /mnt/backup


[root@backups ~]# ls -l /mnt/backup
total 0
drwxr-xr-x 1 root root 86 mar 20 16:23 back20160318/
drwxr-xr-x 1 root root 86 mar 20 16:23 back20160328/
drwxr-xr-x 1 root root 86 mar 20 16:23 back20160330/
drwxr-xr-x 1 root root 86 mar 20 16:23 back20160401/
drwxr-xr-x 1 root root 86 mar 20 16:23 back20160404/
drwxr-xr-x 1 root root 86 mar 20 16:23 back20160406/
drwxr-xr-x 1 root root 86 mar 20 16:23 back20160408/


[root@backups ~]# btrfs fi show
Label: 'disco_backup'  uuid: cbfe8735-9f53-46f5-be7e-40f6a61a5506
Total devices 1 FS bytes used 1.80TiB
devid    1 size 1.82TiB used 1.82TiB path /dev/sdb1

btrfs-progs v4.1.2

[root@backups ~]# btrfs fi df /mnt/backup
Data, single: total=1.79TiB, used=1.79TiB
System, DUP: total=32.00MiB, used=240.00KiB
Metadata, DUP: total=17.00GiB, used=15.83GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
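
One way to read these numbers, as a rough sketch: the `total=` figures from `btrfs fi df` are logical sizes, and this sketch assumes a DUP profile keeps two copies on disk while single keeps one. GlobalReserve is left out on the assumption that it is carved out of metadata rather than allocated separately. The helper names are hypothetical.

```c
#include <assert.h>

/* Raw space a block group type occupies on disk: logical total times the
 * number of copies its profile keeps (1 for single, 2 for DUP). */
static double raw_allocated_gib(double logical_total_gib, int copies)
{
    assert(copies > 0);
    return logical_total_gib * copies;
}

/* Sum for the figures quoted above, returned in TiB. */
static double demo_total_raw_tib(void)
{
    double data     = raw_allocated_gib(1.79 * 1024.0, 1); /* Data, single */
    double system   = raw_allocated_gib(32.0 / 1024.0, 2); /* System, DUP */
    double metadata = raw_allocated_gib(17.0, 2);          /* Metadata, DUP */

    return (data + system + metadata) / 1024.0;
}
```

Under those assumptions the sum comes to a little over 1.82 TiB, which lines up with the "used 1.82TiB" that `btrfs fi show` reports for the device: every byte of the disk is already allocated to chunks, even where the chunks themselves are not full.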


Now I remove the oldest subvolume:


[root@backups ~]# btrfs subvolume delete /mnt/backup/back20160318/
Delete subvolume (no-commit): '/mnt/backup/back20160318'

[root@backups ~]# df  /mnt/backup
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       1,9T  1,9T   22M 100% /mnt/backup

[root@backups ~]# btrfs fi show
Label: 'disco_backup'  uuid: cbfe8735-9f53-46f5-be7e-40f6a61a5506
Total devices 1 FS bytes used 1.80TiB
devid    1 size 1.82TiB used 1.82TiB path /dev/sdb1

[root@backups ~]# btrfs fi show
Label: 'disco_backup'  uuid: cbfe8735-9f53-46f5-be7e-40f6a61a5506
Total devices 1 FS bytes used 1.80TiB
devid    1 size 1.82TiB used 1.82TiB path /dev/sdb1

btrfs-progs v4.1.2
[root@backups ~]# btrfs fi df /mnt/backup
Data, single: total=1.79TiB, used=1.79TiB
System, DUP: total=32.00MiB, used=240.00KiB
Metadata, DUP: total=17.00GiB, used=15.83GiB
GlobalReserve, single: total=512.00MiB, used=102.53MiB



Now I remove 2 more subvolumes:

[root@backups ~]# btrfs subvolume delete /mnt/backup/back20160328/
Delete subvolume (no-commit): '/mnt/backup/back20160328'
[root@backups ~]# btrfs subvolume delete /mnt/backup/back20160330/
Delete subvolume (no-commit): '/mnt/backup/back20160330'

[root@backups ~]# df /mnt/backup/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       1,9T  1,9T  348M 100% /mnt/backup

[root@backups ~]# btrfs fi show
Label: 'disco_backup'  uuid: cbfe8735-9f53-46f5-be7e-40f6a61a5506
Total devices 1 FS bytes used 1.80TiB
devid    1 size 1.82TiB used 1.82TiB path /dev/sdb1

btrfs-progs v4.1.2

Data, single: total=1.79TiB, used=1.79TiB
System, DUP: total=32.00MiB, used=240.00KiB
Metadata, DUP: total=17.00GiB, used=15.83GiB
GlobalReserve, single: total=512.00MiB, used=98.94MiB


[root@backups ~]# ls -l /mnt/backup/
total 0
drwxr-xr-x 1 root root 86 mar 20 16:23 back20160401/
drwxr-xr-x 1 root root 86 mar 20 16:23 back20160404/
drwxr-xr-x 1 root root 86 mar 20 16:23 back20160406/
drwxr-xr-x 1 root root 86 mar 20 16:23 back20160408/


Now I will remove the remaining subvolumes:

[root@backups ~]# btrfs subvolume delete /mnt/backup/back20160401/
Delete subvolume (no-commit): '/mnt/backup/back20160401'
[root@backups ~]# btrfs subvolume delete /mnt/backup/back20160404/
Delete subvolume (no-commit): '/mnt/backup/back20160404'
[root@backups ~]# btrfs subvolume delete /mnt/backup/back20160406/
Delete subvolume (no-commit): '/mnt/backup/back20160406'
[root@backups ~]# btrfs subvolume delete /mnt/backup/back20160408/
Delete subvolume (no-commit): '/mnt/backup/back20160408'

[root@backups ~]# ls -l /mnt/backup/
total 0

[root@backups ~]# df /mnt/backup/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       1,9T  1,9T  4,6G 100% /mnt/backup
[root@backups ~]# btrfs fi show
Label: 'disco_backup'  uuid: cbfe8735-9f53-46f5-be7e-40f6a61a5506
Total devices 1 FS bytes used 1.80TiB
devid    1 size 1.82TiB used 1.82TiB path /dev/sdb1

btrfs-progs v4.1.2

[root@backups ~]# btrfs fi df /mnt/backup
Data, single: total=1.79TiB, used=1.78TiB
System, DUP: total=32.00MiB, 

[PATCH] Btrfs: use root when checking need_async_flush

2016-04-25 Thread Josef Bacik
Instead of doing fs_info->fs_root in need_async_flush, which may not be set
during recovery when mounting, just pass the root itself in, which makes more
sense as that's what btrfs_calc_reclaim_metadata_size takes.

Signed-off-by: Josef Bacik 
---
 fs/btrfs/extent-tree.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f23f426..e760cf7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4872,7 +4872,7 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_root *root,
 }
 
 static inline int need_do_async_reclaim(struct btrfs_space_info *space_info,
-   struct btrfs_fs_info *fs_info, u64 used)
+   struct btrfs_root *root, u64 used)
 {
u64 thresh = div_factor_fine(space_info->total_bytes, 98);
 
@@ -4880,11 +4880,12 @@ static inline int need_do_async_reclaim(struct btrfs_space_info *space_info,
if ((space_info->bytes_used + space_info->bytes_reserved) >= thresh)
return 0;
 
-   if (!btrfs_calc_reclaim_metadata_size(fs_info->fs_root, space_info))
+   if (!btrfs_calc_reclaim_metadata_size(root, space_info))
return 0;
 
-   return (used >= thresh && !btrfs_fs_closing(fs_info) &&
-   !test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state));
+   return (used >= thresh && !btrfs_fs_closing(root->fs_info) &&
+   !test_bit(BTRFS_FS_STATE_REMOUNTING,
+ &root->fs_info->fs_state));
 }
 
 static void wake_all_tickets(struct list_head *head)
@@ -5129,7 +5130,7 @@ static int __reserve_metadata_bytes(struct btrfs_root *root,
 * the async reclaim as we will panic.
 */
if (!root->fs_info->log_root_recovering &&
-   need_do_async_reclaim(space_info, root->fs_info, used) &&
+   need_do_async_reclaim(space_info, root, used) &&
!work_busy(&root->fs_info->async_reclaim_work)) {
trace_btrfs_trigger_flush(root->fs_info,
  space_info->flags,
-- 
2.5.0
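
The shape of the change can be illustrated with a small stand-alone sketch. The types below are hypothetical and heavily reduced; only the field names `fs_info` and `fs_root` echo the patch, and `reclaimable` is an invented stand-in for whatever the reclaim-size calculation reads from the root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_fs_info;

/* Hypothetical reduced root: it always knows its owning fs_info. */
struct demo_root {
    struct demo_fs_info *fs_info;
    unsigned long reclaimable; /* invented stand-in value */
};

/* Hypothetical reduced fs_info: fs_root may still be NULL early in mount. */
struct demo_fs_info {
    struct demo_root *fs_root;
    bool closing;
};

/* New shape: the caller passes the root it already holds, so nothing here
 * depends on fs_info->fs_root having been set up yet. */
static bool demo_need_reclaim(const struct demo_root *root)
{
    assert(root != NULL && root->fs_info != NULL);
    return root->reclaimable > 0 && !root->fs_info->closing;
}
```

In this model the check keeps working while `fs_root` is still NULL, which is the situation the commit message describes for log recovery at mount time.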



Re: primary location of btrfs-progs changelog: The wiki?

2016-04-25 Thread David Sterba
On Mon, Apr 25, 2016 at 08:06:47AM -0400, Nicholas D Steeves wrote:
> On 25 April 2016 at 07:36, David Sterba  wrote:
> > The conversion looks relatively ok, indentation could be 2 spaces and
> > all bullet lists with '*'. Thanks.
> 
> Done.  I also added one line before each new version.  I've attached
> it, since it's just one file; however, if you prefer I can clone your
> repo on github and submit it that way.

Thanks. I had a look at how a changes file is usually formatted and made
further changes: fixed the ordering of the minor releases, added exact
release dates, un-indented in a few more places, plus some minor fixes.


Re: Add device while rebalancing

2016-04-25 Thread Austin S. Hemmelgarn

On 2016-04-25 08:43, Duncan wrote:
> Austin S. Hemmelgarn posted on Mon, 25 Apr 2016 07:18:10 -0400 as
> excerpted:
>
>> On 2016-04-23 01:38, Duncan wrote:
>>>
>>> And again with snapshotting operations.  Making a snapshot is normally
>>> nearly instantaneous, but there's a scaling issue if you have too many
>>> per filesystem (try to keep it under 2000 snapshots per filesystem
>>> total, if possible, and definitely keep it under 10K or some operations
>>> will slow down substantially), and deleting snapshots is more work, so
>>> while you should ordinarily automatically thin down snapshots if you're
>>> automatically making them quite frequently (say daily or more
>>> frequently), you may want to put the snapshot deletion, at least, on
>>> hold while you scrub or balance or device delete or replace.
>>
>> I would actually recommend putting all snapshot operations on hold, as
>> well as most writes to the filesystem, while doing a balance or device
>> deletion.  The more writes you have while doing those, the longer they
>> take, and the less likely that you end up with a good on-disk layout of
>> the data.
>
> The thing with snapshot writing is that all snapshot creation effectively
> does is a bit of metadata writing.  What snapshots primarily do is lock
> existing extents in place (down within their chunk, with the higher chunk
> level being the scope at which balance works), that would otherwise be
> COWed elsewhere with the existing extent deleted on change, or simply
> deleted on file delete.  A snapshot simply adds a reference to the
> current version, so that deletion, either directly or from the COW, never
> happens, and to do that simply requires a relatively small metadata write.

Unless I'm mistaken about the internals of BTRFS (which might be the
case), creating a snapshot has to update reference counts on every
single extent in every single file in the snapshot.  For something small
this isn't much, but if you are snapshotting something big (say,
snapshotting an entire system with all the data in one subvolume), it
can amount to multiple MB of writes, and it gets even worse if you have
no shared extents to begin with (which is still pretty typical).  On
some of the systems I work with at work, snapshotting a terabyte of data
can end up resulting in 10-20 MB of writes to disk (in this case, that
figure came from a partition containing mostly small files that were
just big enough that they didn't fit in-line in the metadata blocks).

This is of course still significantly faster than copying everything,
but it's not free either.

> So while I agree in general that more writes means balances taking
> longer, snapshot creation writes are pretty tiny in the scheme of things,
> and won't affect the balance much, compared to larger writes you'll very
> possibly still be doing unless you really do suspend pretty much all
> write operations to that filesystem during the balance.

In general, yes, except that there's the case of running with mostly
full metadata chunks, where it might result in a further chunk
allocation, which in turn can throw off the balanced layout.  Balance
always allocates new chunks, and doesn't write into existing ones, so if
you're writing enough to allocate a new chunk while a balance is happening:
1. That chunk may or may not get considered by the balance code (I'm not
100% certain about this, but I believe it will be ignored by any balance
running at the time it gets allocated).
2. You run the risk of ending up with a chunk with almost nothing in it
which could be packed into another existing chunk.
Snapshots are not likely to trigger this, but it is still possible,
especially if you're taking lots of snapshots in a short period of time.

> But as I said, snapshot deletions are an entirely different story, as
> then all those previously locked in place extents are potentially freed,
> and the filesystem must do a lot of work to figure out which ones it can
> actually free and free them, vs. ones that still have other references
> which therefore cannot yet be freed.

Most of the issue here with balance is that you end up potentially doing
an amount of unnecessary work which is unquantifiable before it's done.




Re: [patch] btrfs: send: silence an integer overflow warning

2016-04-25 Thread David Sterba
On Wed, Apr 13, 2016 at 09:40:59AM +0300, Dan Carpenter wrote:
> The "sizeof(*arg->clone_sources) * arg->clone_sources_count" expression
> can overflow.  It causes several static checker warnings.  It's all
> under CAP_SYS_ADMIN so it's not that serious but lets silence the
> warnings.
> 
> Signed-off-by: Dan Carpenter 

Reviewed-by: David Sterba 
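
As an illustration of the class of problem being silenced (not the actual patch, which is not quoted in this message), a multiplication of an element size by a caller-supplied count can be guarded before it is used as an allocation or copy size. `checked_array_size()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores elem_size * count in *out and returns true,
 * or returns false (leaving *out untouched) if the product would not fit
 * in size_t. */
static bool checked_array_size(size_t elem_size, uint64_t count, size_t *out)
{
    assert(out != NULL);
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return false;
    *out = elem_size * (size_t)count;
    return true;
}
```

A caller would reject the request when the helper returns false instead of proceeding with a wrapped-around size.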


Re: Btrfs fails desatrerous on fuzzy tests

2016-04-25 Thread David Sterba
On Tue, Apr 12, 2016 at 04:24:32PM +0200, Juergen Sauer wrote:
> Hi!
> do you know this paper ?
> 
> http://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20Vault%202016.pdf

Yes. There were several bug reports resulting from the fuzzing, all fixed
in 4.5, and IIRC all of them happen during mount. Hence the awkwardly low
amount of time needed to trigger the bugs. The fuzzing suite is not yet
released and instrumenting all the code is not at all trivial. The
Oracle guys promised to do a release; until then, at least we have the
generated images in the btrfs-progs testsuite.

I'm curious about this level of fuzzing as it can help to make the error
handling more robust, but we'll never be able to completely defend
against crafted images. For example, we can detect a missing extent
mapping when looking for it, but we cannot tell an existing but wrong
mapping apart from a correct one.  That would be like doing a full filesystem
integrity check all the time (because we cannot trust any data we read
from disk). There are exceptions where there's enough information cached
or available from other contexts, but overall it's too hard to fix. And this
applies to all filesystems.


Re: Add device while rebalancing

2016-04-25 Thread Duncan
Austin S. Hemmelgarn posted on Mon, 25 Apr 2016 07:18:10 -0400 as
excerpted:

> On 2016-04-23 01:38, Duncan wrote:
>>
>> And again with snapshotting operations.  Making a snapshot is normally
>> nearly instantaneous, but there's a scaling issue if you have too many
>> per filesystem (try to keep it under 2000 snapshots per filesystem
>> total, if possible, and definitely keep it under 10K or some operations
>> will slow down substantially), and deleting snapshots is more work, so
>> while you should ordinarily automatically thin down snapshots if you're
>> automatically making them quite frequently (say daily or more
>> frequently), you may want to put the snapshot deletion, at least, on
>> hold while you scrub or balance or device delete or replace.

> I would actually recommend putting all snapshot operations on hold, as
> well as most writes to the filesystem, while doing a balance or device
> deletion.  The more writes you have while doing those, the longer they
> take, and the less likely that you end up with a good on-disk layout of
> the data.

The thing with snapshot writing is that all snapshot creation effectively 
does is a bit of metadata writing.  What snapshots primarily do is lock 
existing extents in place (down within their chunk, with the higher chunk 
level being the scope at which balance works), that would otherwise be 
COWed elsewhere with the existing extent deleted on change, or simply 
deleted on file delete.  A snapshot simply adds a reference to the 
current version, so that deletion, either directly or from the COW, never 
happens, and to do that simply requires a relatively small metadata write.

So while I agree in general that more writes means balances taking 
longer, snapshot creation writes are pretty tiny in the scheme of things, 
and won't affect the balance much, compared to larger writes you'll very 
possibly still be doing unless you really do suspend pretty much all 
write operations to that filesystem during the balance.

But as I said, snapshot deletions are an entirely different story, as 
then all those previously locked in place extents are potentially freed, 
and the filesystem must do a lot of work to figure out which ones it can 
actually free and free them, vs. ones that still have other references 
which therefore cannot yet be freed.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: [PATCH] btrfs: Switch to generic xattr handlers

2016-04-25 Thread David Sterba
On Fri, Apr 22, 2016 at 10:36:44PM +0200, Andreas Gruenbacher wrote:
> The btrfs_{set,remove}xattr inode operations check for a read-only root
> (btrfs_root_readonly) before calling into generic_{set,remove}xattr.  If
> this check is moved into __btrfs_setxattr, we can get rid of
> btrfs_{set,remove}xattr.
> 
> This patch applies to mainline, I would like to keep it together with
> the other xattr cleanups if possible, though.  Could you please review?
> 
> Thanks,
> Andreas
> 
> 
> Signed-off-by: Andreas Gruenbacher 

Reviewed-by: David Sterba 


Re: [PATCH 1/3] btrfs-progs: Fix return value bug of qgroups check

2016-04-25 Thread David Sterba
On Mon, Apr 18, 2016 at 10:27:07AM +0800, Qu Wenruo wrote:
> Before this patch, although btrfsck checks qgroups if quota is
> enabled, it always returns 0 even if qgroup numbers are corrupted.
> 
> Fix it by allowing a return value from the report_qgroups function
> (formerly defined as print_qgroup_difference).
> 
> Signed-off-by: Qu Wenruo 

All three applied, thanks.


Re: primary location of btrfs-progs changelog: The wiki?

2016-04-25 Thread Nicholas D Steeves
On 25 April 2016 at 07:36, David Sterba  wrote:
> The conversion looks relatively ok, indentation could be 2 spaces and
> all bullet lists with '*'. Thanks.

Done.  I also added one line before each new version.  I've attached
it, since it's just one file; however, if you prefer I can clone your
repo on github and submit it that way.

Cheers,
Nick


changelog.gz
Description: GNU Zip compressed data


Re: [PATCH] btrfs-progs: prop: remove an unnecessary condition on parse_args

2016-04-25 Thread David Sterba
On Wed, Apr 20, 2016 at 03:32:48PM +0900, Satoru Takeuchi wrote:
> Since commit c742debab11f ('btrfs-progs: fix a regression that
> "property" with -t option doesn't work'), the number of arguments
> is checked strictly. So the following condition can never be
> satisfied.
> 
> Signed-off-by: Satoru Takeuchi 

Applied, thanks.


Re: [PATCH 4/4] btrfs-progs: "device ready" accepts just one device

2016-04-25 Thread David Sterba
On Mon, Mar 14, 2016 at 01:05:15PM +0100, David Sterba wrote:
> On Mon, Mar 14, 2016 at 09:27:22AM +0900, Satoru Takeuchi wrote:
> > * actual result
> > 
> >   ===
> >   # ./btrfs device ready /dev/sdb foo
> >   #
> >   ===
> > 
> > * expected result
> > 
> >   ===
> >   # ./btrfs device ready /dev/sdb foo
> >   btrfs device ready: too many arguments
> >   usage: btrfs device ready <device>
> > 
> >   Check device to see if it has all of its devices in cache for mounting
> > 
> >   #
> >   ===
> > 
> > Signed-off-by: Satoru Takeuchi 
> > ---
> >  cmds-device.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/cmds-device.c b/cmds-device.c
> > index 33da2ce..23656c3 100644
> > --- a/cmds-device.c
> > +++ b/cmds-device.c
> > @@ -326,7 +326,7 @@ static int cmd_device_ready(int argc, char **argv)
> > 
> > clean_args_no_options(argc, argv, cmd_device_ready_usage);
> > 
> > -   if (check_argc_min(argc - optind, 1))
> > +   if (check_argc_exact(argc - optind, 1))
> 
> This silently changes the semantics, so far it accepts multiple values
> though it contradicts the documentation. I'm not yet sure how to resolve
> that.

More than one argument did not work before, so I think it's ok to expect
just one device. Applied.
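
The difference between the two checks can be sketched as follows. These are simplified stand-ins with hypothetical names, not the btrfs-progs helpers themselves; the sketch assumes a nonzero return means "reject", which is how the diff uses the result inside `if (...)`.

```c
#include <assert.h>

/* Stand-in for a minimum check: rejects only when too few arguments. */
static int demo_check_argc_min(int nargs, int expected)
{
    assert(expected >= 0);
    return nargs < expected;
}

/* Stand-in for an exact check: rejects both too few and too many. */
static int demo_check_argc_exact(int nargs, int expected)
{
    assert(expected >= 0);
    return nargs != expected;
}
```

For `btrfs device ready /dev/sdb foo` there are two positional arguments, so the minimum check lets the call through silently while the exact check rejects it, matching the "actual" and "expected" results quoted above.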


Re: primary location of btrfs-progs changelog: The wiki?

2016-04-25 Thread David Sterba
On Mon, Apr 25, 2016 at 07:24:27AM -0400, Nicholas D Steeves wrote:
> On 25 April 2016 at 07:12, David Sterba  wrote:
> > On Fri, Apr 22, 2016 at 08:41:36PM -0400, Nicholas D Steeves wrote:
> >> I'm just wondering where the primary location of the btrfs-progs
> >> changelog is located
> >
> > At the moment it's the release announcement in this mailinglist, that
> > gets copied to the wiki with some formatting adjustments. I'm willing to
> > copy the announcement text to a file in git (and will do for the next
> > release). But at the moment I won't add all the past changelogs so if
> > anybody wants to do that I'll appreciate that.
> 
> I'd be happy to.  Are you looking for something like:
> 
> curl https://btrfs.wiki.kernel.org/index.php/Changelog | html2text |
> sed '0,/(announcement)/d;/By version (linux kernel)/Q' | gzip -9 >
> changelog
> 
> With some formatting adjustments?

The conversion looks relatively ok, indentation could be 2 spaces and
all bullet lists with '*'. Thanks.


Re: primary location of btrfs-progs changelog: The wiki?

2016-04-25 Thread Nicholas D Steeves
oops, that gzip -9 shouldn't be there :-/


Re: primary location of btrfs-progs changelog: The wiki?

2016-04-25 Thread Nicholas D Steeves
On 25 April 2016 at 07:12, David Sterba  wrote:
> On Fri, Apr 22, 2016 at 08:41:36PM -0400, Nicholas D Steeves wrote:
>> I'm just wondering where the primary location of the btrfs-progs
>> changelog is located
>
> At the moment it's the release announcement in this mailinglist, that
> gets copied to the wiki with some formatting adjustments. I'm willing to
> copy the announcement text to a file in git (and will do for the next
> release). But at the moment I won't add all the past changelogs so if
> anybody wants to do that I'll appreciate that.

I'd be happy to.  Are you looking for something like:

curl https://btrfs.wiki.kernel.org/index.php/Changelog | html2text |
sed '0,/(announcement)/d;/By version (linux kernel)/Q' | gzip -9 >
changelog

With some formatting adjustments?

Cheers,
Nicholas


Re: Add device while rebalancing

2016-04-25 Thread Austin S. Hemmelgarn

On 2016-04-23 01:38, Duncan wrote:
> Juan Alberto Cirez posted on Fri, 22 Apr 2016 14:36:44 -0600 as excerpted:
>
>> Good morning,
>> I am new to this list and to btrfs in general. I have a quick question:
>> Can I add a new device to the pool while the btrfs filesystem balance
>> command is running on the drive pool?
>
> Adding a device while balancing shouldn't be a problem.  However,
> depending on your redundancy mode, you may wish to cancel the balance and
> start a new one after the device add, so the balance will take account of
> it as well and balance it into the mix.

I'm not 100% certain about how balance will handle this, except that
nothing should break.  I believe that it picks a device each time it
goes to move a chunk, so it should evaluate any chunks operated on after
the addition of the device for possible placement on that device (and it
will probably end up putting a lot of them there because that device
will almost certainly be less full than any of the others).  That said,
you probably do want to cancel the balance, add the device, and re-run
the balance so that things end up more evenly distributed.

> Note that while device add doesn't do more than that on its own, device
> delete/remove effectively initiates its own balance, moving the chunks on
> the device being removed to the other devices.  So you wouldn't want to
> be running a balance and then do a device remove at the same time.

IIRC, trying to delete a device while running a balance will fail, and
return an error, because only one balance can be running at a given moment.

> Similarly with btrfs replace, altho in that case, it's more directly
> moving data from the device being replaced (if it's still there, or using
> redundancy or parity to recover it if not) to the replacement device, a
> more limited and often faster operation.  But you probably still don't
> want to do a balance at the same time as it places unnecessary stress on
> both the filesystem and the hardware, and even if the filesystem and
> devices handle the stress fine, the result is going to be that both
> operations take longer as they're both intensive operations that will
> interfere with each other to some extent.

Agreed, this is generally not a good idea because of the stress it puts
on the devices (and because it probably isn't well tested).

> Similarly with btrfs scrub.  The operations are logically different
> enough that they shouldn't really interfere with each other logically,
> but they're both hardware intensive operations that will put unnecessary
> stress on the system if you're doing more than one at a time, and will
> result in both going slower than they normally would.

Actually, depending on a number of factors, scrubbing while balancing
can actually finish faster than running one then the other in sequence.
It's really dependent on how both decide to pick chunks, and how your
underlying devices handle read and write caching, but it can happen.
Most of the time though, it should take around the same amount of time
as running one then the other, or a little bit longer if you're on
traditional disks.

> And again with snapshotting operations.  Making a snapshot is normally
> nearly instantaneous, but there's a scaling issue if you have too many
> per filesystem (try to keep it under 2000 snapshots per filesystem total,
> if possible, and definitely keep it under 10K or some operations will
> slow down substantially), and deleting snapshots is more work, so while
> you should ordinarily automatically thin down snapshots if you're
> automatically making them quite frequently (say daily or more
> frequently), you may want to put the snapshot deletion, at least, on hold
> while you scrub or balance or device delete or replace.

I would actually recommend putting all snapshot operations on hold, as
well as most writes to the filesystem, while doing a balance or device
deletion.  The more writes you have while doing those, the longer they
take, and the less likely that you end up with a good on-disk layout of
the data.




[PATCH v2] btrfs: use dynamic allocation for root item in create_subvol

2016-04-25 Thread David Sterba
The size of root item is more than 400 bytes, which is quite a lot of
stack space. As we do IO from inside the subvolume ioctls, we should
keep the stack usage low in case the filesystem is on top of other
layers (NFS, device mapper, iscsi, etc).

Signed-off-by: David Sterba 
---
 fs/btrfs/ioctl.c | 65 
 1 file changed, 37 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 053e677839fe..9a63fe07bc2e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -439,7 +439,7 @@ static noinline int create_subvol(struct inode *dir,
 {
struct btrfs_trans_handle *trans;
struct btrfs_key key;
-   struct btrfs_root_item root_item;
+   struct btrfs_root_item *root_item;
struct btrfs_inode_item *inode_item;
struct extent_buffer *leaf;
struct btrfs_root *root = BTRFS_I(dir)->root;
@@ -455,16 +455,22 @@ static noinline int create_subvol(struct inode *dir,
 	u64 qgroup_reserved;
 	uuid_le new_uuid;
 
+	root_item = kzalloc(sizeof(*root_item), GFP_KERNEL);
+	if (!root_item)
+		return -ENOMEM;
+
 	ret = btrfs_find_free_objectid(root->fs_info->tree_root, &objectid);
 	if (ret)
-		return ret;
+		goto fail_free;
 
 	/*
 	 * Don't create subvolume whose level is not zero. Or qgroup will be
 	 * screwed up since it assume subvolme qgroup's level to be 0.
 	 */
-	if (btrfs_qgroup_level(objectid))
-		return -ENOSPC;
+	if (btrfs_qgroup_level(objectid)) {
+		ret = -ENOSPC;
+		goto fail_free;
+	}
 
 	btrfs_init_block_rsv(&block_rsv, BTRFS_BLOCK_RSV_TEMP);
 	/*
@@ -474,14 +480,14 @@ static noinline int create_subvol(struct inode *dir,
 	ret = btrfs_subvolume_reserve_metadata(root, &block_rsv,
 					       8, &qgroup_reserved, false);
 	if (ret)
-		return ret;
+		goto fail_free;
 
 	trans = btrfs_start_transaction(root, 0);
 	if (IS_ERR(trans)) {
 		ret = PTR_ERR(trans);
 		btrfs_subvolume_release_metadata(root, &block_rsv,
 						 qgroup_reserved);
-		return ret;
+		goto fail_free;
 	}
 	trans->block_rsv = &block_rsv;
 	trans->bytes_reserved = block_rsv.size;
@@ -509,47 +515,45 @@ static noinline int create_subvol(struct inode *dir,
 			    BTRFS_UUID_SIZE);
 	btrfs_mark_buffer_dirty(leaf);
 
-	memset(&root_item, 0, sizeof(root_item));
-
-	inode_item = &root_item.inode;
+	inode_item = &root_item->inode;
 	btrfs_set_stack_inode_generation(inode_item, 1);
 	btrfs_set_stack_inode_size(inode_item, 3);
 	btrfs_set_stack_inode_nlink(inode_item, 1);
 	btrfs_set_stack_inode_nbytes(inode_item, root->nodesize);
 	btrfs_set_stack_inode_mode(inode_item, S_IFDIR | 0755);
 
-	btrfs_set_root_flags(&root_item, 0);
-	btrfs_set_root_limit(&root_item, 0);
+	btrfs_set_root_flags(root_item, 0);
+	btrfs_set_root_limit(root_item, 0);
 	btrfs_set_stack_inode_flags(inode_item, BTRFS_INODE_ROOT_ITEM_INIT);
 
-	btrfs_set_root_bytenr(&root_item, leaf->start);
-	btrfs_set_root_generation(&root_item, trans->transid);
-	btrfs_set_root_level(&root_item, 0);
-	btrfs_set_root_refs(&root_item, 1);
-	btrfs_set_root_used(&root_item, leaf->len);
-	btrfs_set_root_last_snapshot(&root_item, 0);
+	btrfs_set_root_bytenr(root_item, leaf->start);
+	btrfs_set_root_generation(root_item, trans->transid);
+	btrfs_set_root_level(root_item, 0);
+	btrfs_set_root_refs(root_item, 1);
+	btrfs_set_root_used(root_item, leaf->len);
+	btrfs_set_root_last_snapshot(root_item, 0);
 
-	btrfs_set_root_generation_v2(&root_item,
-			btrfs_root_generation(&root_item));
+	btrfs_set_root_generation_v2(root_item,
+			btrfs_root_generation(root_item));
 	uuid_le_gen(&new_uuid);
-	memcpy(root_item.uuid, new_uuid.b, BTRFS_UUID_SIZE);
-	btrfs_set_stack_timespec_sec(&root_item.otime, cur_time.tv_sec);
-	btrfs_set_stack_timespec_nsec(&root_item.otime, cur_time.tv_nsec);
-	root_item.ctime = root_item.otime;
-	btrfs_set_root_ctransid(&root_item, trans->transid);
-	btrfs_set_root_otransid(&root_item, trans->transid);
+	memcpy(root_item->uuid, new_uuid.b, BTRFS_UUID_SIZE);
+	btrfs_set_stack_timespec_sec(&root_item->otime, cur_time.tv_sec);
+	btrfs_set_stack_timespec_nsec(&root_item->otime, cur_time.tv_nsec);
+	root_item->ctime = root_item->otime;
+	btrfs_set_root_ctransid(root_item, trans->transid);
+	btrfs_set_root_otransid(root_item, trans->transid);
 
 	btrfs_tree_unlock(leaf);
 	free_extent_buffer(leaf);
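
The patch as archived here stops before the fail_free label, so the
cleanup path is not visible.  As a rough illustration only, the pattern
it applies (a large struct moved from the stack to a zeroed heap
allocation, with every later exit routed through one label that frees
it) could look like the userspace sketch below.  struct big_item,
create_thing() and the fail_step parameter are hypothetical stand-ins,
calloc()/free() take the place of kzalloc()/kfree(), and the shape of
the cleanup label is an assumption, not a copy of the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a large item; NOT the real btrfs_root_item. */
struct big_item {
	uint8_t  payload[400];
	uint64_t generation;
};

/*
 * Build a temporary big_item on the heap and report its generation.
 * fail_step simulates an error after the allocation (0 = no failure).
 * Returns 0 on success or a negative errno value.
 */
static int create_thing(int fail_step, uint64_t *gen_out)
{
	struct big_item *item;	/* only a pointer lives on the stack */
	int ret = 0;

	assert(gen_out != NULL);

	/* calloc() returns zeroed memory, so no separate memset() is needed. */
	item = calloc(1, sizeof(*item));
	if (!item)
		return -ENOMEM;	/* nothing to free yet, a plain return is fine */

	if (fail_step == 1) {
		ret = -ENOSPC;
		goto fail_free;	/* every exit after the allocation frees item */
	}

	item->generation = 42;
	*gen_out = item->generation;

fail_free:
	/* In this sketch the success path falls through to the same cleanup. */
	free(item);
	return ret;
}
```

Calling create_thing(0, &gen) fills gen and returns 0, while
create_thing(1, &gen) returns -ENOSPC and leaves gen untouched; in both
cases the allocation is released before returning.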
  

Re: primary location of btrfs-progs changelog: The wiki?

2016-04-25 Thread David Sterba
On Fri, Apr 22, 2016 at 08:41:36PM -0400, Nicholas D Steeves wrote:
> I'm just wondering where the primary location of the btrfs-progs
> changelog is located, because I'd like to include upstream changes in
> the Debian package.  Is it really the wiki?  If so, it would seem my
> options are copying+pasting with every release, or writing a script to
> download the page, convert it to text, and then do something like cut
> everything before By version (btrfs-progs) and everything after By
> version (linux kernel).

At the moment it's the release announcement on this mailing list, which
gets copied to the wiki with some formatting adjustments. I'm willing to
copy the announcement text to a file in git (and will do so for the next
release). But at the moment I won't add all the past changelogs, so if
anybody wants to do that I'll appreciate it.


Re: btrfs forced readonly + errno=-28 No space left

2016-04-25 Thread Martin Svec
On 22.4.2016 at 23:00, Nicholas D Steeves wrote:
> On 21 April 2016 at 18:44, Chris Murphy  wrote:
>> On Thu, Apr 21, 2016 at 6:53 AM, Martin Svec  wrote:
>>> Hello,
>>>
>>> we use btrfs subvolumes for rsync-based backups. During backups btrfs
>>> often fails with "No space left" error and goes to readonly mode (dmesg
>>> output is below) while there's still plenty of unallocated space:
>> Are you snapshotting near the time of enospc? If so it's a known
>> problem that's been around for a while. There are some suggestions in
>> the archives but I think the main thing is to back off on the workload
>> momentarily, take the snapshot, and then resume the workload. I don't
>> think it has to come to a complete stop but it's a lot more
>> reproducible with heavy writes.
> Is this known problem specific to heavy writes + take a snapshot + -o
> compress (either zlib or lzo), or does this enospc also affect the
> more simple heavy writes + take a snapshot case?  Is there a greater
> likelihood of running into it if using compression?

In our case, I saw no difference when the compression was disabled.

Martin Svec



Re: btrfs forced readonly + errno=-28 No space left

2016-04-25 Thread Martin Svec
On 22.4.2016 at 0:44, Chris Murphy wrote:
> On Thu, Apr 21, 2016 at 6:53 AM, Martin Svec  wrote:
>> Hello,
>>
>> we use btrfs subvolumes for rsync-based backups. During backups btrfs
>> often fails with "No space left" error and goes to readonly mode (dmesg
>> output is below) while there's still plenty of unallocated space:
> Are you snapshotting near the time of enospc?

What do you mean by "near"? Milliseconds, seconds, minutes? In general,
yes, but it's hard to say exactly because multiple backup jobs run in
parallel every night.

> If so it's a known problem that's been around for a while. There are some
> suggestions in the archives but I think the main thing is to back off on
> the workload momentarily, take the snapshot, and then resume the workload.
> I don't think it has to come to a complete stop but it's a lot more
> reproducible with heavy writes.

I'm afraid we cannot throttle the workload, because the backup jobs run
concurrently. I would expect this to be handled at the filesystem level.

Anyway, how can I help to fix this bug? Is there anybody who works on
fixing it or is it considered a "feature"?

Best regards
Martin Svec

