Re: empty disk reports full
On Mon, Apr 25, 2016 at 8:03 AM, Alejandro Vargas wrote:
> On Friday, 1 April 2016 10:05:07, Hugo Mills wrote:
>> On Fri, Apr 01, 2016 at 11:50:50AM +0200, Alejandro Vargas wrote:
>> > I am using a 2TB disk for incremental backups.
>> >
>> > I use rsync to back up to a subvolume, and each day I create a
>> > snapshot of the latest snapshot and run rsync into it.
>> >
>> > When the disk becomes nearly full (100GB or less available) I delete the
>> > oldest subvolume (with btrfs subvolume delete).
>> >
>> > My problem is that *even after removing ALL the subvolumes*, the free
>> > space does not change. It continues reporting the same size (disk is
>> > nearly full).
>> >
>> > I tried "btrfs balance start /mnt/backup" but it takes hours and hours.
>> >
>> > I'm using linux 4.1.15
>> > btrfs-progs v4.1.2
>>
>> Can you show us the output of both "sudo btrfs fi show" and "btrfs
>> fi df /mnt/backup", please?
>
> Before deleting subvolumes:
>
> [root@backups ~]# df /mnt/backup
> S.ficheros Tamaño Usados Disp Uso% Montado en
> /dev/sdb1 1,9T 1,9T 5,0M 100% /mnt/backup
>
>
> [root@backups ~]# ls -l /mnt/backup
> total 0
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160318/
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160328/
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160330/
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160401/
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160404/
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160406/
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160408/
>
>
> [root@backups ~]# btrfs fi show
> Label: 'disco_backup' uuid: cbfe8735-9f53-46f5-be7e-40f6a61a5506
> Total devices 1 FS bytes used 1.80TiB
> devid 1 size 1.82TiB used 1.82TiB path /dev/sdb1
>
> btrfs-progs v4.1.2
>
> [root@backups ~]# btrfs fi df /mnt/backup
> Data, single: total=1.79TiB, used=1.79TiB
> System, DUP: total=32.00MiB, used=240.00KiB
> Metadata, DUP: total=17.00GiB, used=15.83GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
>
> Now I remove the oldest subvolume:
>
>
> [root@backups ~]# btrfs subvolume delete /mnt/backup/back20160318/
> Delete subvolume (no-commit): '/mnt/backup/back20160318'
>
> [root@backups ~]# df /mnt/backup
> S.ficheros Tamaño Usados Disp Uso% Montado en
> /dev/sdb1 1,9T 1,9T 22M 100% /mnt/backup
>
> [root@backups ~]# btrfs fi show
> Label: 'disco_backup' uuid: cbfe8735-9f53-46f5-be7e-40f6a61a5506
> Total devices 1 FS bytes used 1.80TiB
> devid 1 size 1.82TiB used 1.82TiB path /dev/sdb1
>
> [root@backups ~]# btrfs fi show
> Label: 'disco_backup' uuid: cbfe8735-9f53-46f5-be7e-40f6a61a5506
> Total devices 1 FS bytes used 1.80TiB
> devid 1 size 1.82TiB used 1.82TiB path /dev/sdb1
>
> btrfs-progs v4.1.2
> [root@backups ~]# btrfs fi df /mnt/backup
> Data, single: total=1.79TiB, used=1.79TiB
> System, DUP: total=32.00MiB, used=240.00KiB
> Metadata, DUP: total=17.00GiB, used=15.83GiB
> GlobalReserve, single: total=512.00MiB, used=102.53MiB
>
>
>
> Now I remove 2 more subvolumes:
>
> [root@backups ~]# btrfs subvolume delete /mnt/backup/back20160328/
> Delete subvolume (no-commit): '/mnt/backup/back20160328'
> [root@backups ~]# btrfs subvolume delete /mnt/backup/back20160330/
> Delete subvolume (no-commit): '/mnt/backup/back20160330'
>
> [root@backups ~]# df /mnt/backup/
> S.ficheros Tamaño Usados Disp Uso% Montado en
> /dev/sdb1 1,9T 1,9T 348M 100% /mnt/backup
>
> [root@backups ~]# btrfs fi show
> Label: 'disco_backup' uuid: cbfe8735-9f53-46f5-be7e-40f6a61a5506
> Total devices 1 FS bytes used 1.80TiB
> devid 1 size 1.82TiB used 1.82TiB path /dev/sdb1
>
> btrfs-progs v4.1.2
>
> Data, single: total=1.79TiB, used=1.79TiB
> System, DUP: total=32.00MiB, used=240.00KiB
> Metadata, DUP: total=17.00GiB, used=15.83GiB
> GlobalReserve, single: total=512.00MiB, used=98.94MiB
>
>
> [root@backups ~]# ls -l /mnt/backup/
> total 0
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160401/
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160404/
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160406/
> drwxr-xr-x 1 root root 86 mar 20 16:23 back20160408/
>
>
> Now I will remove the remaining subvolumes:
>
> [root@backups ~]# btrfs subvolume delete /mnt/backup/back20160401/
> Delete subvolume (no-commit): '/mnt/backup/back20160401'
> [root@backups ~]# btrfs subvolume delete /mnt/backup/back20160404/
> Delete subvolume (no-commit): '/mnt/backup/back20160404'
> [root@backups ~]# btrfs subvolume delete /mnt/backup/back20160406/
> Delete subvolume (no-commit): '/mnt/backup/back20160406'
> [root@backups ~]# btrfs subvolume delete /mnt/backup/back20160408/
> Delete subvolume (no-commit): '/mnt/backup/back20160408'
>
> [root@backups ~]# ls -l /mnt/backup/
> total 0
>
> [root@backups ~]# df /mnt/backup/
> S.ficheros Tamaño Usados Disp Uso% Montado en
> /dev/sdb1 1,9T 1,9T 4,6G 100% /mnt/backup
> [root@backups ~]# btrfs fi show
> Label: 'disco_backup' uuid: cbfe8735-
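The figures quoted in this thread can be cross-checked with a little arithmetic. The sketch below assumes the usual reading of the two commands: "btrfs fi df" reports logical chunk sizes, a DUP profile stores two copies on the device, and the per-device "used" in "btrfs fi show" is the space allocated to chunks. GlobalReserve is left out on the assumption that it is carved out of metadata rather than allocated separately. The struct and function names are illustrative only and are not btrfs-progs types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one "btrfs fi df" row; not a btrfs-progs type. */
struct chunk_class {
	double total_gib;	/* "total=" value converted to GiB */
	int copies;		/* 1 for single, 2 for DUP */
};

/* Raw device space consumed by all chunk classes, in GiB. */
double raw_allocated_gib(const struct chunk_class *c, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(c[i].copies >= 1);
		sum += c[i].total_gib * c[i].copies;
	}
	return sum;
}

/* Values taken from the "btrfs fi df /mnt/backup" output quoted above. */
double backup_disk_raw_gib(void)
{
	const struct chunk_class rows[] = {
		{ 1.79 * 1024.0, 1 },	/* Data, single: total=1.79TiB */
		{ 32.0 / 1024.0, 2 },	/* System, DUP: total=32.00MiB */
		{ 17.0, 2 },		/* Metadata, DUP: total=17.00GiB */
	};

	return raw_allocated_gib(rows, sizeof(rows) / sizeof(rows[0]));
}
```

Under those assumptions the total comes to roughly 1867 GiB, about 1.82 TiB, which matches "size 1.82TiB used 1.82TiB": the whole device is allocated to chunks, and the space a deleted subvolume gives back only shows up as the "used=" values inside those chunks drop.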
[PATCH RFC 06/16] btrfs-progs: fsck: Introduce function to check referencer for data backref
From: Lu Fengqi

Introduce new function check_extent_data_backref() to search referencer
for a given data backref.

Signed-off-by: Lu Fengqi
Signed-off-by: Qu Wenruo
---
 cmds-check.c | 96
 1 file changed, 96 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 1d1b198..8f971b9 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -8873,6 +8873,102 @@ out:
 	return 0;
 }
 
+/*
+ * Check referencer for normal(inlined) data ref
+ * If len == 0, it will be resolved by searching in extent tree
+ */
+static int check_extent_data_backref(struct btrfs_fs_info *fs_info,
+				     u64 root_id, u64 objectid, u64 offset,
+				     u64 bytenr, u64 len)
+{
+	struct btrfs_root *root;
+	struct btrfs_root *extent_root = fs_info->extent_root;
+	struct btrfs_key key;
+	struct btrfs_path path;
+	struct extent_buffer *leaf;
+	struct btrfs_file_extent_item *fi;
+	int slot;
+	int found_referencer = 0;
+	int ret = 0;
+
+	if (!len) {
+		key.objectid = bytenr;
+		key.type = BTRFS_EXTENT_ITEM_KEY;
+		key.offset = (u64)-1;
+
+		btrfs_init_path(&path);
+		ret = btrfs_search_slot(NULL, extent_root, &key, &path, 0, 0);
+		if (ret < 0)
+			goto out;
+		ret = btrfs_previous_extent_item(extent_root, &path, bytenr);
+		if (ret)
+			goto out;
+		btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+		if (key.objectid != bytenr ||
+		    key.type != BTRFS_EXTENT_ITEM_KEY)
+			goto out;
+		len = key.offset;
+		btrfs_release_path(&path);
+	}
+	key.objectid = root_id;
+	btrfs_set_key_type(&key, BTRFS_ROOT_ITEM_KEY);
+	key.offset = (u64)-1;
+
+	root = btrfs_read_fs_root(fs_info, &key);
+	if (IS_ERR(root))
+		goto out;
+
+	btrfs_init_path(&path);
+	key.objectid = objectid;
+	key.type = BTRFS_EXTENT_DATA_KEY;
+	/*
+	 * It can be nasty as data backref offset is
+	 * file offset - file extent offset, which is smaller or
+	 * equal to original backref offset.
+	 * The only special case is overflow.
+	 * So we need to special judgement and do further search
+	 */
+	key.offset = offset & (1ULL << 63) ? 0 : offset;
+
+	ret = btrfs_search_slot(NULL, root, &key, &path, 0, 0);
+	if (ret < 0)
+		goto out;
+
+	/* Search afterwards to get correct one */
+	while (1) {
+		leaf = path.nodes[0];
+		slot = path.slots[0];
+
+		btrfs_item_key_to_cpu(leaf, &key, slot);
+		if (key.objectid != objectid || key.type != BTRFS_EXTENT_DATA_KEY)
+			break;
+		fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
+		/*
+		 * Except normal disk bytenr and disk num bytes, we still
+		 * need to do extra check on dbackref offset as
+		 * dbackref offset = file_offset - file_extent_offset
+		 */
+		if (btrfs_file_extent_disk_bytenr(leaf, fi) == bytenr &&
+		    btrfs_file_extent_disk_num_bytes(leaf, fi) == len &&
+		    (u64)(key.offset - btrfs_file_extent_offset(leaf, fi)) ==
+		    offset) {
+			found_referencer = 1;
+			break;
+		}
+		ret = btrfs_next_item(root, &path);
+		if (ret)
+			break;
+	}
+out:
+	btrfs_release_path(&path);
+	if (!found_referencer) {
+		error("Extent[%llu, %llu] lost referencer(root: %llu, owner: %llu, offset: %llu)",
+			bytenr, len, root_id, objectid, offset);
+		return -MISSING_REFERENCER;
+	}
+	return 0;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
 			   struct btrfs_root *root, int overwrite)
 {
-- 
2.8.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
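The offset handling in check_extent_data_backref() is easier to follow in isolation. The sketch below is a standalone illustration, not btrfs-progs code: the function names are made up, and it only models the unsigned arithmetic the patch comments describe (dbackref offset = file_offset - file_extent_offset, with wraparound treated as the special case).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Offset stored in a data backref: file offset minus the offset into the
 * extent. Unsigned arithmetic wraps modulo 2^64 when the extent offset is
 * larger than the file offset.
 */
uint64_t dbackref_offset(uint64_t file_offset, uint64_t extent_offset)
{
	return file_offset - extent_offset;
}

/*
 * Where to start searching for EXTENT_DATA items: a backref offset with the
 * top bit set is treated as a wrapped value, so the search starts from 0.
 */
uint64_t search_start_offset(uint64_t backref_offset)
{
	return (backref_offset & (1ULL << 63)) ? 0 : backref_offset;
}

/* Does a file extent at @file_offset/@extent_offset match @backref_offset? */
int file_extent_matches(uint64_t file_offset, uint64_t extent_offset,
			uint64_t backref_offset)
{
	return dbackref_offset(file_offset, extent_offset) == backref_offset;
}
```

For example, a file extent at file offset 4096 that starts 8192 bytes into its disk extent yields a backref offset of 2^64 - 4096. The search then starts at key offset 0 and walks forward until the subtraction matches.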
[PATCH RFC 03/16] btrfs-progs: fsck: Introduce function to query tree block level
From: Lu Fengqi

Introduce function query_tree_block_level() to resolve tree block level
by reading out the tree block.

Signed-off-by: Lu Fengqi
Signed-off-by: Qu Wenruo
---
 cmds-check.c | 20
 1 file changed, 20 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index d097edd..6633b6e 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -8716,6 +8716,26 @@ error:
 	return err;
 }
 
+/*
+ * Get real tree block level for case like shared block
+ * Return >= 0 as tree level
+ * Return <0 for error
+ */
+static int query_tree_block_level(struct btrfs_fs_info *fs_info, u64 bytenr)
+{
+	struct extent_buffer *eb;
+	u32 nodesize = btrfs_super_nodesize(fs_info->super_copy);
+	int ret = -EIO;
+
+	eb = read_tree_block_fs_info(fs_info, bytenr, nodesize, 0);
+	if (!extent_buffer_uptodate(eb))
+		goto out;
+	ret = btrfs_header_level(eb);
+out:
+	free_extent_buffer(eb);
+	return ret;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
 			   struct btrfs_root *root, int overwrite)
 {
-- 
2.8.0
[PATCH RFC 09/16] btrfs-progs: fsck: Introduce function to check dev extent item
From: Lu Fengqi

Introduce function check_dev_extent_item() to find its referencer chunk.

Signed-off-by: Lu Fengqi
Signed-off-by: Qu Wenruo
---
 cmds-check.c | 57 +
 1 file changed, 57 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 7f9f848..92c254f 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -9126,6 +9126,63 @@ out:
 	return -err;
 }
 
+/*
+ * Check if a dev extent item is referred correctly by its chunk
+ */
+static int check_dev_extent_item(struct btrfs_fs_info *fs_info,
+				 struct extent_buffer *eb, int slot)
+{
+	struct btrfs_root *chunk_root = fs_info->chunk_root;
+	struct btrfs_dev_extent *ptr;
+	struct btrfs_path path;
+	struct btrfs_key key;
+	struct btrfs_key found_key;
+	struct btrfs_chunk *chunk;
+	struct extent_buffer *l;
+	int num_stripes;
+	u64 length;
+	int i;
+	int found_chunk = 0;
+	int ret;
+
+	btrfs_item_key_to_cpu(eb, &found_key, slot);
+	ptr = btrfs_item_ptr(eb, slot, struct btrfs_dev_extent);
+	length = btrfs_dev_extent_length(eb, ptr);
+
+	key.objectid = btrfs_dev_extent_chunk_objectid(eb, ptr);
+	key.type = BTRFS_CHUNK_ITEM_KEY;
+	key.offset = btrfs_dev_extent_chunk_offset(eb, ptr);
+
+	btrfs_init_path(&path);
+	ret = btrfs_search_slot(NULL, chunk_root, &key, &path, 0, 0);
+	if (ret)
+		goto out;
+
+	l = path.nodes[0];
+	chunk = btrfs_item_ptr(l, path.slots[0], struct btrfs_chunk);
+	if (btrfs_chunk_length(l, chunk) != length)
+		goto out;
+
+	num_stripes = btrfs_chunk_num_stripes(l, chunk);
+	for (i = 0; i < num_stripes; i++) {
+		u64 devid = btrfs_stripe_devid_nr(l, chunk, i);
+		u64 offset = btrfs_stripe_offset_nr(l, chunk, i);
+
+		if (devid == found_key.objectid && offset == found_key.offset) {
+			found_chunk = 1;
+			break;
+		}
+	}
+out:
+	btrfs_release_path(&path);
+	if (!found_chunk) {
+		error("Device extent[%llu, %llu, %llu] didn't find the relative chunk",
+			found_key.objectid, found_key.offset, length);
+		return -MISSING_REFERENCER;
+	}
+	return 0;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
 			   struct btrfs_root *root, int overwrite)
 {
-- 
2.8.0
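The matching rule in check_dev_extent_item() can be sketched without any of the tree-walking machinery. The structs below are simplified in-memory stand-ins invented for this illustration, not the on-disk btrfs layouts, and the length comparison simply mirrors what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified in-memory stand-ins; NOT the on-disk btrfs structures. */
struct stripe {
	uint64_t devid;
	uint64_t offset;
};

struct chunk {
	uint64_t length;
	size_t num_stripes;
	const struct stripe *stripes;
};

/*
 * Return 1 if @chunk has the same length as the dev extent and one of its
 * stripes points at (@devid, @dev_offset); return 0 otherwise.
 */
int chunk_references_dev_extent(const struct chunk *chunk, uint64_t devid,
				uint64_t dev_offset, uint64_t length)
{
	size_t i;

	assert(chunk != NULL);
	if (chunk->length != length)
		return 0;
	for (i = 0; i < chunk->num_stripes; i++) {
		if (chunk->stripes[i].devid == devid &&
		    chunk->stripes[i].offset == dev_offset)
			return 1;
	}
	return 0;
}
```

A dev extent is accepted only when both conditions hold: a stripe with the right device and offset but a different chunk length is still reported as lacking a referencer.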
[PATCH RFC 02/16] btrfs-progs: fsck: Introduce function to check data backref in extent tree
From: Lu Fengqi Introduce a new function check_data_extent_item() to check if the corresponding data backref exists in extent tree. Signed-off-by: Lu Fengqi Signed-off-by: Qu Wenruo --- cmds-check.c | 151 ++ ctree.h | 2 + extent-tree.c | 2 +- 3 files changed, 154 insertions(+), 1 deletion(-) diff --git a/cmds-check.c b/cmds-check.c index 27fc26f..d097edd 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -322,6 +322,7 @@ struct root_item_info { */ #define MISSING_BACKREF(1 << 0) /* Completely no backref in extent tree */ #define BAD_BACKREF(1 << 1) /* Backref mismatch */ +#define UNALIGNED_BYTES(1 << 2) /* Some bytes are not aligned */ static void *print_status_check(void *p) { @@ -8565,6 +8566,156 @@ out: return -err; } +/* + * Check EXTENT_DATA item, mainly for its dbackref in extent tree + * + * Return <0 any error found and output error message + * Return 0 for no error found + */ +static int check_extent_data_item(struct btrfs_root *root, + struct extent_buffer *eb, int slot) +{ + struct btrfs_file_extent_item *fi; + struct btrfs_path path; + struct btrfs_root *extent_root = root->fs_info->extent_root; + struct btrfs_key key; + struct btrfs_key found_key; + struct extent_buffer *leaf; + struct btrfs_extent_item *ei; + struct btrfs_extent_inline_ref *iref; + struct btrfs_extent_data_ref *dref; + u64 owner; + u64 file_extent_gen; + u64 disk_bytenr; + u64 disk_num_bytes; + u64 extent_num_bytes; + u64 extent_flags; + u64 extent_gen; + u32 item_size; + unsigned long end; + unsigned long ptr; + int type; + u64 ref_root; + int found_dbackref = 0; + int err = 0; + int ret; + + btrfs_item_key_to_cpu(eb, &key, slot); + fi = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item); + file_extent_gen = btrfs_file_extent_generation(eb, fi); + + /* Nothing to check for hole and inline data extents */ + if (btrfs_file_extent_type(eb, fi) == BTRFS_FILE_EXTENT_INLINE || + btrfs_file_extent_disk_bytenr(eb, fi) == 0) + return 0; + + disk_bytenr = btrfs_file_extent_disk_bytenr(eb, 
fi); + disk_num_bytes = btrfs_file_extent_disk_num_bytes(eb, fi); + extent_num_bytes = btrfs_file_extent_num_bytes(eb, fi); + /* Check unaligned disk_num_bytes and num_bytes */ + if (!IS_ALIGNED(disk_num_bytes, root->sectorsize)) { + error("File extent [%llu, %llu] has unaligned disk num bytes: %llu, should be aligned to %u", + key.objectid, key.offset, disk_num_bytes, + root->sectorsize); + err |= UNALIGNED_BYTES; + } else + data_bytes_allocated += disk_num_bytes; + if (!IS_ALIGNED(extent_num_bytes, root->sectorsize)) { + error("File extent [%llu, %llu] has unaligned num bytes: %llu, should be aligned to %u", + key.objectid, key.offset, extent_num_bytes, + root->sectorsize); + err |= UNALIGNED_BYTES; + } else + data_bytes_referenced += extent_num_bytes; + owner = btrfs_header_owner(eb); + + /* Check the data backref in extent tree */ + btrfs_init_path(&path); + key.objectid = btrfs_file_extent_disk_bytenr(eb, fi); + key.type = BTRFS_EXTENT_ITEM_KEY; + key.offset = btrfs_file_extent_disk_num_bytes(eb, fi); + + ret = btrfs_search_slot(NULL, extent_root, &key, &path, 0, 0); + if (ret) { + err |= MISSING_BACKREF; + goto error; + } + + leaf = path.nodes[0]; + slot = path.slots[0]; + btrfs_item_key_to_cpu(leaf, &found_key, slot); + ei = btrfs_item_ptr(leaf, slot, struct btrfs_extent_item); + + extent_flags = btrfs_extent_flags(leaf, ei); + extent_gen = btrfs_extent_generation(leaf, ei); + + btrfs_item_key_to_cpu(eb, &key, slot); + if (!(extent_flags & BTRFS_EXTENT_FLAG_DATA)) { + error("Extent[%llu %llu] backref type mismatch, wanted bit: %llx", + disk_bytenr, disk_num_bytes, + BTRFS_EXTENT_FLAG_DATA); + err |= BAD_BACKREF; + } + + if (file_extent_gen != extent_gen) { + error("Extent[%llu %llu] backref generation mismatch, wanted: %llu, have: %llu", + disk_bytenr, disk_num_bytes, file_extent_gen, + extent_gen); + err = BAD_BACKREF; + } + + /* Check data backref */ + item_size = btrfs_item_size_nr(leaf, path.slots[0]); + iref = (struct btrfs_extent_inline_ref *)(ei + 1); 
+ ptr = (unsigned long)iref; + end = (unsigned long)ei + item_size; + while (ptr < end) { + iref = (struc
[PATCH RFC 15/16] btrfs-progs: fsck: Introduce traversal function for fsck
From: Lu Fengqi

Introduce a new function traversal_tree_block() to do pre-order
traversal, to co-operate with the new fs/subvolume tree skip function.

Signed-off-by: Lu Fengqi
Signed-off-by: Qu Wenruo
---
 cmds-check.c | 77
 1 file changed, 77 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 92f8aa1..85d6cf4 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -9644,6 +9644,83 @@ need_check:
 	return 1;
 }
 
+/*
+ * Traversal function for tree block.
+ * We will do
+ * 1) Skip shared fs/subvolume tree blocks
+ * 2) Update related bytes accounting
+ * 3) Pre-order traversal
+ */
+static int traversal_tree_block(struct btrfs_root *root,
+				struct extent_buffer *node)
+{
+	struct extent_buffer *eb;
+	int level;
+	u64 nr;
+	int i;
+	int err = 0;
+	int ret;
+
+	/*
+	 * Skip shared fs/subvolume tree block, in that case they will
+	 * be checked by referencer with lowest rootid
+	 */
+	if (is_fstree(root->objectid) && !should_check(root, node))
+		return 0;
+
+	/* Update bytes accounting */
+	total_btree_bytes += node->len;
+	if (fs_root_objectid(btrfs_header_owner(node)))
+		total_fs_tree_bytes += node->len;
+	if (btrfs_header_owner(node) == BTRFS_EXTENT_TREE_OBJECTID)
+		total_extent_tree_bytes += node->len;
+	if (!found_old_backref &&
+	    btrfs_header_owner(node) == BTRFS_TREE_RELOC_OBJECTID &&
+	    btrfs_header_backref_rev(node) == BTRFS_MIXED_BACKREF_REV &&
+	    !btrfs_header_flag(node, BTRFS_HEADER_FLAG_RELOC))
+		found_old_backref = 1;
+
+	/* pre-order traversal, check itself first */
+	level = btrfs_header_level(node);
+	ret = check_tree_block_ref(root, node, btrfs_header_bytenr(node),
+				   btrfs_header_level(node),
+				   btrfs_header_owner(node));
+	err |= -ret;
+	if (err)
+		error("check %s failed root %llu bytenr %llu level %d, force continue check",
+			level ? "node":"leaf", root->objectid,
+			btrfs_header_bytenr(node), btrfs_header_level(node));
+
+	if (!level) {
+		btree_space_waste += btrfs_leaf_free_space(root, node);
+		ret = check_leaf_items(root, node);
+		err |= -ret;
+		return -err;
+	}
+
+	nr = btrfs_header_nritems(node);
+	btree_space_waste += (BTRFS_NODEPTRS_PER_BLOCK(root) - nr) *
+		sizeof(struct btrfs_key_ptr);
+
+	/* Then check all its children */
+	for (i = 0; i < nr; i++) {
+		u64 blocknr = btrfs_node_blockptr(node, i);
+
+		/*
+		 * As a btrfs tree has at most 8 levels (0~7), it's quite
+		 * safe to call the function itself.
+		 */
+		eb = read_tree_block(root, blocknr, root->nodesize, 0);
+		if (extent_buffer_uptodate(eb)) {
+			ret = traversal_tree_block(root, eb);
+			err |= -ret;
+		}
+		free_extent_buffer(eb);
+	}
+
+	return -err;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
 			   struct btrfs_root *root, int overwrite)
 {
-- 
2.8.0
[PATCH RFC 01/16] btrfs-progs: fsck: Introduce function to check tree block backref in extent tree
From: Lu Fengqi Introduce function check_tree_block_ref() to check whether a tree block has correct backref in extent tree. Unlike old extent tree check method, we only use search_slot() to search reference, no extra structure will be allocated in heap to record what we have checked. This method may cause a little more IO, but should work for super large fs without triggering OOM. Signed-off-by: Lu Fengqi Signed-off-by: Qu Wenruo --- cmds-check.c | 163 +++ 1 file changed, 163 insertions(+) diff --git a/cmds-check.c b/cmds-check.c index d59968b..27fc26f 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -313,6 +313,16 @@ struct root_item_info { struct cache_extent cache_extent; }; +/* + * Error bit for low memory mode check. + * Return value should be - (ERR_BIT1 | ERR_BIT2 | ...) + * + * Current no caller cares about it yet. + * Just as an internal use error classification + */ +#define MISSING_BACKREF(1 << 0) /* Completely no backref in extent tree */ +#define BAD_BACKREF(1 << 1) /* Backref mismatch */ + static void *print_status_check(void *p) { struct task_ctx *priv = p; @@ -8402,6 +8412,159 @@ loop: goto again; } +/* + * Check backrefs of a tree block given by @bytenr or @eb. 
+ * + * @root: the root containin the @bytenr or @eb + * @eb:tree block extent buffer, can be NULL + * @bytenr:bytenr of the tree block to search + * @level: tree level of the tree block + * @owner: owner of the tree block + * + * Return < 0 for any error found and output error message + * Return 0 for no error found + */ +static int check_tree_block_ref(struct btrfs_root *root, + struct extent_buffer *eb, u64 bytenr, + int level, u64 owner) +{ + struct btrfs_key key; + struct btrfs_key found_key; + struct btrfs_root *extent_root = root->fs_info->extent_root; + struct btrfs_path path; + struct btrfs_extent_item *ei; + struct btrfs_extent_inline_ref *iref; + struct extent_buffer *leaf; + unsigned long end; + unsigned long ptr; + int slot; + int skinny_level; + int type; + u32 nodesize = root->nodesize; + u32 item_size; + u64 offset; + int found_ref = 0; + int err = 0; + int ret; + + btrfs_init_path(&path); + key.objectid = bytenr; + if (btrfs_fs_incompat(root->fs_info, + BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA)) + key.type = BTRFS_METADATA_ITEM_KEY; + else + key.type = BTRFS_EXTENT_ITEM_KEY; + key.offset = (u64)-1; + + /* Search for the backref in extent tree */ + ret = btrfs_search_slot(NULL, extent_root, &key, &path, 0, 0); + if (ret < 0) { + err = MISSING_BACKREF; + goto out; + } + ret = btrfs_previous_extent_item(extent_root, &path, bytenr); + if (ret) { + err = MISSING_BACKREF; + goto out; + } + + leaf = path.nodes[0]; + slot = path.slots[0]; + btrfs_item_key_to_cpu(leaf, &found_key, slot); + + ei = btrfs_item_ptr(leaf, slot, struct btrfs_extent_item); + + if (btrfs_key_type(&found_key) == BTRFS_METADATA_ITEM_KEY) { + skinny_level = (int)found_key.offset; + iref = (struct btrfs_extent_inline_ref *)(ei + 1); + } else { + struct btrfs_tree_block_info *info; + + info = (struct btrfs_tree_block_info *)(ei + 1); + skinny_level = btrfs_tree_block_level(leaf, info); + iref = (struct btrfs_extent_inline_ref *)(info + 1); + } + + if (eb) { + u64 header_gen; + u64 
extent_gen; + + if (!(btrfs_extent_flags(leaf, ei) & + BTRFS_EXTENT_FLAG_TREE_BLOCK)) { + error("Extent[%llu %u] backref type mismatch, missing bit: %llx", + found_key.objectid, nodesize, + BTRFS_EXTENT_FLAG_TREE_BLOCK); + err = BAD_BACKREF; + } + header_gen = btrfs_header_generation(eb); + extent_gen = btrfs_extent_generation(leaf, ei); + if (header_gen != extent_gen) { + error("Extent[%llu %u] backref generation mismatch, wanted: %llu, have: %llu", + found_key.objectid, nodesize, header_gen, + extent_gen); + err = BAD_BACKREF; + } + if (level != skinny_level) { + error("Extent[%llu %u] level mismatch, wanted: %u, have: %u", + found_key.objectid, nodesize, level, skinny_level); + err = BAD_BACKREF; + } + if (!is_fstree(own
[PATCH RFC 14/16] btrfs-progs: fsck: Introduce function to speed up fs tree check
From: Lu Fengqi

Introduce function should_check() to reduce duplicated tree block checks
for fs/subvolume trees.

The idea is that we only check a fs/subvolume tree block if the current
root is its referencer with the lowest rootid, according to the extent
tree. That way we can skip a lot of fs/subvolume tree block checks when
there are many snapshots, at the cost of many extra extent tree searches.

Signed-off-by: Lu Fengqi
Signed-off-by: Qu Wenruo
---
 cmds-check.c | 90
 1 file changed, 90 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index db6fc8e..92f8aa1 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -9554,6 +9554,96 @@ next:
 	return err;
 }
 
+/*
+ * Helper function for later fs/subvol tree check.
+ * To determine if a tree block should be checked.
+ * This function ensures that only the direct referencer with the lowest
+ * rootid checks a fs/subvolume tree block.
+ *
+ * Backref check at extent tree would detect error like missing subvolume
+ * tree, so we can do aggressive judgement to reduce duplicated check.
+ */
+static int should_check(struct btrfs_root *root, struct extent_buffer *eb)
+{
+	struct btrfs_root *extent_root = root->fs_info->extent_root;
+	struct btrfs_key key;
+	struct btrfs_path path;
+	struct extent_buffer *leaf;
+	int slot;
+	struct btrfs_extent_item *ei;
+	unsigned long ptr;
+	unsigned long end;
+	int type;
+	u32 item_size;
+	u64 offset;
+	struct btrfs_extent_inline_ref *iref;
+	int ret;
+
+	btrfs_init_path(&path);
+	key.objectid = btrfs_header_bytenr(eb);
+	key.type = BTRFS_METADATA_ITEM_KEY;
+	key.offset = (u64)-1;
+
+	/*
+	 * Any failure in backref resolving means we can't determine
+	 * who the tree block belongs to.
+	 * So in that case, we need to check that tree block
+	 */
+	ret = btrfs_search_slot(NULL, extent_root, &key, &path, 0, 0);
+	if (ret < 0)
+		goto need_check;
+
+	ret = btrfs_previous_extent_item(extent_root, &path,
+					 btrfs_header_bytenr(eb));
+	if (ret)
+		goto need_check;
+
+	leaf = path.nodes[0];
+	slot = path.slots[0];
+	btrfs_item_key_to_cpu(leaf, &key, slot);
+	ei = btrfs_item_ptr(leaf, slot, struct btrfs_extent_item);
+
+	if (key.type == BTRFS_METADATA_ITEM_KEY) {
+		iref = (struct btrfs_extent_inline_ref *)(ei + 1);
+	} else {
+		struct btrfs_tree_block_info *info;
+
+		info = (struct btrfs_tree_block_info *)(ei + 1);
+		iref = (struct btrfs_extent_inline_ref *)(info + 1);
+	}
+
+	item_size = btrfs_item_size_nr(leaf, slot);
+	ptr = (unsigned long)iref;
+	end = (unsigned long)ei + item_size;
+	while (ptr < end) {
+		iref = (struct btrfs_extent_inline_ref *)ptr;
+		type = btrfs_extent_inline_ref_type(leaf, iref);
+		offset = btrfs_extent_inline_ref_offset(leaf, iref);
+
+		/*
+		 * We only check the tree block if current root is
+		 * the lowest referencer of it.
+		 */
+		if (type == BTRFS_TREE_BLOCK_REF_KEY &&
+		    offset < root->objectid) {
+			btrfs_release_path(&path);
+			return 0;
+		}
+
+		ptr += btrfs_extent_inline_ref_size(type);
+	}
+	/*
+	 * Normally we should also check keyed tree block ref,
+	 * but that may be very time consuming.
+	 * Inlined ref should already make us skip a lot of refs now.
+	 * So skip search keyed tree block ref.
+	 */
+
+need_check:
+	btrfs_release_path(&path);
+	return 1;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
 			   struct btrfs_root *root, int overwrite)
 {
-- 
2.8.0
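The lowest-rootid rule from should_check(), and the pre-order walk in patch 15/16 that relies on it, can be modelled on a toy in-memory tree. Everything below is an assumption made for illustration: struct block stands in for an extent buffer plus its inline TREE_BLOCK_REF entries, and self_err stands in for whatever the per-block checks would report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN 4
#define MAX_REFS 4

/* Toy tree block; a stand-in for an extent buffer, not a btrfs type. */
struct block {
	int level;			/* 0 means leaf */
	int self_err;			/* positive error bits for this block */
	size_t nr_refs;
	uint64_t ref_roots[MAX_REFS];	/* roots referencing this block directly */
	size_t nr_children;
	const struct block *children[MAX_CHILDREN];
};

/* Return 1 unless some direct referencer has a smaller root id. */
int should_check_block(uint64_t root_id, const struct block *b)
{
	size_t i;

	for (i = 0; i < b->nr_refs; i++) {
		if (b->ref_roots[i] < root_id)
			return 0;
	}
	return 1;
}

/*
 * Pre-order walk: skip blocks owned by a lower root, check the block
 * itself, then recurse. Error bits are OR-ed together and returned negated,
 * and a failed check does not stop the walk.
 */
int walk_tree(uint64_t root_id, const struct block *b, size_t *checked)
{
	int err = 0;
	size_t i;

	if (!should_check_block(root_id, b))
		return 0;

	(*checked)++;
	err |= b->self_err;
	if (b->level == 0)
		return -err;

	for (i = 0; i < b->nr_children; i++)
		err |= -walk_tree(root_id, b->children[i], checked);
	return -err;
}
```

With a leaf shared by roots 257 and 258, a walk on behalf of root 258 skips that leaf and leaves it to root 257, so each shared block is examined once no matter how many snapshots reference it.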
[PATCH RFC 10/16] btrfs-progs: fsck: Introduce function to check dev used space
From: Lu Fengqi

Introduce function check_dev_item() to check used space with dev extent
items.

Signed-off-by: Lu Fengqi
Signed-off-by: Qu Wenruo
---
 cmds-check.c | 64
 1 file changed, 64 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 92c254f..e2d1ebf 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -328,6 +328,7 @@ struct root_item_info {
 #define CROSSING_STRIPE_BOUNDARY (1 << 4) /* For kernel scrub workaround */
 #define BAD_ITEM_SIZE		(1 << 5) /* Bad item size */
 #define UNKNOWN_TYPE		(1 << 6) /* Unknown type */
+#define ACCOUNTING_MISMATCH	(1 << 7) /* Used space accounting error */
 
 static void *print_status_check(void *p)
 {
@@ -9183,6 +9184,69 @@ out:
 	return 0;
 }
 
+/*
+ * Check the used space is correct with the dev item
+ */
+static int check_dev_item(struct btrfs_fs_info *fs_info,
+			  struct extent_buffer *eb, int slot)
+{
+	struct btrfs_root *dev_root = fs_info->dev_root;
+	struct btrfs_dev_item *dev_item;
+	struct btrfs_path path;
+	struct btrfs_key key;
+	struct btrfs_dev_extent *ptr;
+	u64 dev_id;
+	u64 used;
+	u64 total = 0;
+	int ret;
+
+	dev_item = btrfs_item_ptr(eb, slot, struct btrfs_dev_item);
+	dev_id = btrfs_device_id(eb, dev_item);
+	used = btrfs_device_bytes_used(eb, dev_item);
+
+	key.objectid = dev_id;
+	key.type = BTRFS_DEV_EXTENT_KEY;
+	key.offset = 0;
+
+	btrfs_init_path(&path);
+	ret = btrfs_search_slot(NULL, dev_root, &key, &path, 0, 0);
+	if (ret < 0) {
+		btrfs_item_key_to_cpu(eb, &key, slot);
+		error("Couldn't find any relative dev extent for dev[%llu, %u, %llu]",
+			key.objectid, key.type, key.offset);
+		btrfs_release_path(&path);
+		return -MISSING_REFERENCER;
+	}
+
+	/* Iterate dev_extents to calculate the used space of a device */
+	while (1) {
+		btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]);
+
+		if (key.objectid > dev_id)
+			break;
+		if (key.type != BTRFS_DEV_EXTENT_KEY || key.objectid != dev_id)
+			goto next;
+
+		ptr = btrfs_item_ptr(path.nodes[0], path.slots[0],
+				     struct btrfs_dev_extent);
+		total += btrfs_dev_extent_length(path.nodes[0], ptr);
+next:
+		ret = btrfs_next_item(dev_root, &path);
+		if (ret)
+			break;
+	}
+	btrfs_release_path(&path);
+
+	if (used != total) {
+		btrfs_item_key_to_cpu(eb, &key, slot);
+		error("Dev extent's total-byte(%llu) is not equal to byte-used(%llu) in dev[%llu, %u, %llu]",
+			total, used, BTRFS_ROOT_TREE_OBJECTID,
+			BTRFS_DEV_EXTENT_KEY, dev_id);
+		return -ACCOUNTING_MISMATCH;
+	}
+	return 0;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
 			   struct btrfs_root *root, int overwrite)
 {
-- 
2.8.0
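The accounting rule behind check_dev_item() is a plain sum: the lengths of all dev extents on a device should add up to the bytes_used recorded in its dev item. A minimal standalone sketch follows, using an invented struct dev_extent in place of the on-disk item and a flat array in place of the device tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified dev extent record; NOT the on-disk layout. */
struct dev_extent {
	uint64_t devid;
	uint64_t length;
};

/* Sum the lengths of all extents that belong to @devid. */
uint64_t dev_extent_total(const struct dev_extent *ext, size_t nr,
			  uint64_t devid)
{
	uint64_t total = 0;
	size_t i;

	assert(ext != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		if (ext[i].devid == devid)
			total += ext[i].length;
	}
	return total;
}

/* Return 0 if the dev item's bytes_used matches, -1 on a mismatch. */
int check_dev_used(const struct dev_extent *ext, size_t nr,
		   uint64_t devid, uint64_t bytes_used)
{
	return dev_extent_total(ext, nr, devid) == bytes_used ? 0 : -1;
}
```

The real function walks the device tree in key order and stops once the objectid passes dev_id. The array scan here only illustrates the comparison, not the search.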
[PATCH RFC 04/16] btrfs-progs: fsck: Introduce function to check referencer of a backref
From: Lu Fengqi Introduce a new function check_tree_block_backref() to check if a backref points to correct referencer. Signed-off-by: Lu Fengqi Signed-off-by: Qu Wenruo --- cmds-check.c | 95 1 file changed, 95 insertions(+) diff --git a/cmds-check.c b/cmds-check.c index 6633b6e..81dd4f3 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -323,6 +323,8 @@ struct root_item_info { #define MISSING_BACKREF(1 << 0) /* Completely no backref in extent tree */ #define BAD_BACKREF(1 << 1) /* Backref mismatch */ #define UNALIGNED_BYTES(1 << 2) /* Some bytes are not aligned */ +#define MISSING_REFERENCER (1 << 3) /* Referencer not found */ +#define BAD_REFERENCER (1 << 4) /* Referencer found, but not mismatch */ static void *print_status_check(void *p) { @@ -8736,6 +8738,99 @@ out: return ret; } +/* + * Check if a tree block backref is valid (points to valid tree block) + * if level == -1, level will be resolved + */ +static int check_tree_block_backref(struct btrfs_fs_info *fs_info, u64 root_id, + u64 bytenr, int level) +{ + struct btrfs_root *root; + struct btrfs_key key; + struct btrfs_path path; + struct extent_buffer *eb; + struct extent_buffer *node; + u32 nodesize = btrfs_super_nodesize(fs_info->super_copy); + int err = 0; + int ret; + + /* Query level for level == -1 special case */ + if (level == -1) + level = query_tree_block_level(fs_info, bytenr); + if (level < 0) { + err = MISSING_REFERENCER; + goto out; + } + + key.objectid = root_id; + key.type = BTRFS_ROOT_ITEM_KEY; + key.offset = (u64)-1; + + root = btrfs_read_fs_root(fs_info, &key); + if (IS_ERR(root)) { + err |= MISSING_REFERENCER; + goto out; + } + + /* Read out the tree block to get item/node key */ + eb = read_tree_block(root, bytenr, root->nodesize, 0); + /* Impossible, as tree block query has read out the tree block */ + if (!extent_buffer_uptodate(eb)) { + err |= MISSING_REFERENCER; + free_extent_buffer(eb); + goto out; + } + + /* Empty tree, no need to check key */ + if (!btrfs_header_nritems(eb) && 
!level) { + free_extent_buffer(eb); + goto out; + } + + if (level) + btrfs_node_key_to_cpu(eb, &key, 0); + else + btrfs_item_key_to_cpu(eb, &key, 0); + + free_extent_buffer(eb); + + btrfs_init_path(&path); + /* Search with the first key, to ensure we can reach it */ + ret = btrfs_search_slot(NULL, root, &key, &path, 0, 0); + if (ret) { + err |= MISSING_REFERENCER; + goto release_out; + } + + node = path.nodes[level]; + if (btrfs_header_bytenr(node) != bytenr) { + error("Extent [%llu %d] referencer bytenr mismatch, wanted: %llu, have: %llu", + bytenr, nodesize, bytenr, + btrfs_header_bytenr(node)); + err |= BAD_REFERENCER; + } + if (btrfs_header_level(node) != level) { + error("Extent [%llu %d] referencer level mismatch, wanted: %d, have: %d", + bytenr, nodesize, level, + btrfs_header_level(node)); + err |= BAD_REFERENCER; + } + +release_out: + btrfs_release_path(&path); +out: + if (err & MISSING_REFERENCER) { + if (level < 0) + error("Extent [%llu %d] lost referencer(owner: %llu)", + bytenr, nodesize, root_id); + else + error("Extent [%llu %d] lost referencer(owner: %llu, level: %u)", + bytenr, nodesize, root_id, level); + } + + return -err; +} + static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans, struct btrfs_root *root, int overwrite) { -- 2.8.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
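All of these checkers share one return convention: collect positive error bits in err, return -err, and let the caller fold results together with err |= -ret. The sketch below shows that convention on its own. The bit values are copied from the defines in this patch, while check_a(), check_b(), check_c(), run_checks() and has_error() are hypothetical helpers written for the example.

```c
#include <assert.h>

/* Bit values as defined in the patches up to this point in the series. */
#define MISSING_BACKREF		(1 << 0)
#define BAD_BACKREF		(1 << 1)
#define UNALIGNED_BYTES		(1 << 2)
#define MISSING_REFERENCER	(1 << 3)
#define BAD_REFERENCER		(1 << 4)

/* Hypothetical checkers: each returns 0 or a negated bitmask. */
int check_a(void) { return -MISSING_REFERENCER; }
int check_b(void) { return -(BAD_REFERENCER | MISSING_REFERENCER); }
int check_c(void) { return 0; }

/* Fold several results into one negated bitmask, as the callers do. */
int run_checks(void)
{
	int err = 0;

	err |= -check_a();
	err |= -check_b();
	err |= -check_c();
	return -err;
}

/* Test whether a (zero or negative) return value carries a given bit. */
int has_error(int ret, int bit)
{
	assert(ret <= 0);
	return (-ret & bit) != 0;
}
```

Because the bits are OR-ed rather than assigned, a later check cannot erase an error recorded by an earlier one, and a caller can still test the combined value for any single bit.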
[PATCH RFC 08/16] btrfs-progs: fsck: Introduce function to check an extent
From: Lu Fengqi Introduce function check_extent_item() using previous introduced functions. With previous function to check referencer and backref, this function can be quite easy. Signed-off-by: Lu Fengqi Signed-off-by: Qu Wenruo --- cmds-check.c | 113 +++ 1 file changed, 113 insertions(+) diff --git a/cmds-check.c b/cmds-check.c index 5588898..7f9f848 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -325,6 +325,9 @@ struct root_item_info { #define UNALIGNED_BYTES(1 << 2) /* Some bytes are not aligned */ #define MISSING_REFERENCER (1 << 3) /* Referencer not found */ #define BAD_REFERENCER (1 << 4) /* Referencer found, but not mismatch */ +#define CROSSING_STRIPE_BOUNDARY (1 << 4) /* For kernel scrub workaround */ +#define BAD_ITEM_SIZE (1 << 5) /* Bad item size */ +#define UNKNOWN_TYPE (1 << 6) /* Unknown type */ static void *print_status_check(void *p) { @@ -9013,6 +9016,116 @@ out: return 0; } +/* + * This function will check a given extent item, including its backref and + * itself (like crossing stripe boundary and type) + * + * Since we don't use extent_record anymore, introduce new error bit + */ +static int check_extent_item(struct btrfs_fs_info *fs_info, +struct extent_buffer *eb, int slot, int metadata) +{ + struct btrfs_extent_item *ei; + struct btrfs_extent_inline_ref *iref; + struct btrfs_extent_data_ref *dref; + unsigned long end; + unsigned long ptr; + int type; + u32 nodesize = btrfs_super_nodesize(fs_info->super_copy); + u32 item_size = btrfs_item_size_nr(eb, slot); + u64 flags; + u64 offset; + int level; + struct btrfs_key key; + int ret; + int err = 0; + + btrfs_item_key_to_cpu(eb, &key, slot); + + /* +* XXX: Do we really need to handle such historic +* extent structure? 
+*/ + if (item_size < sizeof(*ei)) { +#ifdef BTRFS_COMPAT_EXTENT_TREE_V0 + struct btrfs_extent_item_v0 *ei0; + + BUG_ON(item_size != sizeof(*ei0)); + return 1; +#else + BUG(); +#endif + } + + if (metadata && check_crossing_stripes(key.objectid, eb->len)) { + error("bad metadata [%llu, %llu) crossing stripe boundary", + key.objectid, key.objectid + nodesize); + err |= CROSSING_STRIPE_BOUNDARY; + } + + ei = btrfs_item_ptr(eb, slot, struct btrfs_extent_item); + flags = btrfs_extent_flags(eb, ei); + + ptr = (unsigned long)(ei + 1); + if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK && !metadata) { + struct btrfs_tree_block_info *info; + + info = (struct btrfs_tree_block_info *)ptr; + level = btrfs_tree_block_level(eb, info); + ptr += sizeof(struct btrfs_tree_block_info); + } else + level = key.offset; + end = (unsigned long)ei + item_size; + + if (ptr >= end) { + err |= BAD_ITEM_SIZE; + goto out; + } + + /* Now check every backref in this extent item */ +next: + iref = (struct btrfs_extent_inline_ref *)ptr; + type = btrfs_extent_inline_ref_type(eb, iref); + offset = btrfs_extent_inline_ref_offset(eb, iref); + switch (type) { + case BTRFS_TREE_BLOCK_REF_KEY: + ret = check_tree_block_backref(fs_info, offset, key.objectid, + level); + err |= -ret; + break; + case BTRFS_SHARED_BLOCK_REF_KEY: + ret = check_shared_block_backref(fs_info, offset, key.objectid, +level); + err |= -ret; + break; + case BTRFS_EXTENT_DATA_REF_KEY: + dref = (struct btrfs_extent_data_ref *)(&iref->offset); + ret = check_extent_data_backref(fs_info, + btrfs_extent_data_ref_root(eb, dref), + btrfs_extent_data_ref_objectid(eb, dref), + btrfs_extent_data_ref_offset(eb, dref), + key.objectid, key.offset); + err |= -ret; + break; + case BTRFS_SHARED_DATA_REF_KEY: + ret = check_shared_data_backref(fs_info, offset, key.objectid); + err |= -ret; + break; + default: + error("Extent[%llu %d %llu] has unknown ref type: %d", + key.objectid, key.type, key.offset, type); + err |= UNKNOWN_TYPE; + goto out; + } + + ptr += 
btrfs_extent_inline_ref_size(type); + if (ptr < end) + goto next; + +out: + return -err; +} + static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans, struct btrfs_root *root
[PATCH RFC 11/16] btrfs-progs: fsck: Introduce function to check block group item
From: Lu Fengqi Introduce function check_block_group_item() to check a block group item. It will check the referencer chunk and the used space accounting with extent tree. Signed-off-by: Lu Fengqi Signed-off-by: Qu Wenruo --- cmds-check.c | 116 +++ 1 file changed, 116 insertions(+) diff --git a/cmds-check.c b/cmds-check.c index e2d1ebf..b9fbb02 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -329,6 +329,7 @@ struct root_item_info { #define BAD_ITEM_SIZE (1 << 5) /* Bad item size */ #define UNKNOWN_TYPE (1 << 6) /* Unknown type */ #define ACCOUNTING_MISMATCH (1 << 7) /* Used space accounting error */ +#define MISMATCH_TYPE (1 << 8) static void *print_status_check(void *p) { @@ -9247,6 +9248,121 @@ next: return 0; } +/* + * Check a block group item with its referener(chunk) and its used space + * with extent/metadata item + */ +static int check_block_group_item(struct btrfs_fs_info *fs_info, + struct extent_buffer *eb, int slot) +{ + struct btrfs_root *extent_root = fs_info->extent_root; + struct btrfs_root *chunk_root = fs_info->chunk_root; + struct btrfs_block_group_item *bi; + struct btrfs_block_group_item bg_item; + struct btrfs_path path; + struct btrfs_key key; + struct btrfs_key found_key; + struct btrfs_chunk *chunk; + struct extent_buffer *leaf; + struct btrfs_extent_item *ei; + u32 nodesize = btrfs_super_nodesize(fs_info->super_copy); + u64 flags; + u64 bg_flags; + u64 used; + u64 total = 0; + int ret; + int err = 0; + + btrfs_item_key_to_cpu(eb, &found_key, slot); + bi = btrfs_item_ptr(eb, slot, struct btrfs_block_group_item); + read_extent_buffer(eb, &bg_item, (unsigned long)bi, sizeof(bg_item)); + used = btrfs_block_group_used(&bg_item); + bg_flags = btrfs_block_group_flags(&bg_item); + + key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID; + key.type = BTRFS_CHUNK_ITEM_KEY; + key.offset = found_key.objectid; + + btrfs_init_path(&path); + /* Search for the referencer chunk */ + ret = btrfs_search_slot(NULL, chunk_root, &key, &path, 0, 0); + if (ret) { + 
error("Block group[%llu %llu] didn't find the releative chunk item", + found_key.objectid, found_key.offset); + err |= MISSING_REFERENCER; + } else { + chunk = btrfs_item_ptr(path.nodes[0], path.slots[0], + struct btrfs_chunk); + if (btrfs_chunk_length(path.nodes[0], chunk) != + found_key.offset) { + error("Block group[%llu %llu] relative chunk item length don't match", + found_key.objectid, found_key.offset); + err |= BAD_REFERENCER; + } + } + btrfs_release_path(&path); + + key.objectid = 0; + key.type = BTRFS_METADATA_ITEM_KEY; + key.offset = found_key.objectid; + + btrfs_init_path(&path); + ret = btrfs_search_slot(NULL, extent_root, &key, &path, 0, 0); + if (ret < 0) + goto out; + + /* Iterate extent tree to account used space */ + while (1) { + leaf = path.nodes[0]; + btrfs_item_key_to_cpu(leaf, &key, path.slots[0]); + if (key.objectid >= found_key.objectid + found_key.offset) + break; + + if (key.type != BTRFS_METADATA_ITEM_KEY && + key.type != BTRFS_EXTENT_ITEM_KEY) + goto next; + if (key.objectid < found_key.objectid) + goto next; + + if (key.type == BTRFS_METADATA_ITEM_KEY) + total += nodesize; + else + total += key.offset; + + ei = btrfs_item_ptr(leaf, path.slots[0], + struct btrfs_extent_item); + flags = btrfs_extent_flags(leaf, ei); + if (flags & BTRFS_EXTENT_FLAG_DATA) { + if (!(bg_flags & BTRFS_BLOCK_GROUP_DATA)) { + error("bad extent[%llu, %llu) type mismatch with chunk", + key.objectid, key.objectid + key.offset); + err |= MISMATCH_TYPE; + } + } else if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) { + if (!(bg_flags & (BTRFS_BLOCK_GROUP_SYSTEM | + BTRFS_BLOCK_GROUP_METADATA))) { + error("bad extent[%llu, %llu) type mismatch with chunk", + key.objectid, key.objectid + nodesize); + err |= MISMATCH_TYPE; + } +
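The excerpt above is cut off before the accounting result is used, so the following is only a sketch of the idea described in the commit message: sum the extents that fall inside a block group and compare the sum with the group's `used` field. `struct toy_extent`, the helper names and the final comparison are assumptions made for illustration; the toy also ignores the distinction the patch makes between metadata items (counted as `nodesize`) and data extents (counted as `key.offset`).

```c
#include <stddef.h>
#include <stdint.h>

#define ACCOUNTING_MISMATCH (1 << 7) /* value copied from the diff */

/* Simplified stand-in for an extent tree item: (start, length). */
struct toy_extent {
	uint64_t start;
	uint64_t len;
};

/*
 * Sum the sizes of all extents that start inside the block group
 * [bg_start, bg_start + bg_len). The array must be sorted by start,
 * like keys in the extent tree.
 */
static uint64_t sum_extents_in_group(const struct toy_extent *ext, size_t n,
				     uint64_t bg_start, uint64_t bg_len)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (ext[i].start >= bg_start + bg_len)
			break;		/* past the end of the group */
		if (ext[i].start < bg_start)
			continue;	/* belongs to an earlier group */
		total += ext[i].len;
	}
	return total;
}

/* Assumed final step: compare the sum with the item's "used" field. */
static int check_group_used(const struct toy_extent *ext, size_t n,
			    uint64_t bg_start, uint64_t bg_len, uint64_t used)
{
	if (sum_extents_in_group(ext, n, bg_start, bg_len) != used)
		return -ACCOUNTING_MISMATCH;
	return 0;
}

/* Small fixed data set: one 1 MiB group starting at 1 MiB. */
static int demo_group_check(uint64_t used)
{
	static const struct toy_extent ext[] = {
		{ 0, 4096 }, { 1048576, 16384 },
		{ 1064960, 4096 }, { 2097152, 4096 },
	};

	return check_group_used(ext, 4, 1048576, 1048576, used);
}
```

In the demo data only the two middle extents fall inside the group, so a `used` value of 20480 bytes passes and any other value is flagged.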
[PATCH RFC 12/16] btrfs-progs: fsck: Introduce function to check chunk item
From: Lu Fengqi Introduce function check_chunk_item() to check a chunk item. It will check all chunk stripes with dev extents and the corresponding block group item. Signed-off-by: Lu Fengqi Signed-off-by: Qu Wenruo --- cmds-check.c | 109 +++ 1 file changed, 109 insertions(+) diff --git a/cmds-check.c b/cmds-check.c index b9fbb02..a02db07 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -9363,6 +9363,115 @@ out: return -err; } +/* + * Check a chunk item. + * Including checking all referred dev_extents and block group + */ +static int check_chunk_item(struct btrfs_fs_info *fs_info, + struct extent_buffer *eb, int slot) +{ + struct btrfs_root *extent_root = fs_info->extent_root; + struct btrfs_root *dev_root = fs_info->dev_root; + struct btrfs_path path; + struct btrfs_key key; + struct btrfs_key found_key; + struct btrfs_chunk *chunk; + struct extent_buffer *leaf; + struct btrfs_block_group_item *bi; + struct btrfs_block_group_item bg_item; + struct btrfs_dev_extent *ptr; + u32 sectorsize = btrfs_super_sectorsize(fs_info->super_copy); + u64 length; + u64 type; + u64 profile; + int num_stripes; + u64 offset; + u64 objectid; + int i; + int ret; + int err = 0; + + btrfs_item_key_to_cpu(eb, &found_key, slot); + chunk = btrfs_item_ptr(eb, slot, struct btrfs_chunk); + length = btrfs_chunk_length(eb, chunk); + if (!IS_ALIGNED(length, sectorsize)) { + error("Chunk[%llu %llu] length %llu not aligned to %u", + found_key.objectid, found_key.offset, + length, sectorsize); + err |= UNALIGNED_BYTES; + goto out; + } + + type = btrfs_chunk_type(eb, chunk); + profile = type & BTRFS_BLOCK_GROUP_PROFILE_MASK; + if (!(type & BTRFS_BLOCK_GROUP_TYPE_MASK)) { + error("Chunk[%llu %llu] has no chunk type", + found_key.objectid, found_key.offset); + err |= UNKNOWN_TYPE; + } + if (profile && (profile & (profile - 1))) { + error("Chunk[%llu %llu] multiple profiled detected", + found_key.objectid, found_key.offset); + err |= UNKNOWN_TYPE; + } + + key.objectid = found_key.offset; + 
btrfs_set_key_type(&key, BTRFS_BLOCK_GROUP_ITEM_KEY); + key.offset = length; + + btrfs_init_path(&path); + ret = btrfs_search_slot(NULL, extent_root, &key, &path, 0, 0); + if (ret) { + error("Chunk[%llu %llu] didn't find the releative block group item", + found_key.objectid, found_key.offset); + err |= MISSING_REFERENCER; + } else{ + leaf = path.nodes[0]; + bi = btrfs_item_ptr(leaf, path.slots[0], + struct btrfs_block_group_item); + read_extent_buffer(leaf, &bg_item, (unsigned long)bi, + sizeof(bg_item)); + if (btrfs_block_group_flags(&bg_item) != type) { + error("Chunk[%llu %llu] releative block group item flags mismatch, wanted: %llu, have: %llu", + found_key.objectid, found_key.offset, type, + btrfs_block_group_flags(&bg_item)); + err |= MISSING_REFERENCER; + } + } + + num_stripes = btrfs_chunk_num_stripes(eb, chunk); + for (i = 0; i < num_stripes; i++) { + btrfs_release_path(&path); + btrfs_init_path(&path); + key.objectid = btrfs_stripe_devid_nr(eb, chunk, i); + btrfs_set_key_type(&key, BTRFS_DEV_EXTENT_KEY); + key.offset = btrfs_stripe_offset_nr(eb, chunk, i); + + ret = btrfs_search_slot(NULL, dev_root, &key, &path, 0, 0); + if (ret) + goto not_match_dev; + + leaf = path.nodes[0]; + ptr = btrfs_item_ptr(leaf, path.slots[0], +struct btrfs_dev_extent); + objectid = btrfs_dev_extent_chunk_objectid(leaf, ptr); + offset = btrfs_dev_extent_chunk_offset(leaf, ptr); + if (objectid != found_key.objectid || + offset != found_key.offset || + btrfs_dev_extent_length(leaf, ptr) != length) + goto not_match_dev; + continue; +not_match_dev: + err |= MISSING_BACKREF; + error("Chunk[%llu %llu] stripe %d didn't find the releative dev extent", + found_key.objectid, found_key.offset, i); + continue; + } + btrfs_release_path(&path); +out: + return -err; +} + static int btrfs_fsck_reinit_root(struct btrfs_tr
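Two of the checks in check_chunk_item() are plain bit tricks: `profile && (profile & (profile - 1))` is true when more than one profile bit is set, and the length test is an alignment check. The sketch below isolates both. The helper names are hypothetical, the alignment helper assumes a power-of-two alignment, and the bit positions used when exercising it are illustrative only.

```c
#include <stdint.h>

/*
 * Returns 1 when more than one bit is set in profile, 0 otherwise.
 * Clearing the lowest set bit (profile & (profile - 1)) leaves a
 * non-zero value only if another bit was set as well.
 */
static int has_multiple_bits(uint64_t profile)
{
	return profile && (profile & (profile - 1));
}

/*
 * Returns 1 when value is a multiple of align.
 * Assumption: align is a power of two, so the mask test is valid.
 */
static int is_aligned_pow2(uint64_t value, uint64_t align)
{
	return (value & (align - 1)) == 0;
}
```

A chunk with no profile bits (the "single" case) or exactly one profile bit passes the first check; only a combination of two or more bits is reported.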
[PATCH RFC 00/16] Introduce low memory usage btrfsck mode
The branch can be fetched from my github:
https://github.com/adam900710/btrfs-progs.git low_mem_fsck_rebasing

The original btrfsck checks the extent tree very efficiently: it records
every checked extent in an extent record tree, which ensures that every
extent is iterated at most twice.

However, extent records are all stored in heap memory. Considering how
large a btrfs filesystem can be, they can easily eat up all memory and
cause OOM for TB-sized metadata.

Instead of using heap memory this way, we introduce a low memory usage
fsck mode. In this mode, we use btrfs_search_slot() only and avoid any
heap memory allocation.

The workflow is:
1) Iterate the extent tree (backref check)
   Check whether the referencer of every backref exists.

2) Iterate the other trees (forward ref check)
   Check whether the backref of every tree block/data extent exists in
   the extent tree.

So in theory, every extent is iterated twice, just as in the original
mode. But since we have no extent records and call btrfs_search_slot()
every time we check, this causes extra IO.
I assume the extra IO is reasonable and should make btrfsck able to
handle very large filesystems.

TODO features:
1) Repair
   Repair should work the same as in the old btrfsck, but we still need
   to settle on a repair principle.
   The current repair code sometimes uses the backref to repair the data
   extent and sometimes uses the data extent to fix the backref.
   We need a consistent principle, or we will screw things up.

2) Replace current fsck code
   We expect the low memory mode to have fewer lines of code and to be
   easier to review and extend.
   If low memory mode proves stable enough, we will consider replacing
   the current extent and chunk tree check code with it, removing a lot
   of lines.

3) Further code refining
   Reduce duplicated code.

4) Unify output
   Make the output of low-memory mode the same as the normal one.
Lu Fengqi (16):
  btrfs-progs: fsck: Introduce function to check tree block backref in extent tree
  btrfs-progs: fsck: Introduce function to check data backref in extent tree
  btrfs-progs: fsck: Introduce function to query tree block level
  btrfs-progs: fsck: Introduce function to check referencer of a backref
  btrfs-progs: fsck: Introduce function to check shared block ref
  btrfs-progs: fsck: Introduce function to check referencer for data backref
  btrfs-progs: fsck: Introduce function to check shared data backref
  btrfs-progs: fsck: Introduce function to check an extent
  btrfs-progs: fsck: Introduce function to check dev extent item
  btrfs-progs: fsck: Introduce function to check dev used space
  btrfs-progs: fsck: Introduce function to check block group item
  btrfs-progs: fsck: Introduce function to check chunk item
  btrfs-progs: fsck: Introduce hub function for later fsck
  btrfs-progs: fsck: Introduce function to speed up fs tree check
  btrfs-progs: fsck: Introduce traversal function for fsck
  btrfs-progs: fsck: Introduce low memory mode

 Documentation/btrfs-check.asciidoc |    2 +
 cmds-check.c                       | 1667 +---
 ctree.h                            |    2 +
 extent-tree.c                      |    2 +-
 4 files changed, 1536 insertions(+), 137 deletions(-)

-- 
2.8.0
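The two-pass workflow in the cover letter can be modelled in a few lines. This is a toy sketch, not the patch series' code: sorted arrays stand in for the on-disk trees, `bsearch()` stands in for btrfs_search_slot(), and every function name is hypothetical. It only illustrates that both directions can be verified by lookups alone, without keeping a record per extent.

```c
#include <stdint.h>
#include <stdlib.h>

/* Values copied from the error bits defined in the series. */
#define MISSING_BACKREF    (1 << 0)
#define MISSING_REFERENCER (1 << 3)

static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/* Stand-in for a tree search: is bytenr present in the sorted array? */
static int lookup(const uint64_t *sorted, size_t n, uint64_t bytenr)
{
	return bsearch(&bytenr, sorted, n, sizeof(*sorted), cmp_u64) != NULL;
}

/*
 * backrefs: block addresses recorded in the (toy) extent tree.
 * blocks:   block addresses actually reachable from the other trees.
 * Both arrays must be sorted. Returns a positive error bitmask.
 */
static int cross_check(const uint64_t *backrefs, size_t nr_backrefs,
		       const uint64_t *blocks, size_t nr_blocks)
{
	int err = 0;
	size_t i;

	/* Pass 1: every backref must point at an existing referencer. */
	for (i = 0; i < nr_backrefs; i++)
		if (!lookup(blocks, nr_blocks, backrefs[i]))
			err |= MISSING_REFERENCER;

	/* Pass 2: every reachable block must have a backref. */
	for (i = 0; i < nr_blocks; i++)
		if (!lookup(backrefs, nr_backrefs, blocks[i]))
			err |= MISSING_BACKREF;

	return err;
}

/* Fixed data: 16384 has no referencer, 12288 has no backref. */
static int demo_cross_check(void)
{
	static const uint64_t backrefs[] = { 4096, 8192, 16384 };
	static const uint64_t blocks[] = { 4096, 8192, 12288 };

	return cross_check(backrefs, 3, blocks, 3);
}
```

The trade-off is the one the cover letter names: each check costs a fresh lookup (extra IO in the real tool), but memory use no longer grows with the number of extents.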
[PATCH RFC 07/16] btrfs-progs: fsck: Introduce function to check shared data backref
From: Lu Fengqi

Introduce the function check_shared_data_backref() to check the
referencer of a given shared data backref.

Signed-off-by: Lu Fengqi
Signed-off-by: Qu Wenruo
---
 cmds-check.c | 44
 1 file changed, 44 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 8f971b9..5588898 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -8969,6 +8969,50 @@ out:
 	return 0;
 }
 
+/*
+ * Check if the referencer of a shared data backref exists
+ */
+static int check_shared_data_backref(struct btrfs_fs_info *fs_info,
+				     u64 parent, u64 bytenr)
+{
+	struct extent_buffer *eb;
+	struct btrfs_key key;
+	struct btrfs_file_extent_item *fi;
+	u32 nodesize = btrfs_super_nodesize(fs_info->super_copy);
+	u32 nr;
+	int found_parent = 0;
+	int i;
+
+	eb = read_tree_block_fs_info(fs_info, parent, nodesize, 0);
+	if (!extent_buffer_uptodate(eb))
+		goto out;
+
+	nr = btrfs_header_nritems(eb);
+	for (i = 0; i < nr; i++) {
+		btrfs_item_key_to_cpu(eb, &key, i);
+		if (key.type != BTRFS_EXTENT_DATA_KEY)
+			continue;
+
+		fi = btrfs_item_ptr(eb, i, struct btrfs_file_extent_item);
+		if (btrfs_file_extent_type(eb, fi) == BTRFS_FILE_EXTENT_INLINE)
+			continue;
+
+		if (btrfs_file_extent_disk_bytenr(eb, fi) == bytenr) {
+			found_parent = 1;
+			break;
+		}
+	}
+
+out:
+	free_extent_buffer(eb);
+	if (!found_parent) {
+		error("Shared extent %llu referencer lost(parent: %llu)",
+			bytenr, parent);
+		return -MISSING_REFERENCER;
+	}
+	return 0;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
 			   struct btrfs_root *root, int overwrite)
 {
-- 
2.8.0
[PATCH RFC 05/16] btrfs-progs: fsck: Introduce function to check shared block ref
From: Lu Fengqi

Introduce function check_shared_block_backref() to check shared block
ref.

Signed-off-by: Lu Fengqi
Signed-off-by: Qu Wenruo
---
 cmds-check.c | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 81dd4f3..1d1b198 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -8831,6 +8831,48 @@ out:
 	return -err;
 }
 
+/*
+ * Check referencer for shared block backref
+ * If level == -1, this function will resolve the level.
+ */
+static int check_shared_block_backref(struct btrfs_fs_info *fs_info,
+				      u64 parent, u64 bytenr, int level)
+{
+	struct extent_buffer *eb;
+	u32 nodesize = btrfs_super_nodesize(fs_info->super_copy);
+	u32 nr;
+	int found_parent = 0;
+	int i;
+
+	eb = read_tree_block_fs_info(fs_info, parent, nodesize, 0);
+	if (!extent_buffer_uptodate(eb))
+		goto out;
+
+	if (level == -1)
+		level = query_tree_block_level(fs_info, bytenr);
+	if (level < 0)
+		goto out;
+
+	if (level + 1 != btrfs_header_level(eb))
+		goto out;
+
+	nr = btrfs_header_nritems(eb);
+	for (i = 0; i < nr; i++) {
+		if (bytenr == btrfs_node_blockptr(eb, i)) {
+			found_parent = 1;
+			break;
+		}
+	}
+out:
+	free_extent_buffer(eb);
+	if (!found_parent) {
+		error("Shared extent[%llu %u] lost its parent(parent: %llu, level: %u)",
+			bytenr, nodesize, parent, level);
+		return -MISSING_REFERENCER;
+	}
+	return 0;
+}
+
 static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
 			   struct btrfs_root *root, int overwrite)
 {
-- 
2.8.0
[PATCH RFC 16/16] btrfs-progs: fsck: Introduce low memory mode
From: Lu Fengqi Introduce a new fsck mode: low memory mode. Old btrfsck is doing a quite efficient but uses some memory for each extent item. Old method will ensure extents are only iterated once at extent/chunk tree check process. But since it uses a little memory for each extent item, for large fs with several TB metadata, this can easily eat up memory and cause OOM. To handle such limitation and improve scalability, the new low-memory mode will not use any heap memory to record which extent is checked. Instead it will use extent backref to avoid most of uneeded check on shared fs/subvolume tree blocks. And with the use forward and backward reference cross check, we can also ensure every tree block is at least checked once. Signed-off-by: Lu Fengqi Signed-off-by: Qu Wenruo --- Documentation/btrfs-check.asciidoc | 2 + cmds-check.c | 80 +- 2 files changed, 80 insertions(+), 2 deletions(-) diff --git a/Documentation/btrfs-check.asciidoc b/Documentation/btrfs-check.asciidoc index 7371a23..96eadc8 100644 --- a/Documentation/btrfs-check.asciidoc +++ b/Documentation/btrfs-check.asciidoc @@ -35,6 +35,8 @@ run in read-only mode (default) create a new CRC tree and recalculate all checksums --init-extent-tree:: create a new extent tree +--low-memory:: +check fs in low memory usage mode(experimental) --check-data-csum:: verify checksums of data blocks -p|--progress:: diff --git a/cmds-check.c b/cmds-check.c index 85d6cf4..e9d68dd 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -71,6 +71,7 @@ static int repair = 0; static int no_holes = 0; static int init_extent_tree = 0; static int check_data_csum = 0; +static int low_memory = 0; static struct btrfs_fs_info *global_info; static struct task_ctx ctx = { 0 }; static struct cache_tree *roots_info_cache = NULL; @@ -9721,6 +9722,63 @@ static int traversal_tree_block(struct btrfs_root *root, return -err; } +/* + * Low memory usage version check_chunks_and_extents. 
+ */ +static int check_chunks_and_extents_v2(struct btrfs_root *root) +{ + struct btrfs_path path; + struct btrfs_key key; + struct btrfs_root *root1; + struct btrfs_root *cur_root; + int err = 0; + int ret; + + root1 = root->fs_info->chunk_root; + ret = traversal_tree_block(root1, root1->node); + err |= -ret; + + root1 = root->fs_info->tree_root; + ret = traversal_tree_block(root1, root1->node); + err |= -ret; + + btrfs_init_path(&path); + key.objectid = BTRFS_EXTENT_TREE_OBJECTID; + key.offset = 0; + key.type = BTRFS_ROOT_ITEM_KEY; + + ret = btrfs_search_slot(NULL, root1, &key, &path, 0, 0); + if (ret) { + error("couldn't find extent_tree_root from tree_root"); + goto out; + } + + while (1) { + btrfs_item_key_to_cpu(path.nodes[0], &key, path.slots[0]); + if (key.type != BTRFS_ROOT_ITEM_KEY) + goto next; + key.offset = (u64)-1; + + cur_root = btrfs_read_fs_root(root->fs_info, &key); + if (IS_ERR(cur_root) || !cur_root) { + error("Fail to read tree: %lld", key.objectid); + goto next; + } + + ret = traversal_tree_block(cur_root, cur_root->node); + err |= ret; + +next: + ret = btrfs_next_item(root1, &path); + if (ret) + goto out; + } + +out: + btrfs_release_path(&path); + return err; +} + static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans, struct btrfs_root *root, int overwrite) { @@ -10837,6 +10895,7 @@ const char * const cmd_check_usage[] = { "--readonly run in read-only mode (default)", "--init-csum-treecreate a new CRC tree", "--init-extent-tree create a new extent tree", + "--low-memorycheck in low memory usage mode(experimental)", "--check-data-csum verify checkums of data blocks", "-Q|--qgroup-report print a report on qgroup consistency", "-E|--subvol-extents ", @@ -10868,7 +10927,8 @@ int cmd_check(int argc, char **argv) int c; enum { GETOPT_VAL_REPAIR = 257, GETOPT_VAL_INIT_CSUM, GETOPT_VAL_INIT_EXTENT, GETOPT_VAL_CHECK_CSUM, - GETOPT_VAL_READONLY, GETOPT_VAL_CHUNK_TREE }; + GETOPT_VAL_READONLY, GETOPT_VAL_CHUNK_TREE, + GETOPT_VAL_LOW_MEMORY 
}; static const struct option long_options[] = { { "super", required_argument, NULL, 's' }, { "repair", no_argument, NULL, GETOPT_VAL_REPAIR }, @@ -10886,6 +10946,8 @@ int cmd_check(int argc, char **argv) { "chun
[PATCH RFC 13/16] btrfs-progs: fsck: Introduce hub function for later fsck
From: Lu Fengqi Introduce a hub function, check_items() to check all known/valuable items and update related accounting like total_bytes and csum_bytes. Signed-off-by: Lu Fengqi Signed-off-by: Qu Wenruo --- cmds-check.c | 82 1 file changed, 82 insertions(+) diff --git a/cmds-check.c b/cmds-check.c index a02db07..db6fc8e 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -9472,6 +9472,88 @@ out: return -err; } +/* + * Hub function to check known items and update related accounting info + */ +static int check_leaf_items(struct btrfs_root *root, struct extent_buffer *eb) +{ + struct btrfs_fs_info *fs_info = root->fs_info; + struct btrfs_key key; + int slot = 0; + int type; + int metadata; + struct btrfs_extent_data_ref *dref; + int ret; + int err = 0; + +next: + btrfs_item_key_to_cpu(eb, &key, slot); + type = btrfs_key_type(&key); + + switch (type) { + case BTRFS_EXTENT_DATA_KEY: + ret = check_extent_data_item(root, eb, slot); + err |= -ret; + break; + case BTRFS_BLOCK_GROUP_ITEM_KEY: + ret = check_block_group_item(fs_info, eb, slot); + err |= -ret; + break; + case BTRFS_DEV_ITEM_KEY: + ret = check_dev_item(fs_info, eb, slot); + err |= -ret; + break; + case BTRFS_CHUNK_ITEM_KEY: + ret = check_chunk_item(fs_info, eb, slot); + err |= -ret; + break; + case BTRFS_DEV_EXTENT_KEY: + ret = check_dev_extent_item(fs_info, eb, slot); + err |= -ret; + break; + case BTRFS_EXTENT_ITEM_KEY: + case BTRFS_METADATA_ITEM_KEY: + metadata = type == BTRFS_METADATA_ITEM_KEY; + ret = check_extent_item(fs_info, eb, slot, metadata); + err |= -ret; + break; + case BTRFS_EXTENT_CSUM_KEY: + total_csum_bytes += btrfs_item_size_nr(eb, slot); + break; + case BTRFS_TREE_BLOCK_REF_KEY: + ret = check_tree_block_backref(fs_info, key.offset, + key.objectid, -1); + err |= -ret; + break; + case BTRFS_EXTENT_DATA_REF_KEY: + dref = btrfs_item_ptr(eb, slot, struct btrfs_extent_data_ref); + ret = check_extent_data_backref(fs_info, + btrfs_extent_data_ref_root(eb, dref), + btrfs_extent_data_ref_objectid(eb, 
dref), + btrfs_extent_data_ref_offset(eb, dref), + key.objectid, 0); + err |= -ret; + break; + case BTRFS_SHARED_BLOCK_REF_KEY: + ret = check_shared_block_backref(fs_info, key.offset, +key.objectid, -1); + err |= -ret; + break; + case BTRFS_SHARED_DATA_REF_KEY: + ret = check_shared_data_backref(fs_info, key.offset, + key.objectid); + err |= -ret; + break; + default: + break; + } + + if (++slot < btrfs_header_nritems(eb)) + goto next; + + return err; +} + static int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans, struct btrfs_root *root, int overwrite) { -- 2.8.0 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Possible Double Freeing of dentry in check_parent_dirs_for_sync
Paulo Dias posted on Mon, 25 Apr 2016 22:40:59 -0300 as excerpted:

> hi/2 all..
>
> we are in 4.6 rc5 and im still seeing a LOT of this with my SSD:
>
> Abr 25 22:38:01 hydra kernel: ------------[ cut here ]------------
> Abr 25 22:38:01 hydra kernel: WARNING: CPU: 1 PID: 6236 at
> /home/kernel/COD/linux/fs/btrfs/inode.c:9261
> btrfs_destroy_inode+0x247/0x2c0 [btrfs]

I, OTOH, am not seeing any of them here, also on SSD, after upgrading to
pre-4.6-git shortly after 4.6-rc4.

But my use-case apparently puts less stress on the filesystem than many.
I run multiple small (largest is 24 GiB usable) btrfs raid1 filesystems
on a pair of parallel-partitioned ssds, save for /boot, which is tiny
(256 MiB) mixed-bg dup mode, with the first backup on the other device
and the grub install for each device pointing at its own /boot, so I can
bios-select the backup when needed. The only serious problems I had were
when one of the two ssds was going bad, forcing a replacement, after
which I've not had major problems of any sort.

Also, as I'm using multiple independent btrfs, including identically
sized fallbacks as first backup on the same pair of physical devices, I
don't use subvolumes and don't do snapshots. I have no active quotas
either. I mount with autodefrag, ssd is automatically detected, and I
don't use the discard mount option.

So with your ssd showing the problem and mine not, it's not directly ssd
related. But if you do snapshotting and/or use subvolumes, it could be
related to that, or to quotas, or trim/discard, or filesystem size.

Meanwhile, see the "btrfs_destroy_inode WARN_ON" thread, which,
interestingly enough, had a followup posted apparently the exact same
minute as yours was to this thread. Based on that, it's not just you.
But by that reply anyway, despite seeing lots of the warn-ons and
getting scared back to an earlier kernel as a result, no data loss was
observed.

So without a pin-down it's tough to say data loss /can't/ happen. But
based on the reply there, the warn-ons were apparently triggering about
every 10 minutes even with light use, with no data loss from them to
date. So while data loss /might/ still be possible, thankfully it
doesn't seem to actually happen very often, even under heavy
destroy-inode warn-on triggering.

So they're obviously aware of the problem and presumably working on it,
but it's equally obviously not fixed yet.

Were I seeing the problem frequently (again, I've not seen it at all),
I'd likely drop back to 4.5 until there's a fix. If the fix takes long
enough that 4.5 goes out of support, 4.4-LTS is of course another
option.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
[PATCH] btrfs-progs: fsck: Fix found bytes accounting error
In the new add_extent_rec_nolookup() function, we add to bytes_used to
update the found bytes accounting.

However, there is a typo: we used tmpl->nr where rec->nr was intended.
This makes us add 1 for a data backref instead of the correct size.

Reported-by: Lu Fengqi
Signed-off-by: Qu Wenruo
---
 cmds-check.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-check.c b/cmds-check.c
index d59968b..b207f8e 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -4550,7 +4550,7 @@ static int add_extent_rec_nolookup(struct cache_tree *extent_cache,
 	rec->cache.size = tmpl->nr;
 	ret = insert_cache_extent(extent_cache, &rec->cache);
 	BUG_ON(ret);
-	bytes_used += tmpl->nr;
+	bytes_used += rec->nr;
 
 	if (tmpl->metadata)
 		rec->crossing_stripes = check_crossing_stripes(rec->start,
-- 
2.8.0
Re: btrfs_destroy_inode WARN_ON.
On Mon, Mar 28, 2016 at 04:14:46PM +0200, Markus Trippelsdorf wrote:
> On 2016.03.28 at 10:05 -0400, Josef Bacik wrote:
> > >Mar 24 10:37:27 x4 kernel: WARNING: CPU: 3 PID: 11838 at
> > >fs/btrfs/inode.c:9261 btrfs_destroy_inode+0x22b/0x2a0
> >
> > I saw this running some xfstests on our internal kernels but haven't been
> > able to reproduce it on my latest enospc work (which is obviously perfect).
> > What were you doing when you tripped this? I'd like to see if I actually
> > did fix it or if I still need to run it down. Thanks,
>
> I cannot really tell. Looking at the backtrace, both Dave and I were
> running rm.
> This warning happened just once on my machine, so the issue is obviously
> very hard to trigger.

On the other hand, it seems to be triggering really often (on the order
of ~10 mins of light use) on my box. I understandably ran away from
4.6-rc to stable kernels (no one likes to risk data loss), but even in
that little time it triggered 328 times (over ~20ish boots).

Despite all of these WARNs, there's no data loss yet on the disk in
question, and the filesystem appears consistent.

Call stacks show a variety of callers of btrfs_destroy_inode,
originating from do_unlinkat, SyS_rename, btrfs_ioctl_snap_destroy,
shrink_zone, or task_work_run, direct callers being:
do_unlinkat __dentry_kill dput __dentry_kill shrink_dentry_list dispose_list prune_icache_sb

Just tried 4.6-rc5, it's still there.

Any way I could help debug this?

-- 
A tit a day keeps the vet away.
Re: Possible Double Freeing of dentry in check_parent_dirs_for_sync
hi/2 all.. we are in 4.6 rc5 and im still seeing a LOT of this with my SSD: Abr 25 22:38:01 hydra kernel: [ cut here ] Abr 25 22:38:01 hydra kernel: WARNING: CPU: 1 PID: 6236 at /home/kernel/COD/linux/fs/btrfs/inode.c:9261 btrfs_destroy_inode+0x247/0x2c0 [btrfs] Abr 25 22:38:01 hydra kernel: Modules linked in: drbg ansi_cprng ctr ccm rfcomm hid_generic usbhid hid rtsx_usb_ms memstick pci_stub bnep vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) binfmt_misc nls_iso8859_1 dell_wmi sparse_keymap ath3k intel_rapl btusb x86_pkg_temp_thermal intel_powerclamp btrtl dell_laptop btbcm btintel coretemp bluetooth dell_smm_hwmon kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul uvcvideo dell_led dell_smbios ghash_clmulni_intel videobuf2_vmalloc dcdbas videobuf2_memops videobuf2_v4l2 aesni_intel videobuf2_core snd_hda_codec_realtek aes_x86_64 snd_hda_codec_generic videodev lrw gf128mul arc4 media glue_helper ablk_helper cryptd snd_hda_intel snd_hda_codec ath9k snd_hda_core input_leds ath9k_common joydev snd_hwdep serio_raw snd_pcm ath9k_hw ath snd_seq_midi mac80211 snd_seq_midi_event Abr 25 22:38:01 hydra kernel: snd_rawmidi lpc_ich snd_seq cfg80211 snd_seq_device snd_timer snd mei_me soundcore mei shpchp soc_button_array mac_hid dell_rbtn parport_pc ppdev lp parport autofs4 btrfs xor raid6_pq rtsx_usb_sdmmc rtsx_usb amdkfd amd_iommu_v2 radeon i915 ttm i2c_algo_bit drm_kms_helper syscopyarea psmouse sysfillrect sysimgblt fb_sys_fops ahci libahci r8169 drm mii wmi video fjes Abr 25 22:38:01 hydra kernel: CPU: 1 PID: 6236 Comm: apt Tainted: G W OE 4.6.0-040600rc5-generic #201604242031 Abr 25 22:38:01 hydra kernel: Hardware name: Dell Inc. 
Latitude 3540/02R0J9, BIOS A10 01/28/2015 Abr 25 22:38:01 hydra kernel: 0286 c84e716a 8801288bfd18 813eee83 Abr 25 22:38:01 hydra kernel: 8801288bfd58 810827cb Abr 25 22:38:01 hydra kernel: 242d3bead680 8800acddbe40 8800acddbe40 8800354f9000 Abr 25 22:38:01 hydra kernel: Call Trace: Abr 25 22:38:01 hydra kernel: [] dump_stack+0x63/0x90 Abr 25 22:38:01 hydra kernel: [] __warn+0xcb/0xf0 Abr 25 22:38:01 hydra kernel: [] warn_slowpath_null+0x1d/0x20 Abr 25 22:38:01 hydra kernel: [] btrfs_destroy_inode+0x247/0x2c0 [btrfs] Abr 25 22:38:01 hydra kernel: [] destroy_inode+0x3b/0x60 Abr 25 22:38:01 hydra kernel: [] evict+0x136/0x1a0 Abr 25 22:38:01 hydra kernel: [] iput+0x1ba/0x240 Abr 25 22:38:01 hydra kernel: [] __dentry_kill+0x18d/0x1e0 Abr 25 22:38:01 hydra kernel: [] dput+0x12b/0x220 Abr 25 22:38:01 hydra kernel: [] __fput+0x18b/0x230 Abr 25 22:38:01 hydra kernel: [] fput+0xe/0x10 Abr 25 22:38:01 hydra kernel: [] task_work_run+0x73/0x90 Abr 25 22:38:01 hydra kernel: [] exit_to_usermode_loop+0xc2/0xd0 Abr 25 22:38:01 hydra kernel: [] syscall_return_slowpath+0x4e/0x60 Abr 25 22:38:01 hydra kernel: [] entry_SYSCALL_64_fastpath+0xa6/0xa8 Abr 25 22:38:01 hydra kernel: ---[ end trace 7071159cbaf5ff25 ]--- two questions: 1 - is this harmless? i mean, its just a warning or i can get some data loss? 2 - is anyone looking at this yet? best | Paulo Dias | paulo.miguel.d...@gmail.com Tempora mutantur, nos et mutamur in illis. On Wed, Apr 6, 2016 at 9:26 AM, Filipe Manana wrote: > On Wed, Apr 6, 2016 at 4:46 AM, Bastien Philbert > wrote: >> Greetings All, >> After some tracing I am not certain if this is correct due to being newer to >> the btrfs >> codebase. 
However if someone more experience can show me if I am missing >> something in >> my traces please let me known:) >> Firstly here is the bug trace or the part that matters: >> [ 7195.792492] [ cut here ] >> [ 7195.792532] WARNING: CPU: 0 PID: 5352 at >> /home/kernel/COD/linux/fs/btrfs/inode.c:9261 btrfs_destroy_inode+0x247/0x2c0 >> [btrfs] >> [ 7195.792535] Modules linked in: bnep binfmt_misc intel_rapl >> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel samsung_laptop kvm >> irqbypass crct10dif_pclmul crc32_pclmul btusb ghash_clmulni_intel btrtl >> btbcm btintel cryptd snd_hda_codec_hdmi uvcvideo bluetooth >> snd_hda_codec_realtek videobuf2_vmalloc snd_hda_codec_generic >> videobuf2_memops arc4 videobuf2_v4l2 snd_hda_intel input_leds videobuf2_core >> snd_hda_codec joydev snd_hda_core iwldvm serio_raw snd_hwdep videodev >> snd_pcm mac80211 media snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq >> snd_seq_device iwlwifi snd_timer cfg80211 snd lpc_ich mei_me soundcore >> shpchp mei dell_smo8800 mac_hid parport_pc ppdev lp parport autofs4 btrfs >> xor raid6_pq hid_generic usbhid hid i915 i2c_algo_bit drm_kms_helper >> syscopyarea sysfillrect psmouse sysimgblt fb_sys_fops >> [ 7195.792593] drm r8169 ahci libahci mii wmi video fjes >> [ 7195.792602] CPU: 0 PID: 5352 Comm: aptitude Not tainted >> 4.6.0-040600rc1-generic #201603261930 >> [ 7195.792604] Hardware name: SAMSUNG ELECTRONICS CO., LTD. >> 530U3C/530U4C/SAMSUNG_NP1234567890, BIO
Re: [PATCH v8 19/27] btrfs: try more times to alloc metadata reserve space
Josef Bacik wrote on 2016/04/25 10:05 -0400: On 04/24/2016 08:54 PM, Qu Wenruo wrote: Josef Bacik wrote on 2016/04/22 14:06 -0400: On 03/21/2016 09:35 PM, Qu Wenruo wrote: From: Wang Xiaoguang In btrfs_delalloc_reserve_metadata(), the number of metadata bytes we try to reserve is calculated by the difference between outstanding_extents and reserved_extents. When reserve_metadata_bytes() fails to reserve desited metadata space, it has already done some reclaim work, such as write ordered extents. In that case, outstanding_extents and reserved_extents may already changed, and we may reserve enough metadata space then. So this patch will try to call reserve_metadata_bytes() at most 3 times to ensure we really run out of space. Such false ENOSPC is mainly caused by small file extents and time consuming delalloc functions, which mainly affects in-band de-duplication. (Compress should also be affected, but LZO/zlib is faster than SHA256, so still harder to trigger than dedupe). Signed-off-by: Wang Xiaoguang --- fs/btrfs/extent-tree.c | 25 ++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index dabd721..016d2ec 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2421,7 +2421,7 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans, * a new extent is revered, then deleted * in one tran, and inc/dec get merged to 0. * - * In this case, we need to remove its dedup + * In this case, we need to remove its dedupe * hash. */ btrfs_dedupe_del(trans, fs_info, node->bytenr); @@ -5675,6 +5675,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes) bool delalloc_lock = true; u64 to_free = 0; unsigned dropped; +int loops = 0; /* If we are a free space inode we need to not flush since we will be in * the middle of a transaction commit. 
We also don't need the delalloc @@ -5690,11 +5691,12 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes) btrfs_transaction_in_commit(root->fs_info)) schedule_timeout(1); +num_bytes = ALIGN(num_bytes, root->sectorsize); + +again: if (delalloc_lock) mutex_lock(&BTRFS_I(inode)->delalloc_mutex); -num_bytes = ALIGN(num_bytes, root->sectorsize); - spin_lock(&BTRFS_I(inode)->lock); nr_extents = (unsigned)div64_u64(num_bytes + BTRFS_MAX_EXTENT_SIZE - 1, @@ -5815,6 +5817,23 @@ out_fail: } if (delalloc_lock) mutex_unlock(&BTRFS_I(inode)->delalloc_mutex); +/* + * The number of metadata bytes is calculated by the difference + * between outstanding_extents and reserved_extents. Sometimes though + * reserve_metadata_bytes() fails to reserve the wanted metadata bytes, + * indeed it has already done some work to reclaim metadata space, hence + * both outstanding_extents and reserved_extents would have changed and + * the bytes we try to reserve would also has changed(may be smaller). + * So here we try to reserve again. This is much useful for online + * dedupe, which will easily eat almost all meta space. + * + * XXX: Indeed here 3 is arbitrarily choosed, it's a good workaround for + * online dedupe, later we should find a better method to avoid dedupe + * enospc issue. + */ +if (unlikely(ret == -ENOSPC && loops++ < 3)) +goto again; + return ret; } NAK, we aren't going to just arbitrarily retry to make our metadata reservation. Dropping reserved metadata space by completing ordered extents should free enough to make our current reservation, and in fact this only accounts for the disparity, so should be an accurate count most of the time. I can see a case for detecting that the disparity no longer exists and retrying in that case (we free enough ordered extents that we are no longer trying to reserve ours + overflow but now only ours) and retry in _that specific case_, but we need to limit it to this case only. 
> > > Thanks,
> >
> > Would it be OK to retry only for dedupe enabled case?
> >
> > Currently it's only a workaround and we are still digging the root
> > cause, but for a workaround, I assume it is good enough though for
> > dedupe enabled case.
>
> No we're not going to leave things in a known broken state to come back
> to later, that just makes it so we forget stuff and it sits there
> forever. Thanks,
>
> Josef

OK, We'll investigate it and find the best fix.

BTW, we also found extent-tree.c is using the same 3 loops code:
(and that's why we choose the same method)
--
loops = 0;
while (delalloc_bytes && loops < 3) {
	max_reclaim = min(delalloc_bytes, to_reclaim);
	nr_pages = max_reclaim >> PAGE_CACHE_SHIFT;
	btrfs_writeback_inod
Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot
Josef Bacik wrote on 2016/04/25 10:24 -0400:
> On 04/24/2016 08:56 PM, Qu Wenruo wrote:
> > Josef Bacik wrote on 2016/04/22 14:23 -0400:
> > > On 04/22/2016 02:21 PM, Mark Fasheh wrote:
> > > > On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote:
> > > > > On 04/15/2016 05:08 AM, Qu Wenruo wrote:
> > > > > > +/*
> > > > > > + * Force parent root to be updated, as we recorded it before so its
> > > > > > + * last_trans == cur_transid.
> > > > > > + * Or it won't be committed again onto disk after later
> > > > > > + * insert_dir_item()
> > > > > > + */
> > > > > > +if (!ret)
> > > > > > +record_root_in_trans(trans, parent, 1);
> > > > > > +return ret;
> > > > > > +}
> > > > >
> > > > > NACK, holy shit we aren't adding a special transaction commit only
> > > > > for qgroup snapshots. Figure out a different way. Thanks,
> > > >
> > > > Yeah I saw that. To be fair, we run a whole lot of the transaction
> > > > stuff multiple times (at least from my reading) so I'm really unclear
> > > > on what the performance impact is.
> > > >
> > > > Do you have any suggestion though? We've been banging our heads
> > > > against this for a while now and as slow as this patch might be, it
> > > > actually works where nothing else has so far.
> > >
> > > I'm less concerned about committing another transaction and more
> > > concerned about the fact that it is an special variant of the
> > > transaction commit. If this goes wrong, or at some point in the future
> > > we fail to update it along with btrfs_transaction_commit we suddenly
> > > are corrupting metadata. If we have to commit a transaction then call
> > > btrfs_commit_transaction(), don't open code a stripped down version,
> > > here be dragons. Thanks,
> > >
> > > Josef
> >
> > Yes, I also don't like the dirty hack.
> >
> > Although the problem is, we have no other good choice.
> >
> > If we can call commit_transaction() that's the best case, but the
> > problem is, in create_pending_snapshots(), we are already inside
> > commit_transaction().
> >
> > Or commit_transaction() can be called inside commit_transaction()?
>
> No, figure out a different way. IIRC I dealt with this with the
> no_quota flag for inc_ref/dec_ref since the copy root stuff does strange
> things with the reference counts, but all this code is gone now. I
> looked around to see if I could figure out how the refs are ending up
> this way but it doesn't make sense to me and there isn't enough
> information in your changelog for me to be able to figure it out.
> You've created this mess, clean it up without making it messier. Thanks,
>
> Josef

Unfortunately, your original no_quota flag just hid the bug, and hid it
in a bad way.

Originally, the no_quota flag was used for cases like this, to skip quota
at snapshot creation and use quota_inherit() to hack the quota
accounting. It seems to work, but in fact, if the DIR_ITEM insert needs
to create a new cousin leaf, then quota is messed up.

Your quota rework doesn't really help, as it doesn't even account things
well; just check fstest/btrfs/091 on a 4.1 kernel.

The only perfect fix for this already nasty subvolume creation is to do
a full subtree rescan. Otherwise no one knows when higher qgroups will
be broken.

If you think splitting commit_transaction into two variants can cause
problems, I can merge the two variants into one. In
btrfs_commit_transaction() the commit process is much the same as the
one used in create_pending_snapshot(). If there is only one
__commit_roots() to do such a commit, then there is nothing special only
for quota.

Thanks,
Qu
Re: [PATCH v2] btrfs: use dynamic allocation for root item in create_subvol
On 2016/04/25 20:18, David Sterba wrote: > The size of root item is more than 400 bytes, which is quite a lot of > stack space. As we do IO from inside the subvolume ioctls, we should > keep the stack usage low in case the filesystem is on top of other > layers (NFS, device mapper, iscsi, etc). > > Signed-off-by: David Sterba Looks good to me. Reviewed-by: Tsutomu Itoh > --- > fs/btrfs/ioctl.c | 65 > > 1 file changed, 37 insertions(+), 28 deletions(-) > > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c > index 053e677839fe..9a63fe07bc2e 100644 > --- a/fs/btrfs/ioctl.c > +++ b/fs/btrfs/ioctl.c > @@ -439,7 +439,7 @@ static noinline int create_subvol(struct inode *dir, > { > struct btrfs_trans_handle *trans; > struct btrfs_key key; > - struct btrfs_root_item root_item; > + struct btrfs_root_item *root_item; > struct btrfs_inode_item *inode_item; > struct extent_buffer *leaf; > struct btrfs_root *root = BTRFS_I(dir)->root; > @@ -455,16 +455,22 @@ static noinline int create_subvol(struct inode *dir, > u64 qgroup_reserved; > uuid_le new_uuid; > > + root_item = kzalloc(sizeof(*root_item), GFP_KERNEL); > + if (!root_item) > + return -ENOMEM; > + > ret = btrfs_find_free_objectid(root->fs_info->tree_root, &objectid); > if (ret) > - return ret; > + goto fail_free; > > /* >* Don't create subvolume whose level is not zero. Or qgroup will be >* screwed up since it assume subvolme qgroup's level to be 0. 
>*/ > - if (btrfs_qgroup_level(objectid)) > - return -ENOSPC; > + if (btrfs_qgroup_level(objectid)) { > + ret = -ENOSPC; > + goto fail_free; > + } > > btrfs_init_block_rsv(&block_rsv, BTRFS_BLOCK_RSV_TEMP); > /* > @@ -474,14 +480,14 @@ static noinline int create_subvol(struct inode *dir, > ret = btrfs_subvolume_reserve_metadata(root, &block_rsv, > 8, &qgroup_reserved, false); > if (ret) > - return ret; > + goto fail_free; > > trans = btrfs_start_transaction(root, 0); > if (IS_ERR(trans)) { > ret = PTR_ERR(trans); > btrfs_subvolume_release_metadata(root, &block_rsv, >qgroup_reserved); > - return ret; > + goto fail_free; > } > trans->block_rsv = &block_rsv; > trans->bytes_reserved = block_rsv.size; > @@ -509,47 +515,45 @@ static noinline int create_subvol(struct inode *dir, > BTRFS_UUID_SIZE); > btrfs_mark_buffer_dirty(leaf); > > - memset(&root_item, 0, sizeof(root_item)); > - > - inode_item = &root_item.inode; > + inode_item = &root_item->inode; > btrfs_set_stack_inode_generation(inode_item, 1); > btrfs_set_stack_inode_size(inode_item, 3); > btrfs_set_stack_inode_nlink(inode_item, 1); > btrfs_set_stack_inode_nbytes(inode_item, root->nodesize); > btrfs_set_stack_inode_mode(inode_item, S_IFDIR | 0755); > > - btrfs_set_root_flags(&root_item, 0); > - btrfs_set_root_limit(&root_item, 0); > + btrfs_set_root_flags(root_item, 0); > + btrfs_set_root_limit(root_item, 0); > btrfs_set_stack_inode_flags(inode_item, BTRFS_INODE_ROOT_ITEM_INIT); > > - btrfs_set_root_bytenr(&root_item, leaf->start); > - btrfs_set_root_generation(&root_item, trans->transid); > - btrfs_set_root_level(&root_item, 0); > - btrfs_set_root_refs(&root_item, 1); > - btrfs_set_root_used(&root_item, leaf->len); > - btrfs_set_root_last_snapshot(&root_item, 0); > + btrfs_set_root_bytenr(root_item, leaf->start); > + btrfs_set_root_generation(root_item, trans->transid); > + btrfs_set_root_level(root_item, 0); > + btrfs_set_root_refs(root_item, 1); > + btrfs_set_root_used(root_item, leaf->len); > + 
btrfs_set_root_last_snapshot(root_item, 0); > > - btrfs_set_root_generation_v2(&root_item, > - btrfs_root_generation(&root_item)); > + btrfs_set_root_generation_v2(root_item, > + btrfs_root_generation(root_item)); > uuid_le_gen(&new_uuid); > - memcpy(root_item.uuid, new_uuid.b, BTRFS_UUID_SIZE); > - btrfs_set_stack_timespec_sec(&root_item.otime, cur_time.tv_sec); > - btrfs_set_stack_timespec_nsec(&root_item.otime, cur_time.tv_nsec); > - root_item.ctime = root_item.otime; > - btrfs_set_root_ctransid(&root_item, trans->transid); > - btrfs_set_root_otransid(&root_item, trans->transid); > + memcpy(root_item->uuid, new_uuid.b, BTRFS_UUID_SIZE); > + btrfs_set_stack_timespec_sec(&root_item->otime, cur_time.tv_sec); > + btrfs_set_stack_timespec_nsec(&root_item->otime, cur_time.tv_nsec); > + root_item->ctime = root_item->otime; > + btrfs_set_root_ctransi
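The shape of the change quoted above (move a large item off the stack, then route every early return through one label that frees it) can be sketched in userspace C. This is only an illustration under stated assumptions: `big_item`, `create_thing`, `fail_at`, and `live_allocs` are hypothetical names, and `calloc()`/`free()` stand in for `kzalloc()`/`kfree()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a large item (the patch says "more than 400 bytes"). */
struct big_item {
	unsigned char payload[400];
};

/* Counts live allocations so the sketch can show that nothing leaks. */
int live_allocs;

/*
 * fail_at simulates which step fails: 0 = none, 1 = first check,
 * 2 = second check. Every exit after the allocation goes through one
 * label that frees the item, mirroring the fail_free label in the patch.
 */
int create_thing(int fail_at)
{
	struct big_item *item;
	int ret = 0;

	item = calloc(1, sizeof(*item));	/* zeroed, like kzalloc() */
	if (!item)
		return -ENOMEM;
	live_allocs++;

	if (fail_at == 1) {
		ret = -ENOSPC;
		goto fail_free;
	}
	if (fail_at == 2) {
		ret = -ENOMEM;
		goto fail_free;
	}

	item->payload[0] = 1;	/* the real work would fill in the item here */

fail_free:
	/* Shared by the success and error paths in this sketch. */
	free(item);
	live_allocs--;
	return ret;
}
```

Because the buffer comes back zeroed, the explicit `memset()` of the old stack variable is no longer needed, which matches its removal in the diff.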
Re: Install to or Recover RAID Array Subvolume Root?
On 22 April 2016 at 06:44, David Alcorn wrote:
>
> First, I verified that while the Debian Installer will install to a
> pre set default BTRFS RAID6 subvolume, the Grub install step fails.
> The alternative to restore installation to a RAID6 subvolume requires
> installation to a non RAID6 subvolume and then send|receive the
> snapshotted installation to the array. To prepare for this attempt, I
> reinstalled BTRFS (Debian stable) to a flash drive using separate
> partitions for efi, /boot/ and / (in a subvolume). The default
> subvolume was set to 5 for both the flash / partition and also the
> RAID6 array. I used a separate /boot partition to reduce complexity.
> Both the kernel and btrfs tools were upgraded to 4.4. I soon
> thereafter got lost.

1. Have you partially filled your RAID6 array? If so, do you have
   current backups for everything you care about?
2. Please indicate whether you prefer to mount by LABEL, UUID, or /dev
3. If it's by /dev, please send the output of: parted -l
4. If it's by LABEL or UUID, please also send the output of: blkid

Sincerely,
Nicholas
Re: [PATCH 3/3] btrfs: refactor btrfs_dev_replace_start for reuse
On Thu, Mar 24, 2016 at 06:48:14PM +0800, Anand Jain wrote:
> A refactor patch, and avoids user input verification in the
> btrfs_dev_replace_start(), and so this function can be reused.
>
> Signed-off-by: Anand Jain

Added on top of the delete-by-id patchset as there's a dependency, plus
the 1/3 patch "btrfs: use fs_info directly".
Re: [PATCH 2/3] btrfs: keep sysfs target add in the last
On Thu, Mar 24, 2016 at 06:48:13PM +0800, Anand Jain wrote:
> Sysfs create context should come in the last, so that we
> don't have to undo sysfs operation for the reason that any
> other operation has failed.

Moving the sysfs call will make a visible change: in the old code, the
sysfs node exists during the whole replace process, while in the new
code it appears only after it finishes. While this is not necessarily a
problem, I'd like to check that this is an intended change, as it's not
mentioned in the changelog.

Besides, the sysfs node seems to be added unconditionally, so if the
scrub is running in parallel (checked a few lines above the new code),
we'll happily add the target device although no replace happened.
Re: [PATCH v5 00/13] Introduce device state 'failed', spare device and auto replace
On Mon, Apr 18, 2016 at 07:31:31PM +0800, Anand Jain wrote:
> Thanks for various comments, tests and feedback.

Seems to be working well for me.

-- 
Yauhen Kharuzhy
Re: [PATCH] btrfs: cleanup assigning next active device with a check
On Mon, Apr 18, 2016 at 07:25:52PM +0800, Anand Jain wrote:
> Creates helper function as needed by the device delete
> and replace operations. Also now it checks if the next
> device being assigned is an active device.
>
> Signed-off-by: Anand Jain

> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1684,10 +1684,40 @@ out:
>  	return ret;
>  }
>
> +struct btrfs_device *btrfs_find_next_active_device(struct btrfs_fs_devices *fs_devs,
> +		struct btrfs_device *device)
> +
> +void btrfs_assign_next_active_device(struct btrfs_fs_info *fs_info,
> +		struct btrfs_device *device, struct btrfs_device *this_dev)

Please add comments on what the functions do so that one does not need
to read the whole function to figure it out.
Re: [PATCH] btrfs: fix lock dep warning, move scratch dev out of device_list_mutex and uuid_mutex
On Mon, Apr 18, 2016 at 04:51:23PM +0800, Anand Jain wrote:
> When the replace target fails, the target device will be taken
> out of fs device list, scratch + update_dev_time and freed. However
> we could do the scratch + update_dev_time and free part after the
> device has been taken out of device list, so that we don't have to
> hold the device_list_mutex and uuid_mutex locks.
>
> Reported issue:
[...]
>
> Signed-off-by: Anand Jain
> Reported-by: Yauhen Kharuzhy

Reviewed-by: David Sterba
Re: [PATCH] btrfs-progs: Restrict e2fsprogs version for new convert
On Mon, Apr 18, 2016 at 09:20:18AM +0800, Qu Wenruo wrote:
>
>
> David Sterba wrote on 2016/04/15 13:17 +0200:
> > On Thu, Apr 14, 2016 at 02:24:34PM +0800, Qu Wenruo wrote:
> >> New btrfs-convert is using a lot of new macro in e2fsprogs 1.42.
> >> Unfortunately the new compatible layer for older e2fsprogs is still
> >> under development.
> >
> > It hasn't been released yet so it's not really a big problem, although
> > it makes testing on my side a bit harder. The configure-time check
> > should be 1.41 and until it's fixed we can print a warning.
> >
> >
> Did I missed something?
>
> I checkout 1.41.14 and it shows no cluster support in ext2fs.h.
>
> Also git describe shows it's v1.41.14-36-g1da5ef7, after the last v1.41
> version.
>
> So I think the check should be 1.42, just as the patch.

The idea is to keep lowest supported version 1.41, because this version
can be commonly found on enterprise distros. The lack of cluster is
expected and needs to be dealt with both build- and run-time.
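To make the two thresholds in this exchange concrete, here is a minimal sketch of a dotted-version comparison in C. It is not how btrfs-progs performs the check (that belongs in its configure script); `parse_version` and `version_at_least` are hypothetical helpers. The idea is that 1.41 would be the minimum accepted version, while 1.42 would gate the cluster-related code paths.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the leading "major.minor" of a version string; 0 on success, -1 on bad input. */
int parse_version(const char *s, int *major, int *minor)
{
	if (sscanf(s, "%d.%d", major, minor) != 2)
		return -1;
	return 0;
}

/*
 * Returns 1 if `have` is at least min_major.min_minor, 0 if it is older,
 * and -1 if the string cannot be parsed. Any patch level is ignored.
 */
int version_at_least(const char *have, int min_major, int min_minor)
{
	int major, minor;

	if (parse_version(have, &major, &minor) != 0)
		return -1;
	if (major != min_major)
		return major > min_major;
	return minor >= min_minor;
}
```

With these helpers, "1.41.14" passes a 1.41 minimum but fails a 1.42 check, which is the case where the build would proceed and the cluster support would have to be compiled out or handled at run time.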
Re: [PATCH v4] btrfs: qgroup: Fix qgroup accounting when creating snapshot
On 04/24/2016 08:56 PM, Qu Wenruo wrote: Josef Bacik wrote on 2016/04/22 14:23 -0400: On 04/22/2016 02:21 PM, Mark Fasheh wrote: On Fri, Apr 22, 2016 at 02:12:11PM -0400, Josef Bacik wrote: On 04/15/2016 05:08 AM, Qu Wenruo wrote: +/* + * Force parent root to be updated, as we recorded it before so its + * last_trans == cur_transid. + * Or it won't be committed again onto disk after later + * insert_dir_item() + */ +if (!ret) +record_root_in_trans(trans, parent, 1); +return ret; +} NACK, holy shit we aren't adding a special transaction commit only for qgroup snapshots. Figure out a different way. Thanks, Yeah I saw that. To be fair, we run a whole lot of the transaction stuff multiple times (at least from my reading) so I'm really unclear on what the performance impact is. Do you have any suggestion though? We've been banging our heads against this for a while now and as slow as this patch might be, it actually works where nothing else has so far. I'm less concerned about committing another transaction and more concerned about the fact that it is an special variant of the transaction commit. If this goes wrong, or at some point in the future we fail to update it along with btrfs_transaction_commit we suddenly are corrupting metadata. If we have to commit a transaction then call btrfs_commit_transaction(), don't open code a stripped down version, here be dragons. Thanks, Josef Yes, I also don't like the dirty hack. Although the problem is, we have no other good choice. If we can call commit_transaction() that's the best case, but the problem is, in create_pending_snapshots(), we are already inside commit_transaction(). Or commit_transaction() can be called inside commit_transaction()? No, figure out a different way. IIRC I dealt with this with the no_quota flag for inc_ref/dec_ref since the copy root stuff does strange things with the reference counts, but all this code is gone now. 
I looked around to see if I could figure out how the refs are ending up this way but it doesn't make sense to me and there isn't enough information in your changelog for me to be able to figure it out. You've created this mess, clean it up without making it messier. Thanks, Josef
Re: [PATCH v8 19/27] btrfs: try more times to alloc metadata reserve space
On 04/24/2016 08:54 PM, Qu Wenruo wrote: Josef Bacik wrote on 2016/04/22 14:06 -0400: On 03/21/2016 09:35 PM, Qu Wenruo wrote: From: Wang Xiaoguang In btrfs_delalloc_reserve_metadata(), the number of metadata bytes we try to reserve is calculated by the difference between outstanding_extents and reserved_extents. When reserve_metadata_bytes() fails to reserve desited metadata space, it has already done some reclaim work, such as write ordered extents. In that case, outstanding_extents and reserved_extents may already changed, and we may reserve enough metadata space then. So this patch will try to call reserve_metadata_bytes() at most 3 times to ensure we really run out of space. Such false ENOSPC is mainly caused by small file extents and time consuming delalloc functions, which mainly affects in-band de-duplication. (Compress should also be affected, but LZO/zlib is faster than SHA256, so still harder to trigger than dedupe). Signed-off-by: Wang Xiaoguang --- fs/btrfs/extent-tree.c | 25 ++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index dabd721..016d2ec 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2421,7 +2421,7 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans, * a new extent is revered, then deleted * in one tran, and inc/dec get merged to 0. * - * In this case, we need to remove its dedup + * In this case, we need to remove its dedupe * hash. */ btrfs_dedupe_del(trans, fs_info, node->bytenr); @@ -5675,6 +5675,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes) bool delalloc_lock = true; u64 to_free = 0; unsigned dropped; +int loops = 0; /* If we are a free space inode we need to not flush since we will be in * the middle of a transaction commit. 
We also don't need the delalloc @@ -5690,11 +5691,12 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes) btrfs_transaction_in_commit(root->fs_info)) schedule_timeout(1); +num_bytes = ALIGN(num_bytes, root->sectorsize); + +again: if (delalloc_lock) mutex_lock(&BTRFS_I(inode)->delalloc_mutex); -num_bytes = ALIGN(num_bytes, root->sectorsize); - spin_lock(&BTRFS_I(inode)->lock); nr_extents = (unsigned)div64_u64(num_bytes + BTRFS_MAX_EXTENT_SIZE - 1, @@ -5815,6 +5817,23 @@ out_fail: } if (delalloc_lock) mutex_unlock(&BTRFS_I(inode)->delalloc_mutex); +/* + * The number of metadata bytes is calculated by the difference + * between outstanding_extents and reserved_extents. Sometimes though + * reserve_metadata_bytes() fails to reserve the wanted metadata bytes, + * indeed it has already done some work to reclaim metadata space, hence + * both outstanding_extents and reserved_extents would have changed and + * the bytes we try to reserve would also has changed(may be smaller). + * So here we try to reserve again. This is much useful for online + * dedupe, which will easily eat almost all meta space. + * + * XXX: Indeed here 3 is arbitrarily choosed, it's a good workaround for + * online dedupe, later we should find a better method to avoid dedupe + * enospc issue. + */ +if (unlikely(ret == -ENOSPC && loops++ < 3)) +goto again; + return ret; } NAK, we aren't going to just arbitrarily retry to make our metadata reservation. Dropping reserved metadata space by completing ordered extents should free enough to make our current reservation, and in fact this only accounts for the disparity, so should be an accurate count most of the time. I can see a case for detecting that the disparity no longer exists and retrying in that case (we free enough ordered extents that we are no longer trying to reserve ours + overflow but now only ours) and retry in _that specific case_, but we need to limit it to this case only. 
> > Thanks,
>
> Would it be OK to retry only for dedupe enabled case?
>
> Currently it's only a workaround and we are still digging the root
> cause, but for a workaround, I assume it is good enough though for
> dedupe enabled case.

No we're not going to leave things in a known broken state to come back
to later, that just makes it so we forget stuff and it sits there
forever. Thanks,

Josef
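One possible reading of the narrower retry Josef describes (retry only when the disparity between outstanding and reserved extents has disappeared, so that only our own extents still need reserving) can be sketched as a userspace toy model. It is not kernel code and not a fix that was merged; every name here (`toy_inode`, `toy_space`, `try_reserve`, `reserve_metadata`, `BYTES_PER_EXTENT`) is hypothetical, and the "flush" side effect is simulated.

```c
#include <assert.h>
#include <errno.h>

#define BYTES_PER_EXTENT 4096ULL	/* hypothetical metadata cost per extent */

struct toy_inode {
	unsigned outstanding_extents;
	unsigned reserved_extents;
};

struct toy_space {
	unsigned long long free_bytes;
	unsigned completes_on_flush;	/* ordered extents a failed attempt finishes */
};

/* Extents we would reserve for beyond our own: the "disparity". */
unsigned overflow_extents(const struct toy_inode *in)
{
	if (in->outstanding_extents > in->reserved_extents)
		return in->outstanding_extents - in->reserved_extents;
	return 0;
}

/* One reservation attempt; a failed attempt simulates the reclaim it triggers. */
int try_reserve(struct toy_space *sp, struct toy_inode *in,
		unsigned long long bytes)
{
	unsigned done, over;

	if (bytes <= sp->free_bytes) {
		sp->free_bytes -= bytes;
		return 0;
	}
	/* Flushing completes ordered extents, shrinking the disparity. */
	done = sp->completes_on_flush;
	over = overflow_extents(in);
	if (done > over)
		done = over;
	in->outstanding_extents -= done;
	return -ENOSPC;
}

int reserve_metadata(struct toy_space *sp, struct toy_inode *in,
		     unsigned own_extents)
{
	unsigned over = overflow_extents(in);
	int ret;

	ret = try_reserve(sp, in,
			  (unsigned long long)(own_extents + over) * BYTES_PER_EXTENT);

	/*
	 * Retry once, and only in the specific case where we asked for
	 * ours + overflow and the overflow is now gone.
	 */
	if (ret == -ENOSPC && over > 0 && overflow_extents(in) == 0)
		ret = try_reserve(sp, in,
				  (unsigned long long)own_extents * BYTES_PER_EXTENT);
	return ret;
}
```

Unlike the `loops++ < 3` version in the patch, the retry here is tied to an observable condition, so a genuinely full filesystem still returns -ENOSPC after a single attempt.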
Re: empty disk reports full
On Friday, 1 April 2016 10:05:07, Hugo Mills wrote:
> On Fri, Apr 01, 2016 at 11:50:50AM +0200, Alejandro Vargas wrote:
> > I am using a 2TB disk for incremental backups.
> >
> > I use rsync for backing up to a subvolume, and each day I create a
> > snapshot of the latest snapshot and run rsync into it.
> >
> > When the disk becomes nearly full (100GB or less available) I delete the
> > oldest subvolume (with btrfs subvolume delete).
> >
> > My problem is that *even removing ALL the subvolumes*, the free space does
> > not change. It continues reporting the same size (disk is nearly full).
> >
> > I tried "btrfs balance start /mnt/backup" but it takes hours and hours.
> >
> > I'm using linux 4.1.15
> > btrfs-progs v4.1.2
>
> Can you show us the output of both "sudo btrfs fi show" and "btrfs
> fi df /mnt/backup", please?

Before deleting subvolumes:

[root@backups ~]# df /mnt/backup
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       1,9T  1,9T  5,0M 100% /mnt/backup

[root@backups ~]# ls -l /mnt/backup
total 0
drwxr-xr-x 1 root root 86 Mar 20 16:23 back20160318/
drwxr-xr-x 1 root root 86 Mar 20 16:23 back20160328/
drwxr-xr-x 1 root root 86 Mar 20 16:23 back20160330/
drwxr-xr-x 1 root root 86 Mar 20 16:23 back20160401/
drwxr-xr-x 1 root root 86 Mar 20 16:23 back20160404/
drwxr-xr-x 1 root root 86 Mar 20 16:23 back20160406/
drwxr-xr-x 1 root root 86 Mar 20 16:23 back20160408/

[root@backups ~]# btrfs fi show
Label: 'disco_backup'  uuid: cbfe8735-9f53-46f5-be7e-40f6a61a5506
        Total devices 1 FS bytes used 1.80TiB
        devid    1 size 1.82TiB used 1.82TiB path /dev/sdb1

btrfs-progs v4.1.2

[root@backups ~]# btrfs fi df /mnt/backup
Data, single: total=1.79TiB, used=1.79TiB
System, DUP: total=32.00MiB, used=240.00KiB
Metadata, DUP: total=17.00GiB, used=15.83GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Now I remove the oldest subvolume:

[root@backups ~]# btrfs subvolume delete /mnt/backup/back20160318/
Delete subvolume (no-commit): '/mnt/backup/back20160318'

[root@backups ~]# df /mnt/backup
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       1,9T  1,9T   22M 100% /mnt/backup

[root@backups ~]# btrfs fi show
Label: 'disco_backup'  uuid: cbfe8735-9f53-46f5-be7e-40f6a61a5506
        Total devices 1 FS bytes used 1.80TiB
        devid    1 size 1.82TiB used 1.82TiB path /dev/sdb1

[root@backups ~]# btrfs fi show
Label: 'disco_backup'  uuid: cbfe8735-9f53-46f5-be7e-40f6a61a5506
        Total devices 1 FS bytes used 1.80TiB
        devid    1 size 1.82TiB used 1.82TiB path /dev/sdb1

btrfs-progs v4.1.2

[root@backups ~]# btrfs fi df /mnt/backup
Data, single: total=1.79TiB, used=1.79TiB
System, DUP: total=32.00MiB, used=240.00KiB
Metadata, DUP: total=17.00GiB, used=15.83GiB
GlobalReserve, single: total=512.00MiB, used=102.53MiB

Now I remove 2 more subvolumes:

[root@backups ~]# btrfs subvolume delete /mnt/backup/back20160328/
Delete subvolume (no-commit): '/mnt/backup/back20160328'
[root@backups ~]# btrfs subvolume delete /mnt/backup/back20160330/
Delete subvolume (no-commit): '/mnt/backup/back20160330'

[root@backups ~]# df /mnt/backup/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       1,9T  1,9T  348M 100% /mnt/backup

[root@backups ~]# btrfs fi show
Label: 'disco_backup'  uuid: cbfe8735-9f53-46f5-be7e-40f6a61a5506
        Total devices 1 FS bytes used 1.80TiB
        devid    1 size 1.82TiB used 1.82TiB path /dev/sdb1

btrfs-progs v4.1.2

Data, single: total=1.79TiB, used=1.79TiB
System, DUP: total=32.00MiB, used=240.00KiB
Metadata, DUP: total=17.00GiB, used=15.83GiB
GlobalReserve, single: total=512.00MiB, used=98.94MiB

[root@backups ~]# ls -l /mnt/backup/
total 0
drwxr-xr-x 1 root root 86 Mar 20 16:23 back20160401/
drwxr-xr-x 1 root root 86 Mar 20 16:23 back20160404/
drwxr-xr-x 1 root root 86 Mar 20 16:23 back20160406/
drwxr-xr-x 1 root root 86 Mar 20 16:23 back20160408/

Now I will remove the remaining subvolumes:

[root@backups ~]# btrfs subvolume delete /mnt/backup/back20160401/
Delete subvolume (no-commit): '/mnt/backup/back20160401'
[root@backups ~]# btrfs subvolume delete /mnt/backup/back20160404/
Delete subvolume (no-commit): '/mnt/backup/back20160404'
[root@backups ~]# btrfs subvolume delete /mnt/backup/back20160406/
Delete subvolume (no-commit): '/mnt/backup/back20160406'
[root@backups ~]# btrfs subvolume delete /mnt/backup/back20160408/
Delete subvolume (no-commit): '/mnt/backup/back20160408'

[root@backups ~]# ls -l /mnt/backup/
total 0

[root@backups ~]# df /mnt/backup/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       1,9T  1,9T  4,6G 100% /mnt/backup

[root@backups ~]# btrfs fi show
Label: 'disco_backup'  uuid: cbfe8735-9f53-46f5-be7e-40f6a61a5506
        Total devices 1 FS bytes used 1.80TiB
        devid    1 size 1.82TiB used 1.82TiB path /dev/sdb1

btrfs-progs v4.1.2

[root@backups ~]# btrfs fi df /mnt/backup
Data, single: total=1.79TiB, used=1.78TiB
System, DUP: total=32.00MiB,
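For readers following the numbers: in "btrfs fi df" output, "total" is the space allocated to chunks of that type and "used" is what is actually occupied inside those chunks. The sketch below does the subtraction for the first listing in this message. It is only an illustration; parse_fi_df and slack_gib are made-up helper names, and the regular expression assumes the "Kind, profile: total=..., used=..." line format shown in this thread.

```python
import re

# Binary unit multipliers as printed in the listings above.
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

# Assumed line shape: "Data, single: total=1.79TiB, used=1.79TiB"
LINE_RE = re.compile(
    r"^(?P<kind>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[\d.]+)(?P<tunit>[KMGT]iB|B), "
    r"used=(?P<used>[\d.]+)(?P<uunit>[KMGT]iB|B)$"
)


def parse_fi_df(text):
    """Return {kind: (total_bytes, used_bytes)} for each recognised line."""
    result = {}
    for line in text.strip().splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip prompts, blank lines, truncated lines
        total = float(m["total"]) * UNITS[m["tunit"]]
        used = float(m["used"]) * UNITS[m["uunit"]]
        result[m["kind"]] = (total, used)
    return result


def slack_gib(parsed, kind):
    """Allocated-but-unused space for one chunk type, in GiB."""
    total, used = parsed[kind]
    return (total - used) / UNITS["GiB"]


# First listing from this message (before any subvolume was deleted).
SAMPLE = """\
Data, single: total=1.79TiB, used=1.79TiB
System, DUP: total=32.00MiB, used=240.00KiB
Metadata, DUP: total=17.00GiB, used=15.83GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
"""

if __name__ == "__main__":
    parsed = parse_fi_df(SAMPLE)
    for kind in parsed:
        print(kind, round(slack_gib(parsed, kind), 2), "GiB slack")
```

At the precision printed, the data chunks have no slack at all, and "btrfs fi show" reports the device as fully allocated (1.82TiB used of 1.82TiB), which matches the few MB that df reports as available.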
[PATCH] Btrfs: use root when checking need_async_flush
Instead of doing fs_info->fs_root in need_async_flush, which may not be set
during recovery when mounting, just pass the root itself in, which makes more
sense as that's what btrfs_calc_reclaim_metadata_size takes.

Signed-off-by: Josef Bacik
---
 fs/btrfs/extent-tree.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f23f426..e760cf7 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4872,7 +4872,7 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_root *root,
 }
 
 static inline int need_do_async_reclaim(struct btrfs_space_info *space_info,
-					struct btrfs_fs_info *fs_info, u64 used)
+					struct btrfs_root *root, u64 used)
 {
 	u64 thresh = div_factor_fine(space_info->total_bytes, 98);
 
@@ -4880,11 +4880,12 @@ static inline int need_do_async_reclaim(struct btrfs_space_info *space_info,
 	if ((space_info->bytes_used + space_info->bytes_reserved) >= thresh)
 		return 0;
 
-	if (!btrfs_calc_reclaim_metadata_size(fs_info->fs_root, space_info))
+	if (!btrfs_calc_reclaim_metadata_size(root, space_info))
 		return 0;
 
-	return (used >= thresh && !btrfs_fs_closing(fs_info) &&
-		!test_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state));
+	return (used >= thresh && !btrfs_fs_closing(root->fs_info) &&
+		!test_bit(BTRFS_FS_STATE_REMOUNTING,
+			  &root->fs_info->fs_state));
 }
 
 static void wake_all_tickets(struct list_head *head)
@@ -5129,7 +5130,7 @@ static int __reserve_metadata_bytes(struct btrfs_root *root,
 	 * the async reclaim as we will panic.
 	 */
 	if (!root->fs_info->log_root_recovering &&
-	    need_do_async_reclaim(space_info, root->fs_info, used) &&
+	    need_do_async_reclaim(space_info, root, used) &&
 	    !work_busy(&root->fs_info->async_reclaim_work)) {
 		trace_btrfs_trigger_flush(root->fs_info,
 					  space_info->flags,
-- 
2.5.0
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
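For anyone reading the hunk without the kernel tree at hand, the decision made by need_do_async_reclaim can be modelled in a few lines. This is a plain Python stand-in, not kernel code: the space_info fields become ordinary arguments, reclaim_size stands for whatever btrfs_calc_reclaim_metadata_size returns, and the fs_closing and remounting flags replace the btrfs_fs_closing() and BTRFS_FS_STATE_REMOUNTING checks. The comments are an interpretation of the code in the patch.

```python
def div_factor_fine(num, factor):
    """Integer 'factor percent of num', mirroring the kernel helper's intent."""
    if factor == 100:
        return num
    return (num * factor) // 100


def need_do_async_reclaim(total_bytes, bytes_used, bytes_reserved, used,
                          reclaim_size, fs_closing=False, remounting=False):
    """Model of the check in the patch; all arguments are plain integers/bools."""
    thresh = div_factor_fine(total_bytes, 98)

    # Used plus reserved space alone already reaches 98%: the function
    # returns 0 here, so no async reclaim is started.
    if bytes_used + bytes_reserved >= thresh:
        return False

    # Nothing to reclaim according to the size calculation.
    if reclaim_size == 0:
        return False

    # Otherwise reclaim only when overall usage crossed the threshold and
    # the filesystem is neither closing nor being remounted.
    return used >= thresh and not fs_closing and not remounting


if __name__ == "__main__":
    # 1000 units total gives a threshold of 980.
    print(need_do_async_reclaim(1000, 400, 100, used=990, reclaim_size=1))
```

The patch itself only changes where fs_info comes from (root->fs_info instead of a separately passed fs_info); the conditions stay the same.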
Re: primary location of btrfs-progs changelog: The wiki?
On Mon, Apr 25, 2016 at 08:06:47AM -0400, Nicholas D Steeves wrote:
> On 25 April 2016 at 07:36, David Sterba wrote:
> > The conversion looks relatively ok, indentation could be 2 spaces and
> > all bullet lists with '*'. Thanks.
>
> Done. I also added one line before each new version. I've attached
> it, since it's just one file; however, if you prefer I can clone your
> repo on github and submit it that way.

Thanks. I had a look at how a changes file is usually formatted and made
further changes: fixed the ordering of the minor releases, added exact
release dates, un-indented in a few more places, plus some minor fixes.
Re: Add device while rebalancing
On 2016-04-25 08:43, Duncan wrote:
> Austin S. Hemmelgarn posted on Mon, 25 Apr 2016 07:18:10 -0400 as excerpted:
>
>> On 2016-04-23 01:38, Duncan wrote:
>>>
>>> And again with snapshotting operations. Making a snapshot is normally
>>> nearly instantaneous, but there's a scaling issue if you have too many
>>> per filesystem (try to keep it under 2000 snapshots per filesystem
>>> total, if possible, and definitely keep it under 10K or some operations
>>> will slow down substantially), and deleting snapshots is more work, so
>>> while you should ordinarily automatically thin down snapshots if you're
>>> automatically making them quite frequently (say daily or more
>>> frequently), you may want to put the snapshot deletion, at least, on
>>> hold while you scrub or balance or device delete or replace.
>
>> I would actually recommend putting all snapshot operations on hold, as
>> well as most writes to the filesystem, while doing a balance or device
>> deletion. The more writes you have while doing those, the longer they
>> take, and the less likely that you end up with a good on-disk layout of
>> the data.
>
> The thing with snapshot writing is that all snapshot creation effectively
> does is a bit of metadata writing. What snapshots primarily do is lock
> existing extents in place (down within their chunk, with the higher chunk
> level being the scope at which balance works), that would otherwise be
> COWed elsewhere with the existing extent deleted on change, or simply
> deleted on file delete. A snapshot simply adds a reference to the
> current version, so that deletion, either directly or from the COW, never
> happens, and to do that simply requires a relatively small metadata write.

Unless I'm mistaken about the internals of BTRFS (which might be the
case), creating a snapshot has to update reference counts on every single
extent in every single file in the snapshot. For something small this
isn't much, but if you are snapshotting something big (say, snapshotting
an entire system with all the data in one subvolume), it can amount to
multiple MB of writes, and it gets even worse if you have no shared
extents to begin with (which is still pretty typical). On some of the
systems I work with at work, snapshotting a terabyte of data can end up
resulting in 10-20 MB of writes to disk (in this case, that figure came
from a partition containing mostly small files that were just big enough
that they didn't fit in-line in the metadata blocks).

This is of course still significantly faster than copying everything, but
it's not free either.

> So while I agree in general that more writes means balances taking
> longer, snapshot creation writes are pretty tiny in the scheme of things,
> and won't affect the balance much, compared to larger writes you'll very
> possibly still be doing unless you really do suspend pretty much all
> write operations to that filesystem during the balance.

In general, yes, except that there's the case of running with mostly full
metadata chunks, where it might result in a further chunk allocation,
which in turn can throw off the balanced layout. Balance always allocates
new chunks, and doesn't write into existing ones, so if you're writing
enough to allocate a new chunk while a balance is happening:

1. That chunk may or may not get considered by the balance code (I'm not
   100% certain about this, but I believe it will be ignored by any
   balance running at the time it gets allocated).
2. You run the risk of ending up with a chunk with almost nothing in it
   which could be packed into another existing chunk.

Snapshots are not likely to trigger this, but it is still possible,
especially if you're taking lots of snapshots in a short period of time.

> But as I said, snapshot deletions are an entirely different story, as
> then all those previously locked in place extents are potentially freed,
> and the filesystem must do a lot of work to figure out which ones it can
> actually free and free them, vs. ones that still have other references
> which therefore cannot yet be freed.

Most of the issue here with balance is that you end up potentially doing
an amount of unnecessary work which is unquantifiable before it's done.
Re: [patch] btrfs: send: silence an integer overflow warning
On Wed, Apr 13, 2016 at 09:40:59AM +0300, Dan Carpenter wrote:
> The "sizeof(*arg->clone_sources) * arg->clone_sources_count" expression
> can overflow. It causes several static checker warnings. It's all
> under CAP_SYS_ADMIN so it's not that serious but let's silence the
> warnings.
>
> Signed-off-by: Dan Carpenter

Reviewed-by: David Sterba
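As background on why a "size * count" expression like the one quoted above worries static checkers: with fixed-width integers the product silently wraps. The Python model below emulates 64-bit wraparound and shows one common guard, dividing the maximum value by the element size before multiplying. It is a generic illustration, not the code from the patch; the 8-byte element size and the function names are assumptions made for the example.

```python
U64_MAX = 2**64 - 1


def wrapped_mul_u64(a, b):
    """What an unchecked 64-bit unsigned multiplication would yield."""
    return (a * b) & U64_MAX


def checked_alloc_size(elem_size, count):
    """Return elem_size * count, refusing counts whose product would wrap."""
    if count > U64_MAX // elem_size:
        raise OverflowError("elem_size * count does not fit in 64 bits")
    return elem_size * count


if __name__ == "__main__":
    elem_size = 8          # assumed element size for the example
    huge_count = 2**61     # 8 * 2**61 == 2**64, which wraps to 0

    print(wrapped_mul_u64(elem_size, huge_count))   # tiny size from a huge count
    print(checked_alloc_size(elem_size, 10))        # ordinary case
    try:
        checked_alloc_size(elem_size, huge_count)
    except OverflowError as exc:
        print("rejected:", exc)
```

The danger in the unchecked case is that a wrapped, too-small size gets allocated while later code still trusts the original count.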
Re: Btrfs fails desatrerous on fuzzy tests
On Tue, Apr 12, 2016 at 04:24:32PM +0200, Juergen Sauer wrote:
> Hi!
> do you know this paper ?
>
> http://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20Vault%202016.pdf

Yes. There were several bug reports resulting from the fuzzing, all fixed
in 4.5, and IIRC all of them happen during mount. Thus the awkwardly low
amount of time to trigger the bugs.

The fuzzing suite is not yet released, and instrumenting all the code is
not at all trivial. The Oracle guys promised to do a release, but at
least we have the generated images in the btrfs-progs testsuite.

I'm curious about this level of fuzzing as it can help to make the error
handling more robust, but we'll never be able to completely defend
against crafted images. For example, we can detect a missing extent
mapping when looking for it, but we cannot distinguish that from an
existing but wrong mapping. That would be like doing a full filesystem
integrity check all the time (because we cannot trust any data we read
from disk). There are exceptions where there's enough information cached
or available from other contexts, but overall it's too hard to fix. And
this applies to all filesystems.
Re: Add device while rebalancing
Austin S. Hemmelgarn posted on Mon, 25 Apr 2016 07:18:10 -0400 as excerpted:

> On 2016-04-23 01:38, Duncan wrote:
>>
>> And again with snapshotting operations. Making a snapshot is normally
>> nearly instantaneous, but there's a scaling issue if you have too many
>> per filesystem (try to keep it under 2000 snapshots per filesystem
>> total, if possible, and definitely keep it under 10K or some operations
>> will slow down substantially), and deleting snapshots is more work, so
>> while you should ordinarily automatically thin down snapshots if you're
>> automatically making them quite frequently (say daily or more
>> frequently), you may want to put the snapshot deletion, at least, on
>> hold while you scrub or balance or device delete or replace.

> I would actually recommend putting all snapshot operations on hold, as
> well as most writes to the filesystem, while doing a balance or device
> deletion. The more writes you have while doing those, the longer they
> take, and the less likely that you end up with a good on-disk layout of
> the data.

The thing with snapshot writing is that all snapshot creation effectively
does is a bit of metadata writing. What snapshots primarily do is lock
existing extents in place (down within their chunk, with the higher chunk
level being the scope at which balance works), that would otherwise be
COWed elsewhere with the existing extent deleted on change, or simply
deleted on file delete. A snapshot simply adds a reference to the
current version, so that deletion, either directly or from the COW, never
happens, and to do that simply requires a relatively small metadata write.

So while I agree in general that more writes means balances taking
longer, snapshot creation writes are pretty tiny in the scheme of things,
and won't affect the balance much, compared to larger writes you'll very
possibly still be doing unless you really do suspend pretty much all
write operations to that filesystem during the balance.

But as I said, snapshot deletions are an entirely different story, as
then all those previously locked in place extents are potentially freed,
and the filesystem must do a lot of work to figure out which ones it can
actually free and free them, vs. ones that still have other references
which therefore cannot yet be freed.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: [PATCH] btrfs: Switch to generic xattr handlers
On Fri, Apr 22, 2016 at 10:36:44PM +0200, Andreas Gruenbacher wrote:
> The btrfs_{set,remove}xattr inode operations check for a read-only root
> (btrfs_root_readonly) before calling into generic_{set,remove}xattr. If
> this check is moved into __btrfs_setxattr, we can get rid of
> btrfs_{set,remove}xattr.
>
> This patch applies to mainline, I would like to keep it together with
> the other xattr cleanups if possible, though. Could you please review?
>
> Thanks,
> Andreas
>
>
> Signed-off-by: Andreas Gruenbacher

Reviewed-by: David Sterba
Re: [PATCH 1/3] btrfs-progs: Fix return value bug of qgroups check
On Mon, Apr 18, 2016 at 10:27:07AM +0800, Qu Wenruo wrote:
> Before this patch, although btrfsck will check qgroups if quota is
> enabled, it always returns 0 even if qgroup numbers are corrupted.
>
> Fix it by allowing a return value from the report_qgroups function
> (formerly defined as print_qgroup_difference).
>
> Signed-off-by: Qu Wenruo

All three applied, thanks.
Re: primary location of btrfs-progs changelog: The wiki?
On 25 April 2016 at 07:36, David Sterba wrote:
> The conversion looks relatively ok, indentation could be 2 spaces and
> all bullet lists with '*'. Thanks.

Done. I also added one line before each new version. I've attached
it, since it's just one file; however, if you prefer I can clone your
repo on github and submit it that way.

Cheers,
Nick

[Attachment: changelog.gz (GNU Zip compressed data)]
Re: [PATCH] btrfs-progs: prop: remove an unnecessary condition on parse_args
On Wed, Apr 20, 2016 at 03:32:48PM +0900, Satoru Takeuchi wrote:
> From commit c742debab11f ('btrfs-progs: fix a regression that
> "property" with -t option doesn't work'), the number of arguments
> is checked strictly. So the following condition is never
> satisfied.
>
> Signed-off-by: Satoru Takeuchi

Applied, thanks.
Re: [PATCH 4/4] btrfs-progs: "device ready" accepts just one device
On Mon, Mar 14, 2016 at 01:05:15PM +0100, David Sterba wrote:
> On Mon, Mar 14, 2016 at 09:27:22AM +0900, Satoru Takeuchi wrote:
> > * actual result
> >
> > ===
> > # ./btrfs device ready /dev/sdb foo
> > #
> > ===
> >
> > * expected result
> >
> > ===
> > # ./btrfs device ready /dev/sdb foo
> > btrfs device ready: too many arguments
> > usage: btrfs device ready
> >
> > Check device to see if it has all of its devices in cache for mounting
> >
> > #
> > ===
> >
> > Signed-off-by: Satoru Takeuchi
> > ---
> >  cmds-device.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/cmds-device.c b/cmds-device.c
> > index 33da2ce..23656c3 100644
> > --- a/cmds-device.c
> > +++ b/cmds-device.c
> > @@ -326,7 +326,7 @@ static int cmd_device_ready(int argc, char **argv)
> >
> >  	clean_args_no_options(argc, argv, cmd_device_ready_usage);
> >
> > -	if (check_argc_min(argc - optind, 1))
> > +	if (check_argc_exact(argc - optind, 1))
>
> This silently changes the semantics, so far it accepts multiple values
> though it contradicts the documentation. I'm not yet sure how to resolve
> that.

More than one argument did not work before, so I think it's ok to expect
just one device. Applied.
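The semantic change discussed above comes down to "at least one" versus "exactly one" argument. A tiny Python model of the two checks makes the difference concrete. These are stand-ins written for this note, not the btrfs-progs functions; they only copy the convention visible in the hunk, where a true result inside the if (...) means the argument count is rejected.

```python
def check_argc_min(nargs, expected):
    """Stand-in: report an error (True) only when too few arguments are given."""
    return nargs < expected


def check_argc_exact(nargs, expected):
    """Stand-in: report an error (True) unless the count matches exactly."""
    return nargs != expected


if __name__ == "__main__":
    args = ["/dev/sdb", "foo"]   # the invocation from the report above

    # Old check: two arguments pass the "at least one" test, so the extra
    # argument is silently accepted.
    print("min rejects:", check_argc_min(len(args), 1))

    # New check: anything other than exactly one argument is rejected.
    print("exact rejects:", check_argc_exact(len(args), 1))
```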
Re: primary location of btrfs-progs changelog: The wiki?
On Mon, Apr 25, 2016 at 07:24:27AM -0400, Nicholas D Steeves wrote:
> On 25 April 2016 at 07:12, David Sterba wrote:
> > On Fri, Apr 22, 2016 at 08:41:36PM -0400, Nicholas D Steeves wrote:
> >> I'm just wondering where the primary location of the btrfs-progs
> >> changelog is located
> >
> > At the moment it's the release announcement in this mailinglist, that
> > gets copied to the wiki with some formatting adjustments. I'm willing to
> > copy the announcement text to a file in git (and will do for the next
> > release). But at the moment I won't add all the past changelogs so if
> > anybody wants to do that I'll appreciate that.
>
> I'd be happy to. Are you looking for something like:
>
> curl https://btrfs.wiki.kernel.org/index.php/Changelog | html2text |
> sed '0,/(announcement)/d;/By version (linux kernel)/Q' | gzip -9 >
> changelog
>
> With some formatting adjustments?

The conversion looks relatively ok, indentation could be 2 spaces and
all bullet lists with '*'. Thanks.
Re: primary location of btrfs-progs changelog: The wiki?
oops, that gzip -9 shouldn't be there :-/
Re: primary location of btrfs-progs changelog: The wiki?
On 25 April 2016 at 07:12, David Sterba wrote:
> On Fri, Apr 22, 2016 at 08:41:36PM -0400, Nicholas D Steeves wrote:
>> I'm just wondering where the primary location of the btrfs-progs
>> changelog is located
>
> At the moment it's the release announcement in this mailinglist, that
> gets copied to the wiki with some formatting adjustments. I'm willing to
> copy the announcement text to a file in git (and will do for the next
> release). But at the moment I won't add all the past changelogs so if
> anybody wants to do that I'll appreciate that.

I'd be happy to. Are you looking for something like:

curl https://btrfs.wiki.kernel.org/index.php/Changelog | html2text |
sed '0,/(announcement)/d;/By version (linux kernel)/Q' | gzip -9 >
changelog

With some formatting adjustments?

Cheers,
Nicholas
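For anyone not fluent in sed, the filter in the pipeline above does two things: "0,/(announcement)/d" deletes everything from the start of the input through the first line containing "(announcement)", and "/By version (linux kernel)/Q" stops at the first remaining line containing that text without printing it. The Python function below is a rough equivalent of just that step, assuming GNU sed semantics; trim_changelog is a made-up name, and the sample lines are invented for the demonstration rather than taken from the wiki.

```python
def trim_changelog(lines):
    """Keep the lines between the first '(announcement)' line and the
    'By version (linux kernel)' line, excluding both marker lines."""
    kept = []
    deleting = True
    for line in lines:
        if deleting:
            # 0,/(announcement)/d : drop lines up to and including the
            # first match; deleted lines never reach the Q command.
            if "(announcement)" in line:
                deleting = False
            continue
        if "By version (linux kernel)" in line:
            break  # Q : quit without printing the matching line
        kept.append(line)
    return kept


if __name__ == "__main__":
    sample = [
        "Navigation menu",                     # invented page residue
        "btrfs-progs x.y (announcement)",      # invented heading
        "* first change",
        "* second change",
        "By version (linux kernel)",
        "kernel-side entries",
    ]
    for line in trim_changelog(sample):
        print(line)
```

In sed's default (basic) regular expressions the parentheses are literal characters, which is why plain substring tests are enough in this model.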
Re: Add device while rebalancing
On 2016-04-23 01:38, Duncan wrote:
> Juan Alberto Cirez posted on Fri, 22 Apr 2016 14:36:44 -0600 as excerpted:
>
>> Good morning,
>> I am new to this list and to btrfs in general. I have a quick question:
>> Can I add a new device to the pool while the btrfs filesystem balance
>> command is running on the drive pool?
>
> Adding a device while balancing shouldn't be a problem. However,
> depending on your redundancy mode, you may wish to cancel the balance and
> start a new one after the device add, so the balance will take account of
> it as well and balance it into the mix.

I'm not 100% certain about how balance will handle this, except that
nothing should break. I believe that it picks a device each time it goes
to move a chunk, so it should evaluate any chunks operated on after the
addition of the device for possible placement on that device (and it will
probably end up putting a lot of them there because that device will
almost certainly be less full than any of the others). That said, you
probably do want to cancel the balance, add the device, and re-run the
balance so that things end up more evenly distributed.

> Note that while device add doesn't do more than that on its own, device
> delete/remove effectively initiates its own balance, moving the chunks on
> the device being removed to the other devices. So you wouldn't want to
> be running a balance and then do a device remove at the same time.

IIRC, trying to delete a device while running a balance will fail, and
return an error, because only one balance can be running at a given
moment.

> Similarly with btrfs replace, although in that case, it's more directly
> moving data from the device being replaced (if it's still there, or using
> redundancy or parity to recover it if not) to the replacement device, a
> more limited and often faster operation. But you probably still don't
> want to do a balance at the same time as it places unnecessary stress on
> both the filesystem and the hardware, and even if the filesystem and
> devices handle the stress fine, the result is going to be that both
> operations take longer as they're both intensive operations that will
> interfere with each other to some extent.

Agreed, this is generally not a good idea because of the stress it puts
on the devices (and because it probably isn't well tested).

> Similarly with btrfs scrub. The operations are logically different
> enough that they shouldn't really interfere with each other logically,
> but they're both hardware intensive operations that will put unnecessary
> stress on the system if you're doing more than one at a time, and will
> result in both going slower than they normally would.

Actually, depending on a number of factors, scrubbing while balancing can
actually finish faster than running one then the other in sequence. It's
really dependent on how both decide to pick chunks, and how your
underlying devices handle read and write caching, but it can happen. Most
of the time though, it should take around the same amount of time as
running one then the other, or a little bit longer if you're on
traditional disks.

> And again with snapshotting operations. Making a snapshot is normally
> nearly instantaneous, but there's a scaling issue if you have too many
> per filesystem (try to keep it under 2000 snapshots per filesystem total,
> if possible, and definitely keep it under 10K or some operations will
> slow down substantially), and deleting snapshots is more work, so while
> you should ordinarily automatically thin down snapshots if you're
> automatically making them quite frequently (say daily or more
> frequently), you may want to put the snapshot deletion, at least, on hold
> while you scrub or balance or device delete or replace.

I would actually recommend putting all snapshot operations on hold, as
well as most writes to the filesystem, while doing a balance or device
deletion. The more writes you have while doing those, the longer they
take, and the less likely that you end up with a good on-disk layout of
the data.
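The advice in this exchange (thin snapshots automatically, but hold deletions while a scrub, balance, device delete or replace is running) can be turned into a small selection policy. The sketch below is one possible shape for it and nothing more: the function name, the keep limit and the maintenance_running flag are all assumptions, snapshot names are assumed to sort chronologically (for example with a YYYYMMDD suffix), and the function only returns names; it does not call btrfs.

```python
def snapshots_to_delete(names, keep, maintenance_running):
    """Pick the oldest snapshots beyond `keep`, or nothing during maintenance.

    names: snapshot names that sort chronologically (assumption).
    keep: how many of the newest snapshots to retain.
    maintenance_running: True while a scrub/balance/device delete/replace
        is in progress, in which case deletions are postponed.
    """
    if maintenance_running:
        return []                      # hold deletions, as suggested above
    ordered = sorted(names)            # oldest first
    excess = len(ordered) - keep
    return ordered[:excess] if excess > 0 else []


if __name__ == "__main__":
    snaps = ["snap20160401", "snap20160318", "snap20160328", "snap20160330"]
    print(snapshots_to_delete(snaps, keep=2, maintenance_running=False))
    print(snapshots_to_delete(snaps, keep=2, maintenance_running=True))
```

A caller would pass the returned names to whatever deletion mechanism it already uses, and simply retry on the next run once the maintenance operation has finished.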
[PATCH v2] btrfs: use dynamic allocation for root item in create_subvol
The size of root item is more than 400 bytes, which is quite a lot of
stack space. As we do IO from inside the subvolume ioctls, we should
keep the stack usage low in case the filesystem is on top of other
layers (NFS, device mapper, iscsi, etc).

Signed-off-by: David Sterba
---
 fs/btrfs/ioctl.c | 65
 1 file changed, 37 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 053e677839fe..9a63fe07bc2e 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -439,7 +439,7 @@ static noinline int create_subvol(struct inode *dir,
 {
 	struct btrfs_trans_handle *trans;
 	struct btrfs_key key;
-	struct btrfs_root_item root_item;
+	struct btrfs_root_item *root_item;
 	struct btrfs_inode_item *inode_item;
 	struct extent_buffer *leaf;
 	struct btrfs_root *root = BTRFS_I(dir)->root;
@@ -455,16 +455,22 @@ static noinline int create_subvol(struct inode *dir,
 	u64 qgroup_reserved;
 	uuid_le new_uuid;
 
+	root_item = kzalloc(sizeof(*root_item), GFP_KERNEL);
+	if (!root_item)
+		return -ENOMEM;
+
 	ret = btrfs_find_free_objectid(root->fs_info->tree_root, &objectid);
 	if (ret)
-		return ret;
+		goto fail_free;
 
 	/*
 	 * Don't create subvolume whose level is not zero. Or qgroup will be
 	 * screwed up since it assume subvolme qgroup's level to be 0.
 	 */
-	if (btrfs_qgroup_level(objectid))
-		return -ENOSPC;
+	if (btrfs_qgroup_level(objectid)) {
+		ret = -ENOSPC;
+		goto fail_free;
+	}
 
 	btrfs_init_block_rsv(&block_rsv, BTRFS_BLOCK_RSV_TEMP);
 	/*
@@ -474,14 +480,14 @@ static noinline int create_subvol(struct inode *dir,
 	ret = btrfs_subvolume_reserve_metadata(root, &block_rsv,
 					       8, &qgroup_reserved, false);
 	if (ret)
-		return ret;
+		goto fail_free;
 
 	trans = btrfs_start_transaction(root, 0);
 	if (IS_ERR(trans)) {
 		ret = PTR_ERR(trans);
 		btrfs_subvolume_release_metadata(root, &block_rsv,
 						 qgroup_reserved);
-		return ret;
+		goto fail_free;
 	}
 	trans->block_rsv = &block_rsv;
 	trans->bytes_reserved = block_rsv.size;
@@ -509,47 +515,45 @@ static noinline int create_subvol(struct inode *dir,
 			    BTRFS_UUID_SIZE);
 	btrfs_mark_buffer_dirty(leaf);
 
-	memset(&root_item, 0, sizeof(root_item));
-
-	inode_item = &root_item.inode;
+	inode_item = &root_item->inode;
 	btrfs_set_stack_inode_generation(inode_item, 1);
 	btrfs_set_stack_inode_size(inode_item, 3);
 	btrfs_set_stack_inode_nlink(inode_item, 1);
 	btrfs_set_stack_inode_nbytes(inode_item, root->nodesize);
 	btrfs_set_stack_inode_mode(inode_item, S_IFDIR | 0755);
 
-	btrfs_set_root_flags(&root_item, 0);
-	btrfs_set_root_limit(&root_item, 0);
+	btrfs_set_root_flags(root_item, 0);
+	btrfs_set_root_limit(root_item, 0);
 	btrfs_set_stack_inode_flags(inode_item, BTRFS_INODE_ROOT_ITEM_INIT);
 
-	btrfs_set_root_bytenr(&root_item, leaf->start);
-	btrfs_set_root_generation(&root_item, trans->transid);
-	btrfs_set_root_level(&root_item, 0);
-	btrfs_set_root_refs(&root_item, 1);
-	btrfs_set_root_used(&root_item, leaf->len);
-	btrfs_set_root_last_snapshot(&root_item, 0);
+	btrfs_set_root_bytenr(root_item, leaf->start);
+	btrfs_set_root_generation(root_item, trans->transid);
+	btrfs_set_root_level(root_item, 0);
+	btrfs_set_root_refs(root_item, 1);
+	btrfs_set_root_used(root_item, leaf->len);
+	btrfs_set_root_last_snapshot(root_item, 0);
 
-	btrfs_set_root_generation_v2(&root_item,
-			btrfs_root_generation(&root_item));
+	btrfs_set_root_generation_v2(root_item,
+			btrfs_root_generation(root_item));
 	uuid_le_gen(&new_uuid);
-	memcpy(root_item.uuid, new_uuid.b, BTRFS_UUID_SIZE);
-	btrfs_set_stack_timespec_sec(&root_item.otime, cur_time.tv_sec);
-	btrfs_set_stack_timespec_nsec(&root_item.otime, cur_time.tv_nsec);
-	root_item.ctime = root_item.otime;
-	btrfs_set_root_ctransid(&root_item, trans->transid);
-	btrfs_set_root_otransid(&root_item, trans->transid);
+	memcpy(root_item->uuid, new_uuid.b, BTRFS_UUID_SIZE);
+	btrfs_set_stack_timespec_sec(&root_item->otime, cur_time.tv_sec);
+	btrfs_set_stack_timespec_nsec(&root_item->otime, cur_time.tv_nsec);
+	root_item->ctime = root_item->otime;
+	btrfs_set_root_ctransid(root_item, trans->transid);
+	btrfs_set_root_otransid(root_item, trans->transid);
 
 	btrfs_tree_unlock(leaf);
 	free_extent_buffer(leaf);
Re: primary location of btrfs-progs changelog: The wiki?
On Fri, Apr 22, 2016 at 08:41:36PM -0400, Nicholas D Steeves wrote:
> I'm just wondering where the primary location of the btrfs-progs
> changelog is located, because I'd like to include upstream changes in
> the Debian package. Is it really the wiki? If so, it would seem my
> options are copying+pasting with every release, or writing a script to
> download the page, convert it to text, and then do something like cut
> everything before By version (btrfs-progs) and everything after By
> version (linux kernel).

At the moment it's the release announcement in this mailinglist, that
gets copied to the wiki with some formatting adjustments. I'm willing to
copy the announcement text to a file in git (and will do for the next
release). But at the moment I won't add all the past changelogs so if
anybody wants to do that I'll appreciate that.
Re: btrfs forced readonly + errno=-28 No space left
On 22.4.2016 at 23:00, Nicholas D Steeves wrote:
> On 21 April 2016 at 18:44, Chris Murphy wrote:
>> On Thu, Apr 21, 2016 at 6:53 AM, Martin Svec wrote:
>>> Hello,
>>>
>>> we use btrfs subvolumes for rsync-based backups. During backups btrfs
>>> often fails with "No space left" error and goes to readonly mode (dmesg
>>> output is below) while there's still plenty of unallocated space:
>> Are you snapshotting near the time of enospc? If so it's a known
>> problem that's been around for a while. There are some suggestions in
>> the archives but I think the main thing is to back off on the workload
>> momentarily, take the snapshot, and then resume the workload. I don't
>> think it has to come to a complete stop but it's a lot more
>> reproducible with heavy writes.
> Is this known problem specific to heavy writes + take a snapshot + -o
> compress (either zlib or lzo), or does this enospc also affect the
> more simple heavy writes + take a snapshot case? Is there a greater
> likelihood of running into it if using compression?

In our case, I saw no difference when the compression was disabled.

Martin Svec
Re: btrfs forced readonly + errno=-28 No space left
On 22.4.2016 at 0:44, Chris Murphy wrote:
> On Thu, Apr 21, 2016 at 6:53 AM, Martin Svec wrote:
>> Hello,
>>
>> we use btrfs subvolumes for rsync-based backups. During backups btrfs
>> often fails with "No space left" error and goes to readonly mode (dmesg
>> output is below) while there's still plenty of unallocated space:
> Are you snapshotting near the time of enospc?

What do you mean by "near"? Milliseconds, seconds, minutes? In general,
yes, but it's hard to say exactly because multiple backup jobs run in
parallel every night.

> If so it's a known problem that's been around for a while. There are
> some suggestions in the archives but I think the main thing is to back
> off on the workload momentarily, take the snapshot, and then resume the
> workload. I don't think it has to come to a complete stop but it's a
> lot more reproducible with heavy writes.

I'm afraid we cannot throttle the workload, due to backup job
concurrency. I would expect this to be done at the filesystem level.
Anyway, how can I help to fix this bug? Is there anybody who works on
fixing it or is it considered a "feature"?

Best regards
Martin Svec