Re: [PATCH] Btrfs: print error messages when failing to read trees

2018-03-28 Thread Nikolay Borisov


On 29.03.2018 01:11, Liu Bo wrote:
> When mount fails to read trees like fs tree, checksum tree, extent
> tree, etc, there is not enough information about where went wrong.
> 
> With this, messages like
> 
> "BTRFS warning (device sdf): failed to read root (objectid=7): -5"
> 
> would help us a bit.
> 
> Signed-off-by: Liu Bo 
> ---
>  fs/btrfs/disk-io.c | 31 ++-
>  1 file changed, 22 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 21f34ad..954616c 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2334,23 +2334,29 @@ static int btrfs_read_roots(struct btrfs_fs_info 
> *fs_info)
>   location.offset = 0;
>  
>   root = btrfs_read_tree_root(tree_root, &location);
> - if (IS_ERR(root))
> - return PTR_ERR(root);
> + if (IS_ERR(root)) {
> + ret = PTR_ERR(root);
> + goto out;
> + }
>   set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
>   fs_info->extent_root = root;
>  
>   location.objectid = BTRFS_DEV_TREE_OBJECTID;
>   root = btrfs_read_tree_root(tree_root, &location);
> - if (IS_ERR(root))
> - return PTR_ERR(root);
> + if (IS_ERR(root)) {
> + ret = PTR_ERR(root);
> + goto out;
> + }
>   set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
>   fs_info->dev_root = root;
>   btrfs_init_devices_late(fs_info);
>  
>   location.objectid = BTRFS_CSUM_TREE_OBJECTID;
>   root = btrfs_read_tree_root(tree_root, &location);
> - if (IS_ERR(root))
> - return PTR_ERR(root);
> + if (IS_ERR(root)) {
> + ret = PTR_ERR(root);
> + goto out;
> + }
>   set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
>   fs_info->csum_root = root;
>  
> @@ -2367,7 +2373,7 @@ static int btrfs_read_roots(struct btrfs_fs_info 
> *fs_info)
>   if (IS_ERR(root)) {
>   ret = PTR_ERR(root);
>   if (ret != -ENOENT)
> - return ret;
> + goto out;
>   } else {
>   set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
>   fs_info->uuid_root = root;
> @@ -2376,13 +2382,19 @@ static int btrfs_read_roots(struct btrfs_fs_info 
> *fs_info)
>   if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) {
>   location.objectid = BTRFS_FREE_SPACE_TREE_OBJECTID;
>   root = btrfs_read_tree_root(tree_root, &location);
> - if (IS_ERR(root))
> - return PTR_ERR(root);
> + if (IS_ERR(root)) {
> + ret = PTR_ERR(root);
> + goto out;
> + }
>   set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
>   fs_info->free_space_root = root;
>   }
>  
>   return 0;
> +out:
> + btrfs_warn(fs_info, "failed to read root (objectid=%llu): %d",
> +location.objectid, ret);

Since those are critical errors don't we want btrfs_err here?

> + return ret;
>  }
>  
>  int open_ctree(struct super_block *sb,
> @@ -2953,6 +2965,7 @@ int open_ctree(struct super_block *sb,
>   fs_info->fs_root = btrfs_read_fs_root_no_name(fs_info, &location);
>   if (IS_ERR(fs_info->fs_root)) {
>   err = PTR_ERR(fs_info->fs_root);
> + btrfs_warn(fs_info, "failed to read fs tree: %d", err);

ditto

>   goto fail_qgroup;
>   }
>  
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Corruption on Big Endian System

2018-03-28 Thread Ashu Tiwary
On Tue, Mar 27, 2018 at 7:13 AM, David Sterba  wrote:
> On Mon, Mar 26, 2018 at 10:00:04AM -0500, Ashu Tiwary wrote:
>> It appears my system may have hit the issue reverted here (
>> https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg74621.html
>> ) ( [PATCH] Revert "btrfs: use proper endianness accessors for
>> super_copy" ); system is an IBM OpenPower 720 (Big Endian) running
>> Fedora 27; kernel was at 4.15.9; attempting to reboot after updating
>> system (including kernel to 4.15.10) caused system to not be able to
>> boot:
>>
>> =
>>  Mounting /sysroot...
>> [   34.644721] BTRFS warning (device dm-3): suspicious: generation <
>> chunk_root_generation: 15959351903540740096 < 17261735521269317632
>> [   34.644761] BTRFS info (device dm-3): disk space caching is enabled
>> [   34.644771] BTRFS info (device dm-3): has skinny extents
>> [   34.645925] BTRFS critical (device dm-3): unable to find logical
>> 71472550772736 length 65536
>> [   34.645941] BTRFS critical (device dm-3): unable to find logical
>> 71472550772736 length 65536
>> [   34.645952] BTRFS error (device dm-3): failed to read chunk root
>> [   34.807156] BTRFS error (device dm-3): open_ctree failed
>> [FAILED] Failed to mount /sysroot.
>> =
>>
>> It appears there is a manual method available to repair the corrupted
>> superblock: can that be made available?
>
> Yes, details below.
>
> Quick check from your logs:
>
> chunk_root_generation: 15959351903540740096 = 0xdd7b0200
> should be: 162781 = 0x0x27bdd
>
> tree root block pointer: 71472550772736 = 0x4101
> should be: 21037056 = 0x141
>
> The tool is available in the branch devel in my repos.  Setup repository
> unless you already have one:
>
> $ git clone git://github.com/kdave/btrfs-progs
> $ cd btrfs-progs
> $ git checkout devel
> $ ./autogen.sh
> $ ./configure
>
> Build the tool:
>
> $ make btrfs-sb-mod
>
> Use it (replace the path with the real one) and save the output in case
> something goes unexpectedly wrong:
>
> device=/path/to/the/device
> ./btrfs-sb-mod $device root @0
> ./btrfs-sb-mod $device generation @0
> ./btrfs-sb-mod $device chunk_root @0
> ./btrfs-sb-mod $device chunk_root_generation @0
> ./btrfs-sb-mod $device cache_generation @0
> ./btrfs-sb-mod $device uuid_tree_generation @0
>
> It prints the current and new values so it's reversible, besides that
> the byteswap can be run twice to get back to the starting point.
>
> Then the filesystem should be mountable again.


Hi,

That utility worked perfectly (it took me some time as I had to find
another system on which to build the recovery tool).

Thanks,

-Ashu

--
Ashu Tiwary
ashuat...@gmail.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: tests/mkfs: Test if mkfs.btrfs --rootdir can handle broken soft link well

2018-03-28 Thread Qu Wenruo
Signed-off-by: Qu Wenruo 
---
 .../016-rootdir-bad-symbolic-link/test.sh  | 26 ++
 1 file changed, 26 insertions(+)
 create mode 100755 tests/mkfs-tests/016-rootdir-bad-symbolic-link/test.sh

diff --git a/tests/mkfs-tests/016-rootdir-bad-symbolic-link/test.sh 
b/tests/mkfs-tests/016-rootdir-bad-symbolic-link/test.sh
new file mode 100755
index ..d12efa629042
--- /dev/null
+++ b/tests/mkfs-tests/016-rootdir-bad-symbolic-link/test.sh
@@ -0,0 +1,26 @@
+#!/bin/bash
+# Regression test for mkfs.btrfs --rootdir with bad symbolic link
+# (points to non-existing location)
+#
+# Since mkfs.btrfs --rootdir will just create symbolic link other than
+# follow it, we shouldn't hit any problem
+
+source "$TEST_TOP/common"
+
+check_prereq mkfs.btrfs
+
+prepare_test_dev
+
+tmp=$(mktemp -d --tmpdir btrfs-progs-mkfs.rootdirXXX)
+
+non_existing="/no/such/file"
+
+if [ -f "$non_existing" ]; then
+   # Some smartass don't want to this test case to run
+   _not_run "Some one created $non_exist, which is not expect to exist"
+fi
+
+run_check ln -sf "$non_existing" "$tmp/foobar"
+
+run_check "$TOP/mkfs.btrfs" -f --rootdir "$tmp" "$TEST_DEV"
+run_check "$TOP/btrfs" check "$TEST_DEV"
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: mkfs/rootdir: Don't follow symbolic link when calcuating size

2018-03-28 Thread Qu Wenruo


On 2018年03月28日 22:55, David Sterba wrote:
> On Wed, Mar 28, 2018 at 02:39:09PM +0800, Qu Wenruo wrote:
>> [BUG]
>> If we have a symbolic link in rootdir pointing to non-existing location,
>> mkfs.btrfs --rootdir will just fail:
>> --
>> $ mkfs.btrfs  -f --rootdir /tmp/rootdir/ /dev/data/btrfs
>> btrfs-progs v4.15.1
>> See http://btrfs.wiki.kernel.org for more information.
>>
>> ERROR: ftw subdir walk of /tmp/rootdir/ failed: No such file or directory
>> --
>>
>> [CAUSE]
>> Commit 599a0abed564 ("btrfs-progs: mkfs/rootdir: Use over-reserve method
>> to make size estimate easier") add extra ftw walk to estimate the
>> filesystem size.
>>
>> Such default ftw walk will follow symbolic link and gives ENOENT error.
>>
>> [FIX]
>> Use nftw() to specify FTW_PHYS so we won't follow symbolic link for size
>> calculation.
>>
>> Reported-by: Alexander Kanavin 
>> Fixes: 599a0abed564 ("btrfs-progs: mkfs/rootdir: Use over-reserve method to 
>> make size estimate easier")
>> Signed-off-by: Qu Wenruo 
> 
> Applied, thanks. Can you please write a testcase for that?
> 
Sorry, I planned but forgot that.

Will come soon.

Thanks,
Qu



signature.asc
Description: OpenPGP digital signature


[PATCH v2.1 2/2] btrfs: Validate child tree block's level and first key

2018-03-28 Thread Qu Wenruo
We have several reports about node pointer points to incorrect child
tree blocks, which could have even wrong owner and level but still with
valid generation and checksum.

Although btrfs check could handle it and print error message like:
leaf parent key incorrect 60670574592

Kernel doesn't have enough check on this type of corruption correctly.
At least add such check to read_tree_block() and btrfs_read_buffer(),
where we need two new parameters @level and @first_key to verify the
child tree block.

The new @level check is mandatory and all call sites are already
modified to extract expected level from its call chain.

While @first_key is optional, the following call sites are skipping such
check:
1) Root node/leaf
   As ROOT_ITEM doesn't contain the first key, skip @first_key check.
2) Direct backref
   Only parent bytenr and level is known and we need to resolve the key
   all by ourselves, skip @first_key check.

Another note of this verification is, it needs extra info from nodeptr
or ROOT_ITEM, so it can't fit into current tree-checker framework, which
is limited to node/leaf boundary.

Signed-off-by: Qu Wenruo 
---
changelog:
v2:
  Make @level check mandatory, suggesed by Jeff and Nikolay.
  Change parameter order as @level is now mandatory, put it in front of
  @first_key.
  Change verify_parent_level() to verify_key_level() to avoid confusion
  on the @level parameter.
  Add btrfs_error() output for CONFIG_BTRFS_DEBUG to help debugging.
v2.1
  Rebased to misc-next branch, to use parent_level, it needs to revert
  Nikolay's patch "btrfs: Remove unused parent_level var from 
btrfs_realloc_node",
  sorry Nikolay.
---
 fs/btrfs/backref.c |  6 ++--
 fs/btrfs/ctree.c   | 31 
 fs/btrfs/disk-io.c | 95 +++---
 fs/btrfs/disk-io.h |  8 +++--
 fs/btrfs/extent-tree.c |  6 +++-
 fs/btrfs/print-tree.c  | 10 --
 fs/btrfs/qgroup.c  |  7 ++--
 fs/btrfs/ref-verify.c  |  7 +++-
 fs/btrfs/relocation.c  | 21 ---
 fs/btrfs/tree-log.c| 28 +--
 10 files changed, 173 insertions(+), 46 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 6007dd6b799e..571024bc632e 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -738,7 +738,8 @@ static int add_missing_keys(struct btrfs_fs_info *fs_info,
BUG_ON(ref->key_for_search.type);
BUG_ON(!ref->wanted_disk_byte);
 
-   eb = read_tree_block(fs_info, ref->wanted_disk_byte, 0);
+   eb = read_tree_block(fs_info, ref->wanted_disk_byte, 0,
+ref->level - 1, NULL);
if (IS_ERR(eb)) {
free_pref(ref);
return PTR_ERR(eb);
@@ -1288,7 +1289,8 @@ static int find_parent_nodes(struct btrfs_trans_handle 
*trans,
ref->level == 0) {
struct extent_buffer *eb;
 
-   eb = read_tree_block(fs_info, ref->parent, 0);
+   eb = read_tree_block(fs_info, ref->parent, 0,
+ref->level, NULL);
if (IS_ERR(eb)) {
ret = PTR_ERR(eb);
goto out;
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 88a8891ef2ec..7c8faeb868f4 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1354,6 +1354,7 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
struct tree_mod_root *old_root = NULL;
u64 old_generation = 0;
u64 logical;
+   int level;
 
eb_root = btrfs_read_lock_root_node(root);
tm = __tree_mod_log_oldest_root(eb_root, time_seq);
@@ -1364,15 +1365,17 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
old_root = &tm->old_root;
old_generation = tm->generation;
logical = old_root->logical;
+   level = old_root->level;
} else {
logical = eb_root->start;
+   level = btrfs_header_level(eb_root);
}
 
tm = tree_mod_log_search(fs_info, logical, time_seq);
if (old_root && tm && tm->op != MOD_LOG_KEY_REMOVE_WHILE_FREEING) {
btrfs_tree_read_unlock(eb_root);
free_extent_buffer(eb_root);
-   old = read_tree_block(fs_info, logical, 0);
+   old = read_tree_block(fs_info, logical, 0, level, NULL);
if (WARN_ON(IS_ERR(old) || !extent_buffer_uptodate(old))) {
if (!IS_ERR(old))
free_extent_buffer(old);
@@ -1571,11 +1574,14 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans,
int end_slot;
int i;
int err = 0;
+   int parent_level;
int uptodate;
u32 blocksize;
int progress_passed = 0;
struct btrfs_disk_key disk_key;
 
+  

[PATCH] btrfs: delayed-inode: Don't double reserve qgroup metadata for btrfs_delayed_item_reserve_metadata()

2018-03-28 Thread Qu Wenruo
[BUG]
With latest qgroup metadata reservation patch applied, it's possible to
hit BUG_ON() when running btrfs/042:
--
 run fstests btrfs/042 at 2018-03-28 12:14:26
 BTRFS: device fsid cc876c27-bf31-44d6-bd6a-2c19b8c8e1b8 devid 1 transid 5 
/dev/sdc1
 BTRFS info (device sdc1): disk space caching is enabled
 BTRFS info (device sdc1): has skinny extents
 BTRFS info (device sdc1): flagging fs with big metadata feature
 BTRFS info (device sdc1): creating UUID tree
 BTRFS info (device sdc1): qgroup scan completed (inconsistency flag cleared)
 [ cut here ]
 kernel BUG at fs/btrfs/delayed-inode.c:1467!
 invalid opcode:  [#1] PREEMPT SMP PTI
 CPU: 12 PID: 28830 Comm: xfs_io Tainted: G  I  
4.16.0-rc7-1.gab9e909-default+ #92
 Hardware name: HP ProLiant DL380 G6, BIOS P62 10/01/2009
 RIP: 0010:btrfs_insert_delayed_dir_index+0x230/0x240 [btrfs]
 RSP: 0018:9b4488777a30 EFLAGS: 00010282
 RAX: ff86 RBX: 8f1bd86ee600 RCX: 0020
 RDX: 000c RSI: 0004 RDI: 8f1bd3a830a8
 RBP: 8f1bd86ee618 R08:  R09: 0001
 R10:  R11: 0001 R12: 8f1bd5cdf908
 R13: 0004 R14: 8f1bc38f5680 R15: 8f1bc6364820
 FS:  7ffae04eb700() GS:8f1bdaa0() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 00626880 CR3: 0001155b6000 CR4: 06e0
 Call Trace:
  ? _raw_spin_unlock+0x24/0x40
  btrfs_insert_dir_item+0x1a9/0x210 [btrfs]
  btrfs_add_link+0xec/0x3d0 [btrfs]
  btrfs_create+0x19c/0x1e0 [btrfs]
  lookup_open+0x64c/0x720
  ? __wake_up_common_lock+0x51/0x90
  ? find_held_lock+0x3c/0xa0
  path_openat+0x5ff/0xcf0
  ? sched_clock+0x5/0x10
  ? sched_clock_cpu+0xc/0xb0
  do_filp_open+0x7e/0xd0
  ? _raw_spin_unlock+0x24/0x40
  do_sys_open+0x116/0x1e0
  do_syscall_64+0x6e/0x110
  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
 RIP: 0033:0x7ffae00d3520
 RSP: 002b:7ffc0554bf48 EFLAGS: 0246 ORIG_RAX: 0002
 RAX: ffda RBX: 4042 RCX: 7ffae00d3520
 RDX: 0180 RSI: 4042 RDI: 7ffc0554d537
 RBP: 0022 R08: 7ffc0554c110 R09: 0001
 R10: 12ef9861 R11: 0246 R12: 7ffadfe7220c
 R13: 7ffc0554c110 R14: 7ffc0554d537 R15: 7ffc0554c260
 RIP: btrfs_insert_delayed_dir_index+0x230/0x240 [btrfs] RSP: 9b4488777a30
 ---[ end trace 88182a219a9080f6 ]---
--
Where the BUG_ON() is triggered by -EDQUOT.

[CAUSE]
btrfs_qgroup_reserve_meta_prealloc() is called in
btrfs_delayed_item_reserve_metadata(), however
btrfs_delayed_item_reserve_metadata() is just migrating reserved space
from current transaction to delayed_block_rsv, all metadata space is
already reserved when starting a new transaction.

And for all @trans passed into btrfs_delayed_item_reserve_metadata(),
they are all new transaction allocated by btrfs_start_transaction(), or
operation won't increase btrfs reserved metadata space (delete item,
like unlink).

So the extra btrfs_qgroup_reserve_meta_prealloc() call here could cause
unexpected -EDQUOT and hit BUG_ON() for its caller.

[FIX]
Fix it by just remove the btrfs_qgroup_reserve_meta_prealloc() call.

Reported-by: David Sterba 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/delayed-inode.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index f1bc110e229c..492af96d0e38 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -569,9 +569,6 @@ static int btrfs_delayed_item_reserve_metadata(struct 
btrfs_trans_handle *trans,
dst_rsv = &fs_info->delayed_block_rsv;
 
num_bytes = btrfs_calc_trans_metadata_size(fs_info, 1);
-   ret = btrfs_qgroup_reserve_meta_prealloc(root, num_bytes, true);
-   if (ret < 0)
-   return ret;
 
ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes, 1);
if (!ret) {
-- 
2.16.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs: tests/qgroup: Fix wrong tree backref level

2018-03-28 Thread Qu Wenruo


On 2018年03月28日 23:32, David Sterba wrote:
> On Tue, Mar 27, 2018 at 08:44:18PM +0800, Qu Wenruo wrote:
>> The extent tree of the test fs is like the following:
>>
>>  BTRFS info (device (null)): leaf 16327509003777336587 total ptrs 1 free 
>> space 3919
>>   item 0 key (4096 168 4096) itemoff 3944 itemsize 51
>>   extent refs 1 gen 1 flags 2
>>   tree block key (68719476736 0 0) level 1
>>^^^
>>   ref#0: tree block backref root 5
>>
>> And it's using an empty tree for fs tree, so there is no way that its
>> level can be 1.
>>
>> For REAL (created by mkfs) fs tree backref with no skinny metadata, the
>> result should look like:
>>
>>  item 3 key (30408704 EXTENT_ITEM 4096) itemoff 3845 itemsize 51
>>  refs 1 gen 4 flags TREE_BLOCK
>>  tree block key (256 INODE_ITEM 0) level 0
>>^^^
>>  tree block backref root 5
>>
>> Fix the level to 0, so it won't break later tree level checker.
>>
>> Fixes: faa2dbf004e8 ("Btrfs: add sanity tests for new qgroup accounting 
>> code")
>> Signed-off-by: Qu Wenruo 
> 
> So this is just a bug in the self-tests and does not have any other
> impact, right?

Yep, until we're implementing level check for backref (and all other
tree block reader)

Thanks,
Qu

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



signature.asc
Description: OpenPGP digital signature


Re: error report: misc-test 006 sometimes fails in current devel branch

2018-03-28 Thread Misono Tomohiro
On 2018/03/28 23:50, David Sterba wrote:
> On Wed, Mar 28, 2018 at 10:55:56AM +0900, Misono Tomohiro wrote:
>> current devel branch of btrfs-progs (github) occasionally fails at misc-test 
>> 006:
>> (kernel is 4.16.0-rc7)
> 
> Can you please also open an issue on github, with the details attached? 
> Thanks.

Yes, I opened the issue: https://github.com/kdave/btrfs-progs/issues/118

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/3] fstests: generic: Check the fs after each FUA writes

2018-03-28 Thread Dave Chinner
On Wed, Mar 28, 2018 at 12:40:23PM +0800, Qu Wenruo wrote:
> Basic test case which triggers fsstress with dm-log-writes, and then
> check the fs after each FUA writes.
> With needed infrastructure and special handlers for journal based fs.
> 
> Signed-off-by: Qu Wenruo 
.
> For xfs:
> _check_xfs_filesystem: filesystem on /dev/mapper/test-scratch1 is 
> inconsistent (r)
> *** xfs_repair -n output ***
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
> - zero log...
> - scan filesystem freespace and inode maps...
> - found root inode chunk
> Phase 3 - for each AG...
> - scan (but don't clear) agi unlinked lists...
> - process known inodes and perform inode discovery...
> - agno = 0
> - agno = 1
> - agno = 2
> bad symlink header ino 8409190, file block 0, disk block 1051147
> problem with symbolic link in inode 8409190
> would have cleared inode 8409190
> - agno = 3
> - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
> - setting up duplicate extent list...
> - check for inodes claiming duplicate blocks...
> - agno = 0
> - agno = 1
> - agno = 3
> - agno = 2
> entry "lb" in shortform directory 8409188 references free inode 8409190
> would have junked entry "lb" in directory inode 8409188
> bad symlink header ino 8409190, file block 0, disk block 1051147
> problem with symbolic link in inode 8409190
> would have cleared inode 8409190
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
> - traversing filesystem ...
> entry "lb" in shortform directory inode 8409188 points to free inode 8409190
> would junk entry
> - traversal finished ...
> - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> Maximum metadata LSN (1:396) is ahead of log (1:63).

That warning implies a write ordering problem - there's a write with
an LSN on disk that does not yet exist in the log. i.e. the last FUA
write to the log had 1:63 in it, yet there's metadata on disk that
could only be *issued* after a REQ_FLUSH|REQ_FUA log write with
1:396 in it was *completed*. If we've only replayed up to the
FUA write with 1:63 in it, then no metadata writes should have been
*issued* with 1:396 in it as the LSN that is stamped into metadata
is only updated on log IO completion

On first glance, this implies a bug in the underlying device write
replay code

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: print error messages when failing to read trees

2018-03-28 Thread Liu Bo
When mount fails to read trees like fs tree, checksum tree, extent
tree, etc, there is not enough information about where went wrong.

With this, messages like

"BTRFS warning (device sdf): failed to read root (objectid=7): -5"

would help us a bit.

Signed-off-by: Liu Bo 
---
 fs/btrfs/disk-io.c | 31 ++-
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 21f34ad..954616c 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2334,23 +2334,29 @@ static int btrfs_read_roots(struct btrfs_fs_info 
*fs_info)
location.offset = 0;
 
root = btrfs_read_tree_root(tree_root, &location);
-   if (IS_ERR(root))
-   return PTR_ERR(root);
+   if (IS_ERR(root)) {
+   ret = PTR_ERR(root);
+   goto out;
+   }
set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
fs_info->extent_root = root;
 
location.objectid = BTRFS_DEV_TREE_OBJECTID;
root = btrfs_read_tree_root(tree_root, &location);
-   if (IS_ERR(root))
-   return PTR_ERR(root);
+   if (IS_ERR(root)) {
+   ret = PTR_ERR(root);
+   goto out;
+   }
set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
fs_info->dev_root = root;
btrfs_init_devices_late(fs_info);
 
location.objectid = BTRFS_CSUM_TREE_OBJECTID;
root = btrfs_read_tree_root(tree_root, &location);
-   if (IS_ERR(root))
-   return PTR_ERR(root);
+   if (IS_ERR(root)) {
+   ret = PTR_ERR(root);
+   goto out;
+   }
set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
fs_info->csum_root = root;
 
@@ -2367,7 +2373,7 @@ static int btrfs_read_roots(struct btrfs_fs_info *fs_info)
if (IS_ERR(root)) {
ret = PTR_ERR(root);
if (ret != -ENOENT)
-   return ret;
+   goto out;
} else {
set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
fs_info->uuid_root = root;
@@ -2376,13 +2382,19 @@ static int btrfs_read_roots(struct btrfs_fs_info 
*fs_info)
if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) {
location.objectid = BTRFS_FREE_SPACE_TREE_OBJECTID;
root = btrfs_read_tree_root(tree_root, &location);
-   if (IS_ERR(root))
-   return PTR_ERR(root);
+   if (IS_ERR(root)) {
+   ret = PTR_ERR(root);
+   goto out;
+   }
set_bit(BTRFS_ROOT_TRACK_DIRTY, &root->state);
fs_info->free_space_root = root;
}
 
return 0;
+out:
+   btrfs_warn(fs_info, "failed to read root (objectid=%llu): %d",
+  location.objectid, ret);
+   return ret;
 }
 
 int open_ctree(struct super_block *sb,
@@ -2953,6 +2965,7 @@ int open_ctree(struct super_block *sb,
fs_info->fs_root = btrfs_read_fs_root_no_name(fs_info, &location);
if (IS_ERR(fs_info->fs_root)) {
err = PTR_ERR(fs_info->fs_root);
+   btrfs_warn(fs_info, "failed to read fs tree: %d", err);
goto fail_qgroup;
}
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/2] btrfs: Validate child tree block's level and first key

2018-03-28 Thread David Sterba
On Tue, Mar 27, 2018 at 08:44:19PM +0800, Qu Wenruo wrote:
> We have several reports about node pointer points to incorrect child
> tree blocks, which could have even wrong owner and level but still with
> valid generation and checksum.
> 
> Although btrfs check could handle it and print error message like:
> leaf parent key incorrect 60670574592
> 
> Kernel doesn't have enough check on this type of corruption correctly.
> At least add such check to read_tree_block() and btrfs_read_buffer(),
> where we need two new parameters @level and @first_key to verify the
> child tree block.
> 
> The new @level check is mandatory and all call sites are already
> modified to extract expected level from its call chain.
> 
> While @first_key is optional, the following call sites are skipping such
> check:
> 1) Root node/leaf
>As ROOT_ITEM doesn't contain the first key, skip @first_key check.
> 2) Direct backref
>Only parent bytenr and level is known and we need to resolve the key
>all by ourselves, skip @first_key check.
> 
> Another note of this verification is, it needs extra info from nodeptr
> or ROOT_ITEM, so it can't fit into current tree-checker framework, which
> is limited to node/leaf boundary.
> 
> Signed-off-by: Qu Wenruo 
> ---
> changelog:
> v2:
>   Make @level check mandatory, suggesed by Jeff and Nikolay.
>   Change parameter order as @level is now mandatory, put it in front of
>   @first_key.
>   Change verify_parent_level() to verify_key_level() to avoid confusion
>   on the @level parameter.
>   Add btrfs_error() output for CONFIG_BTRFS_DEBUG to help debugging.

That's much better overall, thanks. Adding it to next.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs: tests/qgroup: Fix wrong tree backref level

2018-03-28 Thread David Sterba
On Tue, Mar 27, 2018 at 08:44:18PM +0800, Qu Wenruo wrote:
> The extent tree of the test fs is like the following:
> 
>  BTRFS info (device (null)): leaf 16327509003777336587 total ptrs 1 free 
> space 3919
>   item 0 key (4096 168 4096) itemoff 3944 itemsize 51
>   extent refs 1 gen 1 flags 2
>   tree block key (68719476736 0 0) level 1
>^^^
>   ref#0: tree block backref root 5
> 
> And it's using an empty tree for fs tree, so there is no way that its
> level can be 1.
> 
> For REAL (created by mkfs) fs tree backref with no skinny metadata, the
> result should look like:
> 
>  item 3 key (30408704 EXTENT_ITEM 4096) itemoff 3845 itemsize 51
>  refs 1 gen 4 flags TREE_BLOCK
>  tree block key (256 INODE_ITEM 0) level 0
>^^^
>  tree block backref root 5
> 
> Fix the level to 0, so it won't break later tree level checker.
> 
> Fixes: faa2dbf004e8 ("Btrfs: add sanity tests for new qgroup accounting code")
> Signed-off-by: Qu Wenruo 

So this is just a bug in the self-tests and does not have any other
impact, right?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Panic and corruption after update and IO trouble

2018-03-28 Thread Sidney San Martín
Hi btrfs devs,

I recently updated Linux (4.15.x) and rebooted on a machine with a 12x4TB-disk 
btrfs volume, and it hung on boot. I did some initial troubleshooting and 
eventually saw in `btrfs dev stats` that one disk had a ton of errors. I 
settled on a theory that either the disk or the SAS backplane got wedged in a 
state where operations on that disk failed, because since power cycling the 
system the disk has behaved perfectly and smartctl doesn’t see any evidence of 
trouble.

When I tried to boot again, there were a flurry of messages like “csum mismatch 
on free space cache” and “parent transit verify failed on 64508387606528 wanted 
1555425 found 1548963”, culminating in a panic and hang whose middle looks like 
this (unfortunately I couldn’t scroll up in the console due to the hang, so I 
only have the end of it — and only as a picture):

 bvec_alloc+0x86/0xe0
 bio_alloc_bioset+0x132/0x1e0
 btrfs_bio_alloc+0x23/0x90 [btrfs]
 submit_extend_page+0x191/0x250 [btrfs]
 ? btrfs_create_repair_bio+0xf0/0xf0 [btrfs]
...

I ran a scrub, which finished successfully like this:

scrub started at Wed Mar 21 18:33:14 2018 and finished after 05:07:53
total bytes scrubbed: 27.05TiB with 1463058 errors
error details: verify=12491 csum=1450567
corrected errors: 1463058, uncorrectable errors: 0, unverified errors: 0

…which looks super promising. However, the machine still panicked on boot, or 
if I tried to mount the filesystem from a lived and perform more than a few 
operations on it.

I did some research and ended up going through roughly this set of recovery 
attempts:

1. Attempt to clear the space_cache.
2. btrfs-zero-log (maybe — I actually can’t remember if I did this, and I now 
see https://btrfs.wiki.kernel.org/index.php/Btrfs-zero-log saying to *please 
not* if the FS mounts).
3. `btrfs check --init-extent-tree`

I realize that, in reality, the right set of steps more likely involved 
"btrfs-image” and “send an email to this mailing list”. Unfortunately I exist 
in a state of mild impatience and high optimism and, as a result, I didn’t. My 
loss :/. The current state of things is that `btrfs check --init-extent-tree` 
has been running for 119 hours and has generated about 50MB of log output that 
looks like this:

ref mismatch on [51614823280640 4096] extent item 0, found 1
data backref 51614823280640 parent 57412861165568 owner 0 offset 0 num_refs 
0 not found in extent tree
incorrect local backref count on 51614823280640 parent 57412861165568 owner 
0 offset 0 found 1 wanted 0 back 0x5652a69bde00
backpointer mismatch on [51614823280640 4096]
adding new data backref on 51614823280640 parent 57412861165568 owner 0 
offset 0 found 1
Repaired extent references for 51614823280640

…which seems scary. I made the KVM recordings and screenshots I have, and the 
full `btrfs check --init-extent-tree` output (since only near the end did I 
spend time getting the system set up with a proper livecd and network access) 
available here in case anyone wants to look over my shoulder:

https://ipfs.io/ipfs/QmXTgYgA4fQs4BSM8GFXrRdufF39ZVxMTV77z8ANjmZvzk

At this point, I’m genuinely not sure whether `init-extent-tree` is moving in a 
useful direction (i.e. will ever finish), whether stopping and trying something 
else would be better, or whether it’s time to salvage some `btrfs-image`s if 
they’re useful and then start restoring from offsite backups.

I’d appreciate any guidance, but realize that my own troubleshooting so far was 
*far* from ideal and would accept “It’s time to restore from backups, but 
please do X next time instead of all that”. Happy to answer any other 
questions, too.

Sidney--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: mkfs/rootdir: Don't follow symbolic link when calcuating size

2018-03-28 Thread David Sterba
On Wed, Mar 28, 2018 at 02:39:09PM +0800, Qu Wenruo wrote:
> [BUG]
> If we have a symbolic link in rootdir pointing to non-existing location,
> mkfs.btrfs --rootdir will just fail:
> --
> $ mkfs.btrfs  -f --rootdir /tmp/rootdir/ /dev/data/btrfs
> btrfs-progs v4.15.1
> See http://btrfs.wiki.kernel.org for more information.
> 
> ERROR: ftw subdir walk of /tmp/rootdir/ failed: No such file or directory
> --
> 
> [CAUSE]
> Commit 599a0abed564 ("btrfs-progs: mkfs/rootdir: Use over-reserve method
> to make size estimate easier") add extra ftw walk to estimate the
> filesystem size.
> 
> Such default ftw walk will follow symbolic link and gives ENOENT error.
> 
> [FIX]
> Use nftw() to specify FTW_PHYS so we won't follow symbolic link for size
> calculation.
> 
> Reported-by: Alexander Kanavin 
> Fixes: 599a0abed564 ("btrfs-progs: mkfs/rootdir: Use over-reserve method to 
> make size estimate easier")
> Signed-off-by: Qu Wenruo 

Applied, thanks. Can you please write a testcase for that?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: build: modify cscope/ctags rules to include directories such as check

2018-03-28 Thread David Sterba
On Wed, Mar 28, 2018 at 02:07:38PM +0800, Lu Fengqi wrote:
> Modify cscope/ctags rule to include directories such as check/
> libbtrfsutil/kernel-lib/kernel-shared.
> 
> Signed-off-by: Lu Fengqi 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: error report: misc-test 006 sometimes fails in current devel branch

2018-03-28 Thread David Sterba
On Wed, Mar 28, 2018 at 10:55:56AM +0900, Misono Tomohiro wrote:
> current devel branch of btrfs-progs (github) occasionally fails at misc-test 
> 006:
> (kernel is 4.16.0-rc7)

Can you please also open an issue on github, with the details attached? Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: error report: misc-test 006 sometimes fails in current devel branch

2018-03-28 Thread David Sterba
On Wed, Mar 28, 2018 at 10:55:56AM +0900, Misono Tomohiro wrote:
> current devel branch of btrfs-progs (github) occasionally fails at misc-test 
> 006:
> (kernel is 4.16.0-rc7)

That's strange, the test should be deterministic. There seems to be some
timing involved. I'm able to reproduce here, thanks for the report.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Btrfs: fix fsync after hole punching when using no-holes feature

2018-03-28 Thread David Sterba
On Mon, Mar 26, 2018 at 11:59:00PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> When we have the no-holes mode enabled and fsync a file after punching a
> hole in it, we can end up not logging the whole hole range in the log tree.
> This happens if the file has extent items that span more than one leaf and
> we punch a hole that covers a range that starts in a leaf but does not go
> beyond the offset of the first extent in the next leaf.
> 
> Example:
> 
>   $ mkfs.btrfs -f -O no-holes -n 65536 /dev/sdb
>   $ mount /dev/sdb /mnt
>   $ for ((i = 0; i <= 831; i++)); do
>   offset=$((i * 2 * 256 * 1024))
>   xfs_io -f -c "pwrite -S 0xab -b 256K $offset 256K" \
>   /mnt/foobar >/dev/null
> done
>   $ sync
> 
>   # We now have 2 leafs in our filesystem fs tree, the first leaf has an
>   # item corresponding the extent at file offset 216530944 and the second
>   # leaf has a first item corresponding to the extent at offset 217055232.
>   # Now we punch a hole that partially covers the range of the extent at
>   # offset 216530944 but does go beyond the offset 217055232.
> 
>   $ xfs_io -c "fpunch $((216530944 + 128 * 1024 - 4000)) 256K" /mnt/foobar
>   $ xfs_io -c "fsync" /mnt/foobar
> 
>   
> 
>   # mount to replay the log
>   $ mount /dev/sdb /mnt
> 
>   # Before this patch, only the subrange [216658016, 216662016[ (length of
>   # 4000 bytes) was logged, leaving an incorrect file layout after log
>   # replay.
> 
> Fix this by checking if there is a hole between the last extent item that
> we processed and the first extent item in the next leaf, and if there is
> one, log an explicit hole extent item.
> 
> Fixes: 16e7549f045d ("Btrfs: incompatible format change to remove hole 
> extents")
> Signed-off-by: Filipe Manana 

1 and 2 added to next, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: lift errors from add_extent_changeset to the callers

2018-03-28 Thread David Sterba
On Tue, Mar 27, 2018 at 07:44:03PM +0200, David Sterba wrote:
> The missing error handling in add_extent_changeset was hidden, so make
> it at least visible in the callers.
> 
> Signed-off-by: David Sterba 
> ---
>  fs/btrfs/extent_io.c | 19 +++
>  1 file changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index f27bad003f8e..f141d0fd7a65 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -119,23 +119,22 @@ struct extent_page_data {
>   unsigned int sync_io:1;
>  };
>  
> -static void add_extent_changeset(struct extent_state *state, unsigned bits,
> +static int add_extent_changeset(struct extent_state *state, unsigned bits,
>struct extent_changeset *changeset,
>int set)
>  {
>   int ret;
>  
>   if (!changeset)
> - return;
> + return 0;
>   if (set && (state->state & bits) == bits)
> - return;
> + return 0;
>   if (!set && (state->state & bits) == 0)
> - return;
> + return 0;
>   changeset->bytes_changed += state->end - state->start + 1;
>   ret = ulist_add(&changeset->range_changed, state->start, state->end,
>   GFP_ATOMIC);
> - /* ENOMEM */
> - BUG_ON(ret < 0);
> + return ret;
>  }
>  
>  static void flush_write_bio(struct extent_page_data *epd);
> @@ -527,6 +526,7 @@ static struct extent_state *clear_state_bit(struct 
> extent_io_tree *tree,
>  {
>   struct extent_state *next;
>   unsigned bits_to_clear = *bits & ~EXTENT_CTLBITS;
> + int ret;
>  
>   if ((bits_to_clear & EXTENT_DIRTY) && (state->state & EXTENT_DIRTY)) {
>   u64 range = state->end - state->start + 1;
> @@ -534,7 +534,8 @@ static struct extent_state *clear_state_bit(struct 
> extent_io_tree *tree,
>   tree->dirty_bytes -= range;
>   }
>   clear_state_cb(tree, state, bits);
> - add_extent_changeset(state, bits_to_clear, changeset, 0);
> + ret = add_extent_changeset(state, bits_to_clear, changeset, 0);
> + BUG_ON(ret);

This must be BUG_ON(ret < 0) of course.

>   state->state &= ~bits_to_clear;
>   if (wake)
>   wake_up(&state->wq);
> @@ -805,13 +806,15 @@ static void set_state_bits(struct extent_io_tree *tree,
>  unsigned *bits, struct extent_changeset *changeset)
>  {
>   unsigned bits_to_set = *bits & ~EXTENT_CTLBITS;
> + int ret;
>  
>   set_state_cb(tree, state, bits);
>   if ((bits_to_set & EXTENT_DIRTY) && !(state->state & EXTENT_DIRTY)) {
>   u64 range = state->end - state->start + 1;
>   tree->dirty_bytes += range;
>   }
> - add_extent_changeset(state, bits_to_set, changeset, 1);
> + ret = add_extent_changeset(state, bits_to_set, changeset, 1);
> + BUG_ON(ret);

Same.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH v3 0/7] btrfs-progs: Allow normal user to call "subvolume list/show"

2018-03-28 Thread Zygo Blaxell
On Wed, Mar 28, 2018 at 09:45:02AM -0400, Zygo Blaxell wrote:
> On Mon, Mar 19, 2018 at 04:30:17PM +0900, Misono, Tomohiro wrote:
> > This is a part of RFC I sent last December[1] whose aim is to improve 
> > normal users' usability.
> > The remaining works of RFC are: 
> >   - Allow "sub delete" for empty subvolume
> 
> I don't mean to scope creep on you, but I have a couple of wishes related
> to this topic:
> 
>   - allow "rmdir" to remove an empty subvolume, i.e. when a subvolume is

Ignore me... I just noticed your patch to do exactly this.  ;)

> detected in rmdir, try switching to subvol delete before returning
> an error.  This lets admin tools that are not btrfs-aware do 'rm
> -fr' on a user directory when it contains a subvolume.  Legacy admin
> tools (or legacy tools in general) can't remove a subvol, and there
> is no solution for environments where we can't just fire users who
> create them.
> 
>   - mount option to restrict "sub create" and "sub snapshot" to root only.
> If we get "rmdir" working then this is significantly less important.
> 
> >   - Allow "qgroup show" to check quota limit
> > 
> > [1] https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg70991.html
> 




signature.asc
Description: PGP signature


Re: [RFC PATCH v3 0/7] btrfs-progs: Allow normal user to call "subvolume list/show"

2018-03-28 Thread Zygo Blaxell
On Mon, Mar 19, 2018 at 04:30:17PM +0900, Misono, Tomohiro wrote:
> This is a part of RFC I sent last December[1] whose aim is to improve normal 
> users' usability.
> The remaining works of RFC are: 
>   - Allow "sub delete" for empty subvolume

I don't mean to scope creep on you, but I have a couple of wishes related
to this topic:

  - allow "rmdir" to remove an empty subvolume, i.e. when a subvolume is
detected in rmdir, try switching to subvol delete before returning
an error.  This lets admin tools that are not btrfs-aware do 'rm
-fr' on a user directory when it contains a subvolume.  Legacy admin
tools (or legacy tools in general) can't remove a subvol, and there
is no solution for environments where we can't just fire users who
create them.

  - mount option to restrict "sub create" and "sub snapshot" to root only.
If we get "rmdir" working then this is significantly less important.

>   - Allow "qgroup show" to check quota limit
> 
> [1] https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg70991.html



signature.asc
Description: PGP signature


Re: [PATCH] fstests: test btrfs fsync after hole punching with no-holes mode

2018-03-28 Thread Eryu Guan
On Wed, Mar 28, 2018 at 09:48:17AM +0100, Filipe Manana wrote:
> On Wed, Mar 28, 2018 at 3:17 AM, Eryu Guan  wrote:
> > On Mon, Mar 26, 2018 at 11:59:21PM +0100, fdman...@kernel.org wrote:
> >> From: Filipe Manana 
> >>
> >> Test that when we have the no-holes mode enabled and a specific metadata
> >> layout, if we punch a hole and fsync the file, at replay time the whole
> >> hole was preserved.
> >>
> >> This issue is fixed by the following btrfs patch for the linux kernel:
> >>
> >>   "Btrfs: fix fsync after hole punching when using no-holes feature"
> >
> > I'd expect a test failure with 4.16-rc6 kernel, as the mentioned fix
> > above is not there. But test always passes for me. Did I miss anything?
> > btrfs-progs version is btrfs-progs-4.11.1-3.fc27.
> 
> It should fail on any kernel, with any btrfs-progs version (which
> should be irrelevant).
> Somehow on your system we are not getting the specific metadata layout
> needed to trigger the issue.
> 
> Can you apply the following patch on top of the test and provide the
> result 159.full file?
> 
> https://friendpaste.com/6xAuLeN4xl1AGjO9Qc5I8L
> 
> So that I can see what metadata layout you are getting.
> Thanks!

Sure, please see attachment.

Thanks,
Eryu


btrfs-159.full.bz2
Description: BZip2 compressed data


Re: [PATCH] fstests: test btrfs fsync after hole punching with no-holes mode

2018-03-28 Thread Filipe Manana
On Wed, Mar 28, 2018 at 3:17 AM, Eryu Guan  wrote:
> On Mon, Mar 26, 2018 at 11:59:21PM +0100, fdman...@kernel.org wrote:
>> From: Filipe Manana 
>>
>> Test that when we have the no-holes mode enabled and a specific metadata
>> layout, if we punch a hole and fsync the file, at replay time the whole
>> hole was preserved.
>>
>> This issue is fixed by the following btrfs patch for the linux kernel:
>>
>>   "Btrfs: fix fsync after hole punching when using no-holes feature"
>
> I'd expect a test failure with 4.16-rc6 kernel, as the mentioned fix
> above is not there. But test always passes for me. Did I miss anything?
> btrfs-progs version is btrfs-progs-4.11.1-3.fc27.

It should fail on any kernel, with any btrfs-progs version (which
should be irrelevant).
Somehow on your system we are not getting the specific metadata layout
needed to trigger the issue.

Can you apply the following patch on top of the test and provide the
result 159.full file?

https://friendpaste.com/6xAuLeN4xl1AGjO9Qc5I8L

So that I can see what metadata layout you are getting.
Thanks!

>
>>
>> Signed-off-by: Filipe Manana 
>> ---
>>  tests/btrfs/159 | 100 
>> 
>>  tests/btrfs/159.out |   5 +++
>>  tests/btrfs/group   |   1 +
>>  3 files changed, 106 insertions(+)
>>  create mode 100755 tests/btrfs/159
>>  create mode 100644 tests/btrfs/159.out
>>
>> diff --git a/tests/btrfs/159 b/tests/btrfs/159
>> new file mode 100755
>> index ..6083975a
>> --- /dev/null
>> +++ b/tests/btrfs/159
>> @@ -0,0 +1,100 @@
>> +#! /bin/bash
>> +# FSQA Test No. 159
>> +#
>> +# Test that when we have the no-holes mode enabled and a specific metadata
>> +# layout, if we punch a hole and fsync the file, at replay time the whole
>> +# hole was preserved.
>> +#
>> +#---
>> +#
>> +# Copyright (C) 2018 SUSE Linux Products GmbH. All Rights Reserved.
>> +# Author: Filipe Manana 
>> +#
>> +# This program is free software; you can redistribute it and/or
>> +# modify it under the terms of the GNU General Public License as
>> +# published by the Free Software Foundation.
>> +#
>> +# This program is distributed in the hope that it would be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program; if not, write the Free Software Foundation,
>> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
>> +#---
>> +#
>> +
>> +seq=`basename $0`
>> +seqres=$RESULT_DIR/$seq
>> +echo "QA output created by $seq"
>> +tmp=/tmp/$$
>> +status=1 # failure is the default!
>> +trap "_cleanup; exit \$status" 0 1 2 3 15
>> +
>> +_cleanup()
>> +{
>> + _cleanup_flakey
>> + cd /
>> + rm -f $tmp.*
>> +}
>> +
>> +# get standard environment, filters and checks
>> +. ./common/rc
>> +. ./common/filter
>> +. ./common/dmflakey
>> +
>> +# real QA test starts here
>> +_supported_fs generic
>  ^^^ btrfs
>> +_supported_os Linux
>> +_require_scratch
>> +_require_dm_target flakey
>> +_require_xfs_io_command "fpunch"
>> +
>> +rm -f $seqres.full
>> +
>> +# We create the filesystem with a node size of 64Kb because we need to 
>> create a
>> +# specific metadata layout in order to trigger the bug we are testing. At 
>> the
>> +# moment the node size can not be smaller then the system's page size, so 
>> given
>> +# that the largest possible page size is 64Kb and by default the node size 
>> is
>> +# set to the system's page size value, we explictly create a filesystem 
>> with a
>> +# 64Kb node size.
>> +_scratch_mkfs -O no-holes -n $((64 * 1024)) >>$seqres.full 2>&1
>> +_require_metadata_journaling $SCRATCH_DEV
>> +_init_flakey
>> +_mount_flakey
>> +
>> +# Create our test file with 831 extents of 256Kb each. Before each extent, 
>> there
>
> I think it's 832 extents in total? As the index starts with 0.
>
> Otherwise looks good to me.
>
> Thanks,
> Eryu
>
>> +# is a 256Kb hole (except for the first extent, which starts at offset 0). 
>> This
>> +# creates two leafs in the filesystem tree, with the extent at offset 
>> 216530944
>> +# being the last item in the first leaf and the extent at offset 217055232 
>> being
>> +# the first item in the second leaf.
>> +for ((i = 0; i <= 831; i++)); do
>> + offset=$((i * 2 * 256 * 1024))
>> + $XFS_IO_PROG -f -c "pwrite -S 0xab -b 256K $offset 256K" \
>> + $SCRATCH_MNT/foobar >/dev/null
>> +done
>> +
>> +# Now persist everything done so far.
>> +sync
>> +
>> +# Now punch a hole that covers part of the extent at offset 216530944.
>> +$XFS_IO_PROG -c "fpunch $((216530944 + 128 * 1024 - 4000)) 256K" \
>> +  -c "fsync" \
>> +  $SCRATCH_MNT/foobar
>> +
>> +echo

Re: [PATCH v2 6/8] btrfs: verify superblock checksum during scan

2018-03-28 Thread Nikolay Borisov


On 28.03.2018 02:39, Anand Jain wrote:
> During the scan context, we aren't verifying the superblock-
> checksum when read.
> This patch fixes it by adding the checksum verification function
> btrfs_check_super_csum() in the function btrfs_read_disk_super().
> And makes device scan to error fail if the primary superblock csum
> is wrong, whereas if the copy-superblock csum is wrong it will just
> just report mismatch and continue mount/scan as usual. When the

Where in this patch do you deal with the secondary sb.
btrfs_read_disk_super verifies only the primary sb which corresponds to
the first part of the sentence. But I'm confused about the second?

I also think that 4/8 should be merged into this one.

> mount is successful We anyway overwrite all superblocks upon unmount.
> 
> The context in which this will be called is - device scan, device ready,
> and mount -o device option.
> 
> Test script:
> 
>  Corrupt primary superblock and check if device scan and mount
>  fails:
>   mkfs.btrfs -fq /dev/sdc
>   dd if=/dev/urandom of=/dev/sdc ibs=1 obs=1 count=1 seek=64K
>   btrfs dev scan /dev/sdc
>   mount /dev/sdc /btrfs
> 
>  Corrupt secondary superblock and check if device scan and mount
>  is succcessful, check for the dmesg for errors.
>   mkfs.btrfs -fq /dev/sdc
>   dd if=/dev/urandom of=/dev/sdc ibs=1 obs=1 count=1 seek=67108864
>   btrfs dev scan /dev/sdc
>   mount /dev/sdc /btrfs
> 
> Signed-off-by: Anand Jain 
> ---
> v1->v2:
>  changed title.
>  use explicit (< 0) check for %errr.
>  Un-split pr_err() string.
>  Fix typo in the git commit log.
>  Move the csum check after bytenr and btrfs magic verified.
> 
>  fs/btrfs/volumes.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index b099823f60d1..eda86ba258fc 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1149,6 +1149,7 @@ static int btrfs_read_disk_super(struct block_device 
> *bdev, u64 bytenr,
>struct page **page,
>struct btrfs_super_block **disk_super)
>  {
> + int err;
>   void *p;
>   pgoff_t index;
>  
> @@ -1183,6 +1184,18 @@ static int btrfs_read_disk_super(struct block_device 
> *bdev, u64 bytenr,
>   return -EINVAL;
>   }
>  
> + err = btrfs_check_super_csum((char *) *disk_super);
> + if (err < 0) {
> + if (err == -EINVAL)
> + pr_err("BTRFS error (device %pg): unsupported checksum 
> type, bytenr=%llu",
> + bdev, bytenr);
> + else
> + pr_err("BTRFS error (device %pg): superblock checksum 
> failed, bytenr=%llu",
> + bdev, bytenr);
> + btrfs_release_disk_super(*page);
> + return err;
> + }
> +
>   if ((*disk_super)->label[0] &&
>   (*disk_super)->label[BTRFS_LABEL_SIZE - 1])
>   (*disk_super)->label[BTRFS_LABEL_SIZE - 1] = '\0';
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 5/8] btrfs: check if the fsid in the primary sb and copy sb are same

2018-03-28 Thread Nikolay Borisov


On 28.03.2018 02:39, Anand Jain wrote:
> During the btrfs dev scan make sure that other copies of superblock
> contain the same fsid as the primary SB. So that we bring to the
> user notice if the superblock has been overwritten.
> 
>  mkfs.btrfs -fq /dev/sdc
>  mkfs.btrfs -fq /dev/sdb
>  dd if=/dev/sdb of=/dev/sdc count=4K skip=64K seek=64K obs=1 ibs=1
>  mount /dev/sdc /btrfs
> 
> Caveat: Pls note that older btrfs-progs do not wipe the non-overwriting
> stale superblock like copy2 if a smaller mkfs.btrfs -b   is created.
> Thus this patch in the kernel will report error. The workaround is to wipe
> the superblock manually, like
>   dd if=/dev/zero of= seek=274877906944 ibs=1 obs=1 count4K
> OR apply the btrfs-progs patch
>   btrfs-progs: wipe copies of the stale superblock beyond -b size
> which shall find and wipe the non overwriting superblock during mkfs.
> 
> Signed-off-by: Anand Jain 
> ---
> v1->v2:
>   Do an explicit read for primary superblock. Drop kzalloc().
>   Fix split pr_err().
> 
>  fs/btrfs/volumes.c | 47 ---
>  1 file changed, 36 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index e63723f23227..b099823f60d1 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1198,40 +1198,65 @@ static int btrfs_read_disk_super(struct block_device 
> *bdev, u64 bytenr,
>  int btrfs_scan_one_device(const char *path, fmode_t flags, void *holder,
> struct btrfs_fs_devices **fs_devices_ret)
>  {
> + struct btrfs_super_block *disk_super_primary;
>   struct btrfs_super_block *disk_super;
>   struct btrfs_device *device;
>   struct block_device *bdev;
> + struct page *page_primary;
>   struct page *page;
>   int ret = 0;
>   u64 bytenr;
> + int i;
>  
> - /*
> -  * we would like to check all the supers, but that would make
> -  * a btrfs mount succeed after a mkfs from a different FS.
> -  * So, we need to add a special mount option to scan for
> -  * later supers, using BTRFS_SUPER_MIRROR_MAX instead
> -  */
> - bytenr = btrfs_sb_offset(0);
>   flags |= FMODE_EXCL;
>  
>   bdev = blkdev_get_by_path(path, flags, holder);
>   if (IS_ERR(bdev))
>   return PTR_ERR(bdev);
>  
> - ret = btrfs_read_disk_super(bdev, bytenr, &page, &disk_super);
> + /*
> +  * We would like to check all the supers and use one good copy,
> +  * but that would make a btrfs mount succeed after a mkfs from
> +  * a different FS.
> +  * So, we need to add a special mount option to scan for
> +  * later supers, using BTRFS_SUPER_MIRROR_MAX instead.
> +  * So, just validate if all copies of the superblocks are ok
> +  * and have the same fsid.
> +  */
> + bytenr = btrfs_sb_offset(0);
> + ret = btrfs_read_disk_super(bdev, bytenr, &page_primary,
> + &disk_super_primary);
>   if (ret < 0)
>   goto error_bdev_put;
>  
> + for (i = 1; i < BTRFS_SUPER_MIRROR_MAX; i++) {
> + bytenr = btrfs_sb_offset(i);
> + ret = btrfs_read_disk_super(bdev, bytenr, &page, &disk_super);
> + if (ret < 0) {
> + ret = 0;
> + continue;
> + }
> +
> + if (memcmp(disk_super_primary->fsid, disk_super->fsid,
> +BTRFS_FSID_SIZE)) {
> + pr_err("BTRFS (device %pg): superblock fsid mismatch, 
> primary %pU copy%d %pU",
> + bdev, disk_super_primary->fsid, i,
> + disk_super->fsid);
> + ret = -EINVAL;
> + btrfs_release_disk_super(page);
> + goto error_bdev_put;
> + }
> + btrfs_release_disk_super(page);
> + }
> +
>   mutex_lock(&uuid_mutex);
> - device = device_list_add(path, disk_super);
> + device = device_list_add(path, disk_super_primary);
>   if (IS_ERR(device))
>   ret = PTR_ERR(device);
>   else
>   *fs_devices_ret = device->fs_devices;
>   mutex_unlock(&uuid_mutex);
>  
> - btrfs_release_disk_super(page);
> ->  error_bdev_put:

You need a check whether page_primary is set here and release it,
otherwise you are leaking it. This needs to handle both the primary sb
read failure (where page_primary might not be set) as well as any FSID
mismatch from loop. (where it will be set)

>   blkdev_put(bdev, flags);
>  
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/8] btrfs: cleanup btrfs_read_disk_super() to return std error

2018-03-28 Thread Nikolay Borisov


On 28.03.2018 02:39, Anand Jain wrote:
> The only caller btrfs_scan_one_device() sets -EINVAL for error from
> btrfs_read_disk_super(), so this patch returns -EINVAL from the latter
> function.
> 
> A preparatory patch to add csum check in btrfs_read_disk_super().
> 
> Signed-off-by: Anand Jain 

Reviewed-by: Nikolay Borisov 

> ---
> v1->v2:
>  Check btrfs_read_disk_super() to be more explicit ret < 0
> 
>  fs/btrfs/volumes.c | 15 +++
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 87d183accdb2..e63723f23227 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1154,23 +1154,23 @@ static int btrfs_read_disk_super(struct block_device 
> *bdev, u64 bytenr,
>  
>   /* make sure our super fits in the device */
>   if (bytenr + PAGE_SIZE >= i_size_read(bdev->bd_inode))
> - return 1;
> + return -EINVAL;
>  
>   /* make sure our super fits in the page */
>   if (sizeof(**disk_super) > PAGE_SIZE)
> - return 1;
> + return -EINVAL;
>  
>   /* make sure our super doesn't straddle pages on disk */
>   index = bytenr >> PAGE_SHIFT;
>   if ((bytenr + sizeof(**disk_super) - 1) >> PAGE_SHIFT != index)
> - return 1;
> + return -EINVAL;
>  
>   /* pull in the page with our super */
>   *page = read_cache_page_gfp(bdev->bd_inode->i_mapping,
>  index, GFP_KERNEL);
>  
>   if (IS_ERR_OR_NULL(*page))
> - return 1;
> + return -EINVAL;
>  
>   p = kmap(*page);
>  
> @@ -1180,7 +1180,7 @@ static int btrfs_read_disk_super(struct block_device 
> *bdev, u64 bytenr,
>   if (btrfs_super_bytenr(*disk_super) != bytenr ||
>   btrfs_super_magic(*disk_super) != BTRFS_MAGIC) {
>   btrfs_release_disk_super(*page);
> - return 1;
> + return -EINVAL;
>   }
>  
>   if ((*disk_super)->label[0] &&
> @@ -1218,10 +1218,9 @@ int btrfs_scan_one_device(const char *path, fmode_t 
> flags, void *holder,
>   if (IS_ERR(bdev))
>   return PTR_ERR(bdev);
>  
> - if (btrfs_read_disk_super(bdev, bytenr, &page, &disk_super)) {
> - ret = -EINVAL;
> + ret = btrfs_read_disk_super(bdev, bytenr, &page, &disk_super);
> + if (ret < 0)
>   goto error_bdev_put;
> - }
>  
>   mutex_lock(&uuid_mutex);
>   device = device_list_add(path, disk_super);
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/8] btrfs: check if the fsid in the primary sb and copy sb are same

2018-03-28 Thread Anand Jain






+    for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) {
+    u64 bytenr = btrfs_sb_offset(i);
+
+    ret = btrfs_read_disk_super(bdev, bytenr, &page, &disk_super);
+    if (ret) {
+    if (i == 0)
+    goto error_kfree;
+    /* copy2 is optional */
+    ret = 0;
+    continue;
+    }
+
+    if (i == 0) {
+    memcpy(disk_super_primary, disk_super,
+   sizeof(*disk_super_primary));
+    btrfs_release_disk_super(page);
+    continue;


Doing the memcpy is enough here, the bottom of the loop already releases
the disk page and continues on the next iteration.


  The page map happens inside btrfs_read_disk_super(), we
  need unmap before going for the next superblock.


You already have  btrfs_release_disk_super(page); called at the end of
the iteration, right after the closing bracket for the else, see below...


  Ah. You meant only memcpy. Yes.
  This part of the code is completely changed in V2. Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html