Re: Bad magic on superblock on /dev/sda at 65536

2018-04-06 Thread Anand Jain


 The first superblock is zeroed, so this is not some random corruption.
 Most probably something other than btrfs used the disk while it was
 unmounted? Or was the partition (if any) changed? Or, if it's SAN
 storage, I hope the LUN wasn't recreated at the storage end.

Thanks, Anand

On 04/07/2018 08:56 AM, Qu Wenruo wrote:
> On 2018-04-07 08:35, Ben Parsons wrote:
>>> btrfs inspect-internal dump-super -Ffa /path
>>
>> superblock: bytenr=65536, device=/dev/sda
>> -
>> csum_type               0 (crc32c)
>> csum_size               4
>> csum                    0x [DON'T MATCH]
>> bytenr                  0
>> flags                   0x0
>> magic                   [DON'T MATCH]
>> fsid                    ----
>
> First super block is completely gone.


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bad magic on superblock on /dev/sda at 65536

2018-04-06 Thread Ben Parsons
On 7 April 2018 at 10:56, Qu Wenruo  wrote:
> And what's the profile of the btrfs? If metadata is raid1, we could at
> least try to recover the superblock from the remaining disk.

I am not sure what the metadata profile was, but the two disks had no
parity and simply appeared as a single disk with the combined space of both.

how 

Re: Bad magic on superblock on /dev/sda at 65536

2018-04-06 Thread Qu Wenruo


On 2018-04-07 08:35, Ben Parsons wrote:
>> btrfs inspect-internal dump-super -Ffa /path
> 
> superblock: bytenr=65536, device=/dev/sda
> -
> csum_type               0 (crc32c)
> csum_size               4
> csum                    0x [DON'T MATCH]
> bytenr                  0
> flags                   0x0
> magic                   [DON'T MATCH]
> fsid                    ----

First super block is completely gone.

> label
> generation              0
> root                    0
> sys_array_size          0
> chunk_root_generation   0
> root_level              0
> chunk_root              0
> chunk_root_level        0
> log_root                0
> log_root_transid        0
> log_root_level          0
> total_bytes             0
> bytes_used              0
> sectorsize              0
> nodesize                0
> leafsize (deprecated)   0
> stripesize              0
> root_dir                0
> num_devices             0
> compat_flags            0x0
> compat_ro_flags         0x0
> incompat_flags          0x0
> cache_generation        0
> uuid_tree_generation    0
> dev_item.uuid           ----
> dev_item.fsid           ---- [match]
> dev_item.type           0
> dev_item.total_bytes    0
> dev_item.bytes_used     0
> dev_item.io_align       0
> dev_item.io_width       0
> dev_item.sector_size    0
> dev_item.devid          0
> dev_item.dev_group      0
> dev_item.seek_speed     0
> dev_item.bandwidth      0
> dev_item.generation     0
> sys_chunk_array[2048]:
> backup_roots[4]:
> 
> superblock: bytenr=67108864, device=/dev/sda
> -
> csum_type               65178 (INVALID)
> csum_size               32
> csum                    0x24f2057c939118ef8cf9c276a05ff294223c99ec79e0b5cfe8ed795fe0a96715 [DON'T MATCH]
> bytenr                  6481065229944367737

This backup superblock is not valid either.

> flags                   0x527ffc9117fc11
>                         ( WRITTEN |
>                           CHANGING_FSID |
>                           METADUMP_V2 |
>                           unknown flag: 0x527ff09117fc10 )
> magic                   ...;)... [DON'T MATCH]
> fsid                    7011f2d5-0afe-5dc6-fce2-70e04b80939d
> label                   ;.."`8..8.x.?.N../zF..H...|h].i.C)j)...4d_..5.../...1.?.rr5.E..
> generation              7112314448606197494
[snip]
> 
> 
> superblock: bytenr=274877906944, device=/dev/sda
> -
> csum_type               63651 (INVALID)
> csum_size               32
> csum                    0x39db30683b693c4ff05c0073a1fa00db390c32963bae3c37 [DON'T MATCH]
> bytenr                  6341162744368070656

2nd backup is also gone.

> flags                   0x9731d639d900f69a
>                         ( RELOC |
>                           CHANGING_FSID |
>                           SEEDING |
>                           unknown flag: 0x9731d630d900f698 )
> magic                   ;.<. [DON'T MATCH]
> fsid                    e25b006e-a0f9-00df-390a-32923bae3c00
> label                   .9~2.;.<
> generation              15852938566880484858
> root                    17063576824041017
> sys_array_size          2956094009
> chunk_root_generation   9223372036858758203
> root_level              0
> chunk_root              2305935309654720512
> chunk_root_level        212
> log_root                12624074888383091845
> log_root_transid        9583660007048386619
> log_root_level          57
> total_bytes             14916195329329095163
> bytes_used              17051482582002489
[snip]

Unfortunately, the filesystem seems to be totally corrupted.

> 
>> Despite that, any extra info on how this happened is also appreciated,
>> as similar problem happened twice, which means we need to pay attention
>> on this.
> 
> I don't know exactly what happened, but here is some background:
> 
> I am running Arch Linux on a mainline kernel (4.16.0-1) and mesa-git
> (101352.498d9d0f4d-1), as I have an RX Vega 64.

Vega is nice; however, I would wait until mesa in the extra/ repo gets updated.

> Over the past few months I have been getting hard locks when opening
> certain programs (usually due to bad versions of mesa-git /
> llvm-git, etc.).
> 
> At the time I was trying to open the program "cheese", and when I did,
> my machine hard locked and only alt+shift+sysrq+b got my screen to go
> black - and then did nothing else, so I held the power button for 3
> seconds and then my machine rebooted.

Pretty common hard power reset.

> Looking at journalctl, there is a large stack trace from kernel: amdgpu
> (see attached).
> Then, when I booted back up, the pool (2 disks, 1TB + 2TB) wouldn't mount.

I'd say such corruption is pretty serious.

And what's the profile of the btrfs? If metadata is raid1, we could at
least try to recover the superblock from the remaining disk.

And were any special mount options used here, such as discard?

Thanks,
Qu

> 
> Thanks,
> Ben
> 
> On 7 April 2018 at 09:44, Qu Wenruo  wrote:
>>
>>
>> On 2018-04-07 01:03, David Sterba wrote:
>>> On Fri, Apr 06, 2018 at 11:32:34PM +1000, Ben Parsons wrote:
>>>> Hi,
>>>>
>>>> I just had an unexpected restart and now my btrfs pool wont

Re: Bad magic on superblock on /dev/sda at 65536

2018-04-06 Thread Ben Parsons
>btrfs inspect-internal dump-super -Ffa /path

superblock: bytenr=65536, device=/dev/sda
-
csum_type               0 (crc32c)
csum_size               4
csum                    0x [DON'T MATCH]
bytenr                  0
flags                   0x0
magic                   [DON'T MATCH]
fsid                    ----
label
generation              0
root                    0
sys_array_size          0
chunk_root_generation   0
root_level              0
chunk_root              0
chunk_root_level        0
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             0
bytes_used              0
sectorsize              0
nodesize                0
leafsize (deprecated)   0
stripesize              0
root_dir                0
num_devices             0
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x0
cache_generation        0
uuid_tree_generation    0
dev_item.uuid           ----
dev_item.fsid           ---- [match]
dev_item.type           0
dev_item.total_bytes    0
dev_item.bytes_used     0
dev_item.io_align       0
dev_item.io_width       0
dev_item.sector_size    0
dev_item.devid          0
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
sys_chunk_array[2048]:
backup_roots[4]:

superblock: bytenr=67108864, device=/dev/sda
-
csum_type               65178 (INVALID)
csum_size               32
csum                    0x24f2057c939118ef8cf9c276a05ff294223c99ec79e0b5cfe8ed795fe0a96715 [DON'T MATCH]
bytenr                  6481065229944367737
flags                   0x527ffc9117fc11
                        ( WRITTEN |
                          CHANGING_FSID |
                          METADUMP_V2 |
                          unknown flag: 0x527ff09117fc10 )
magic                   ...;)... [DON'T MATCH]
fsid                    7011f2d5-0afe-5dc6-fce2-70e04b80939d
label                   ;.."`8..8.x.?.N../zF..H...|h].i.C)j)...4d_..5.../...1.?.rr5.E..
generation              7112314448606197494
root                    10814850762639476856
sys_array_size          774240540
chunk_root_generation   17716845740647334363
root_level              123
chunk_root              9039947042838677183
chunk_root_level        7
log_root                11588818316475425470
log_root_transid        1970336570145243359
log_root_level          255
total_bytes             5626579194689281529
bytes_used              10936644453437477355
sectorsize              2711280660
nodesize                2105571139
leafsize (deprecated)   2624302184
stripesize              3748622636
root_dir                12031892002480545941
num_devices             1887426113366288834
compat_flags            0x986e28a7d6a0eedf
compat_ro_flags         0x67bf5c50764fabec
                        ( unknown flag: 0x67bf5c50764fabec )
incompat_flags          0xb6351e01f2cbb867
                        ( MIXED_BACKREF |
                          DEFAULT_SUBVOL |
                          MIXED_GROUPS |
                          BIG_METADATA |
                          EXTENDED_IREF |
                          unknown flag: 0xb6351e01f2cbb800 )
cache_generation        16803576046500197625
uuid_tree_generation    9151978410922426283
dev_item.uuid           3eff5038-c5ed-7c44-b841-bfcaefa127ff
dev_item.fsid           ae705a3f-dcee-f7b0-9331-410c837e0ce8 [DON'T MATCH]
dev_item.type           20862153328580
dev_item.total_bytes    4033499057947390500
dev_item.bytes_used     14123877185665736413
dev_item.io_align       356589416
dev_item.io_width       2238618352
dev_item.sector_size    33234003
dev_item.devid          4647837691355179893
dev_item.dev_group      3237710941
dev_item.seek_speed     159
dev_item.bandwidth      35
dev_item.generation     13692119449717181535
sys_chunk_array[2048]:
ERROR: sys_array_size 774240540 shouldn't exceed 2048 bytes
backup_roots[4]:
    backup 0:
        backup_tree_root:     9098106006959284508   gen: 3422959743402530751   level: 13
        backup_chunk_root:    8653729137999036921   gen: 9805354230117732311   level: 13
        backup_extent_root:   2227142819947659262   gen: 16710030944005764576  level: 250
        backup_fs_root:       17250344053212875712  gen: 11109972073411492560  level: 195
        backup_dev_root:      10813366787773230487  gen: 4733558095364468453   level: 64
        backup_csum_root:     15995327235362395775  gen: 17585993390550392957  level: 223
        backup_total_bytes:   187327539044806356
        backup_bytes_used:    11088092626626268919
        backup_num_devices:   1646767651564978160

    backup 1:
        backup_tree_root:     6132816855654833723   gen: 7933636135630997331   level: 175
        backup_chunk_root:    4500476885298477552   gen: 17588667198258184431  level: 49
        backup_extent_root:   17341284452428219997  gen: 6122825786466476477   level: 27
        backup_fs_root:       4578178975399312410   gen: 4558088662074948842   level: 229
        backup_dev_root:      17378404189136548866  gen: 8942807062595821441   level: 3
        backup_csum_root:     13954259417814538534  gen: 17582753360836298151  level: 135
        backup_total_bytes:

Re: Bad magic on superblock on /dev/sda at 65536

2018-04-06 Thread Qu Wenruo


On 2018-04-07 01:03, David Sterba wrote:
> On Fri, Apr 06, 2018 at 11:32:34PM +1000, Ben Parsons wrote:
>> Hi,
>>
>> I just had an unexpected restart and now my btrfs pool wont mount.
>> The error on mount is:
>>
>> "ERROR: unsupported checksum algorithm 41700"
>>
>> and when running
>>
>> btrfs inspect-internal dump-super /dev/sda
>> ERROR: bad magic on superblock on /dev/sda at 65536
>>
>> I saw a thread in the mailing list about it:
>> https://www.spinics.net/lists/linux-btrfs/msg75326.html
>> However I am told on IRC that Qu fixed it using magic.
>>
>> Any help would be much appreciated.
> 
> In the previous report, there were 2 isolated areas of superblock
> damaged. Please post output of
> 
>   btrfs inspect dump-super /path

And don't forget the -Ffa options:
-F forces btrfs-progs to treat the device as btrfs no matter what the magic is
-f shows all data, so we can find all the corruption and fix it if possible
-a shows all backup superblocks; if some backup is good, "btrfs rescue
   super-recover", as mentioned by Nikolay, would be the best solution.
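
To make the -a point concrete: btrfs keeps three superblock copies, at byte
offsets 65536, 67108864 and 274877906944 (the bytenr values that dump-super
prints), and each 4096-byte copy should carry the magic string "_BHRfS_M"
64 bytes in. What follows is only a rough Python sketch of that check, not
btrfs-progs code. The function name scan_superblock_magic and the status
strings are made up for illustration, and checksums are not verified.

```python
import io

# btrfs on-disk constants: three superblock copies, each 4096 bytes,
# with the 8-byte magic stored 64 bytes into the block.
SUPERBLOCK_OFFSETS = (65536, 67108864, 274877906944)
SUPERBLOCK_SIZE = 4096
MAGIC_OFFSET = 64
BTRFS_MAGIC = b"_BHRfS_M"


def scan_superblock_magic(dev):
    """Classify each superblock copy of a seekable binary file object.

    Returns {offset: status}, where status is one of the illustrative
    strings "ok", "zeroed", "bad magic" or "out of range".
    """
    result = {}
    for offset in SUPERBLOCK_OFFSETS:
        dev.seek(offset)
        block = dev.read(SUPERBLOCK_SIZE)
        if len(block) < SUPERBLOCK_SIZE:
            # The device or image is too small to hold this copy.
            result[offset] = "out of range"
        elif block[MAGIC_OFFSET:MAGIC_OFFSET + 8] == BTRFS_MAGIC:
            result[offset] = "ok"
        elif not any(block):
            # Every byte is zero, as in a wiped primary superblock.
            result[offset] = "zeroed"
        else:
            result[offset] = "bad magic"
    return result


if __name__ == "__main__":
    # Demo on an in-memory image whose first superblock region is all zeros.
    image = io.BytesIO(bytes(65536 + SUPERBLOCK_SIZE))
    print(scan_superblock_magic(image))
```

Since it takes any seekable binary file object, it can be tried on a disk
image opened read-only rather than on the live device.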

Apart from that, any extra info on how this happened would also be
appreciated: a similar problem has now been reported twice, which means
we need to pay attention to it.

Thanks,
Qu

> 
> so we can see if it's a similar issue.
> 
> In case it is, there's a tool in the btrfs-progs repo that can fix the
> individual values.


Re: [PATCH v2 2/2] btrfs: Validate child tree block's level and first key

2018-04-06 Thread Qu Wenruo


On 2018-04-07 01:07, David Sterba wrote:
> On Mon, Apr 02, 2018 at 06:47:32PM +0800, Qu Wenruo wrote:
>> On 2018-03-28 23:49, David Sterba wrote:
>>> On Tue, Mar 27, 2018 at 08:44:19PM +0800, Qu Wenruo wrote:
>>>> We have several reports about node pointer points to incorrect child
>>>> tree blocks, which could have even wrong owner and level but still with
>>>> valid generation and checksum.
>>>>
>>>> Although btrfs check could handle it and print error message like:
>>>> leaf parent key incorrect 60670574592
>>>>
>>>> Kernel doesn't have enough check on this type of corruption correctly.
>>>> At least add such check to read_tree_block() and btrfs_read_buffer(),
>>>> where we need two new parameters @level and @first_key to verify the
>>>> child tree block.
>>>>
>>>> The new @level check is mandatory and all call sites are already
>>>> modified to extract expected level from its call chain.
>>>>
>>>> While @first_key is optional, the following call sites are skipping such
>>>> check:
>>>> 1) Root node/leaf
>>>>    As ROOT_ITEM doesn't contain the first key, skip @first_key check.
>>>> 2) Direct backref
>>>>    Only parent bytenr and level is known and we need to resolve the key
>>>>    all by ourselves, skip @first_key check.
>>>>
>>>> Another note of this verification is, it needs extra info from nodeptr
>>>> or ROOT_ITEM, so it can't fit into current tree-checker framework, which
>>>> is limited to node/leaf boundary.
>>>>
>>>> Signed-off-by: Qu Wenruo
>>>> ---
>>>> changelog:
>>>> v2:
>>>>   Make @level check mandatory, suggested by Jeff and Nikolay.
>>>>   Change parameter order as @level is now mandatory, put it in front of
>>>>   @first_key.
>>>>   Change verify_parent_level() to verify_key_level() to avoid confusion
>>>>   on the @level parameter.
>>>>   Add btrfs_error() output for CONFIG_BTRFS_DEBUG to help debugging.
>>>
>>> That's much better overall, thanks. Adding it to next.
>>
>> Nikolay reported a case where @first_key check seems to cause false alert.
>> (Although my xfstests check hasn't exposed it yet)
>>
>> Please discard this patch since it has the possibility to cause false
>> alert for btrfs core functionality.
> 
> Too late, the patch is in master now, so we need to fix it.

Seems to be a very rare race in tree operations, still under investigation.

Thanks,
Qu



Re: [PATCH 08/16] btrfs: add sanity check when resuming balance after mount

2018-04-06 Thread Sasha Levin
Hi,

[This is an automated email]

This commit has been processed by the -stable helper bot and determined
to be a high probability candidate for -stable trees. (score: 16.7330)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, 
v4.4.126.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Failed to apply! Possible dependencies:
509cdd5c938a ("btrfs: add sanity check when resuming balance after mount")

v4.4.126: Failed to apply! Possible dependencies:
509cdd5c938a ("btrfs: add sanity check when resuming balance after mount")


Please let us know if you'd like to have this patch included in a stable tree.

--
Thanks,
Sasha


Re: [PATCH] bitmap: fix memset optimization on big-endian systems

2018-04-06 Thread Sasha Levin
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 2a98dc028f91 include/linux/bitmap.h: turn bitmap_set and 
bitmap_clear into memset when possible.

The bot has also determined it's probably a bug fixing patch. (score: 65.4067)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!

--
Thanks,
Sasha


Re: [PATCH 07/16] btrfs: add proper safety check before resuming dev-replace

2018-04-06 Thread Sasha Levin
Hi,

[This is an automated email]

This commit has been processed by the -stable helper bot and determined
to be a high probability candidate for -stable trees. (score: 34.4419)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, 
v4.4.126.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Failed to apply! Possible dependencies:
2799d90f3887 ("btrfs: add proper safety check before resuming dev-replace")

v4.4.126: Failed to apply! Possible dependencies:
2799d90f3887 ("btrfs: add proper safety check before resuming dev-replace")


Please let us know if you'd like to have this patch included in a stable tree.

--
Thanks,
Sasha


Re: [PATCH] Btrfs: do not abort transaction when failing to insert hole extent

2018-04-06 Thread Liu Bo
On Fri, Apr 6, 2018 at 6:21 AM, David Sterba  wrote:
> On Thu, Apr 05, 2018 at 11:58:16AM -0700, Liu Bo wrote:
>> On Thu, Apr 5, 2018 at 9:48 AM, David Sterba  wrote:
>> > On Sat, Mar 31, 2018 at 06:11:55AM +0800, Liu Bo wrote:
>> >> This is running in a typical write path, not inside a critical path
>> >> where we have to abort the running transaction, so it's OK to return
>> >> errors to callers and eventually to userspace.
>> >
>> > I'm not sure this is entirely correct, several other places do not abort
>> > after btrfs_drop_extents as there's nothing that would leave the
>> > structures in some half-state.
>> >
>> >> Signed-off-by: Liu Bo 
>> >> ---
>> >>  fs/btrfs/inode.c | 5 +
>> >>  1 file changed, 1 insertion(+), 4 deletions(-)
>> >>
>> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> >> index c7b75dd..b9310f8 100644
>> >> --- a/fs/btrfs/inode.c
>> >> +++ b/fs/btrfs/inode.c
>> >> @@ -4939,16 +4939,13 @@ static int maybe_insert_hole(struct btrfs_root *root, struct inode *inode,
>> >>
>> >>       ret = btrfs_drop_extents(trans, root, inode, offset, offset + len, 1);
>> >>       if (ret) {
>> >> -             btrfs_abort_transaction(trans, ret);
>> >>               btrfs_end_transaction(trans);
>> >>               return ret;
>> >>       }
>> >>
>> >>       ret = btrfs_insert_file_extent(trans, root, btrfs_ino(BTRFS_I(inode)),
>> >>                       offset, 0, 0, len, 0, len, 0, 0, 0);
>> >
>> > But here the extents have already been dropped, and failing to insert the
>> > items does not seem to lead to a consistent state.
>> >
>> > It's possible that I'm missing something. In a call path that can be
>> > safely rolled back even with a started transaction, we don't need to
>> > abort in all cases. But if the rollback requires some non-trivial
>> > modifications, I don't see options how to avoid the abort.
>> >
>> > __btrfs_drop_extents does a lot of state changes and can itself fail
>> > in the middle of dropping the range, aborting looks like the safest
>> > option.
>> >
>>
>> As maybe_insert_hole is only called by btrfs_cont_expand here, which
>> means it really is a hole, I don't expect drop_extents to drop
>> anything. We can remove this drop_extents and put an assert after
>> btrfs_insert_file_extent to check for EEXIST.
>
> Sounds good.
>

Let me make a v2 and have a fstests run.

thanks,
liubo

>> It's different from punch hole where we need to explicitly drop an
>> actual extent and replace it with a hole range.
>
> Right, that's what I didn't see at first.


Re: [PATCH v2 2/2] btrfs: Validate child tree block's level and first key

2018-04-06 Thread David Sterba
On Mon, Apr 02, 2018 at 06:47:32PM +0800, Qu Wenruo wrote:
> On 2018-03-28 23:49, David Sterba wrote:
> > On Tue, Mar 27, 2018 at 08:44:19PM +0800, Qu Wenruo wrote:
> >> We have several reports about node pointer points to incorrect child
> >> tree blocks, which could have even wrong owner and level but still with
> >> valid generation and checksum.
> >>
> >> Although btrfs check could handle it and print error message like:
> >> leaf parent key incorrect 60670574592
> >>
> >> Kernel doesn't have enough check on this type of corruption correctly.
> >> At least add such check to read_tree_block() and btrfs_read_buffer(),
> >> where we need two new parameters @level and @first_key to verify the
> >> child tree block.
> >>
> >> The new @level check is mandatory and all call sites are already
> >> modified to extract expected level from its call chain.
> >>
> >> While @first_key is optional, the following call sites are skipping such
> >> check:
> >> 1) Root node/leaf
> >>    As ROOT_ITEM doesn't contain the first key, skip @first_key check.
> >> 2) Direct backref
> >>    Only parent bytenr and level is known and we need to resolve the key
> >>    all by ourselves, skip @first_key check.
> >>
> >> Another note of this verification is, it needs extra info from nodeptr
> >> or ROOT_ITEM, so it can't fit into current tree-checker framework, which
> >> is limited to node/leaf boundary.
> >>
> >> Signed-off-by: Qu Wenruo 
> >> ---
> >> changelog:
> >> v2:
> >>   Make @level check mandatory, suggested by Jeff and Nikolay.
> >>   Change parameter order as @level is now mandatory, put it in front of
> >>   @first_key.
> >>   Change verify_parent_level() to verify_key_level() to avoid confusion
> >>   on the @level parameter.
> >>   Add btrfs_error() output for CONFIG_BTRFS_DEBUG to help debugging.
> > 
> > That's much better overall, thanks. Adding it to next.
> 
> Nikolay reported a case where @first_key check seems to cause false alert.
> (Although my xfstests check hasn't exposed it yet)
> 
> Please discard this patch since it has the possibility to cause false
> alert for btrfs core functionality.

Too late, the patch is in master now, so we need to fix it.
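
For anyone following along, the check described in the quoted patch can be
modelled roughly as below. This is a toy Python sketch under loud
assumptions, not the kernel code: TreeBlock and check_child_block are
hypothetical names, a key is reduced to an (objectid, type, offset) tuple,
and the only logic shown is "level is mandatory, first key is optional".

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A btrfs key is the triple (objectid, type, offset).
Key = Tuple[int, int, int]


@dataclass
class TreeBlock:
    """Toy stand-in for a tree block: its header level and its sorted keys."""
    level: int
    keys: List[Key] = field(default_factory=list)


def check_child_block(child: TreeBlock, level: int,
                      first_key: Optional[Key] = None) -> bool:
    """Return True if the child matches what the parent pointer promised."""
    if child.level != level:
        # The level check is mandatory for every caller.
        return False
    if first_key is None:
        # Root node/leaf or direct backref: no first key is known, skip it.
        return True
    # Otherwise the first key stored in the child must match the parent's.
    return bool(child.keys) and child.keys[0] == first_key


if __name__ == "__main__":
    leaf = TreeBlock(level=0, keys=[(256, 1, 0), (256, 12, 256)])
    print(check_child_block(leaf, 0, (256, 1, 0)))
```

A block with a valid checksum and generation but the wrong level or first
key is rejected here, which is the class of corruption the patch targets.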


Re: Bad magic on superblock on /dev/sda at 65536

2018-04-06 Thread David Sterba
On Fri, Apr 06, 2018 at 11:32:34PM +1000, Ben Parsons wrote:
> Hi,
> 
> I just had an unexpected restart and now my btrfs pool wont mount.
> The error on mount is:
> 
> "ERROR: unsupported checksum algorithm 41700"
> 
> and when running
> 
> btrfs inspect-internal dump-super /dev/sda
> ERROR: bad magic on superblock on /dev/sda at 65536
> 
> I saw a thread in the mailing list about it:
> https://www.spinics.net/lists/linux-btrfs/msg75326.html
> However I am told on IRC that Qu fixed it using magic.
> 
> Any help would be much appreciated.

In the previous report, two isolated areas of the superblock were
damaged. Please post the output of

btrfs inspect dump-super /path

so we can see if it's a similar issue.

In case it is, there's a tool in the btrfs-progs repo that can fix the
individual values.


Re: Bad magic on superblock on /dev/sda at 65536

2018-04-06 Thread Nikolay Borisov


On  6.04.2018 16:32, Ben Parsons wrote:
> Hi,
> 
> I just had an unexpected restart and now my btrfs pool wont mount.
> The error on mount is:
> 
> "ERROR: unsupported checksum algorithm 41700"
> 
> and when running
> 
> btrfs inspect-internal dump-super /dev/sda
> ERROR: bad magic on superblock on /dev/sda at 65536
> 
> I saw a thread in the mailing list about it:
> https://www.spinics.net/lists/linux-btrfs/msg75326.html
> However I am told on IRC that Qu fixed it using magic.
> 
> Any help would be much appreciated.

Try recovering the super block from one of the backup copies via
"btrfs rescue super-recover /dev/sda"

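As a rough model of what recovering from a backup copy involves (an
illustration only, not the btrfs-progs implementation): among the
superblock copies, skip those whose magic is wrong and prefer the one with
the highest generation. The helper name pick_recovery_source is
hypothetical, the field positions (magic at byte 64, generation as a
little-endian u64 at byte 72) are assumed from the on-disk superblock
layout, and checksum verification is left out entirely.

```python
import struct

BTRFS_MAGIC = b"_BHRfS_M"   # 8-byte magic, 64 bytes into the superblock
MAGIC_OFFSET = 64
GENERATION_OFFSET = 72      # little-endian u64 right after the magic
SUPERBLOCK_SIZE = 4096


def pick_recovery_source(copies):
    """Choose which superblock copy to recover from.

    copies maps a byte offset on the device to the raw superblock bytes
    read from that offset. Returns the offset of the copy with a valid
    magic and the highest generation, or None if no copy qualifies.
    """
    best_offset, best_generation = None, -1
    for offset, block in sorted(copies.items()):
        if len(block) < SUPERBLOCK_SIZE:
            continue
        if block[MAGIC_OFFSET:MAGIC_OFFSET + 8] != BTRFS_MAGIC:
            continue
        (generation,) = struct.unpack_from("<Q", block, GENERATION_OFFSET)
        if generation > best_generation:
            best_offset, best_generation = offset, generation
    return best_offset
```

If every copy fails the magic test, the function returns None and there is
nothing to recover from.
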
> 
> Thanks,
> Ben


Btrfs progs release 4.16

2018-04-06 Thread David Sterba
Hi,

btrfs-progs version 4.16 has been released.

This version brings a new library, libbtrfsutil, that should help applications
use btrfs functionality more conveniently than through plain ioctls; it also
has Python bindings. The rest are bugfixes and small enhancements.

The library is hosted in the progs git because of the close dependency and
maintained primarily by Omar Sandoval.

Changes:

  * libbtrfsutil - new LGPL library to wrap userspace functionality
    * several 'btrfs' commands converted to use it:
      * properties
      * filesystem sync
      * subvolume set-default/get-default/delete/show/sync
    * python bindings, tests
  * build
    * use configured pkg-config path
    * CI: add python, musl/clang, built dependencies caching
    * convert: build fix for e2fsprogs 1.44+
    * don't install library links with wrong permissions
  * fixes
    * prevent incorrect use of subvol_strip_mountpoint
    * dump-super: don't verify csum for unknown type
    * convert: fix inline extent creation condition
  * check:
    * lowmem: fix false alert for 'data extent backref lost for snapshot'
    * lowmem: fix false alert for orphan inode
    * lowmem: fix false alert for shared prealloc extents
  * mkfs:
    * add UUID and otime to root of FS_TREE - with the uuid, snapshots will
      be now linked to the toplevel subvol by the parent UUID
    * don't follow symlinks when calculating size
    * pre-create the UUID tree
    * fix --rootdir with selinux enabled
  * dump-tree: add option to print only children nodes of a given block
  * image: handle missing device for RAID1
  * other
    * new tests
    * test script cleanups (quoting, helpers)
    * tool to edit superblocks
    * updated docs

Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/
Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git

Shortlog:

Axel Burri (1):
  btrfs-progs: prevent incorrect use of subvol_strip_mountpoint

David Sterba (24):
  btrfs-progs: build: configure.ac hard-codes the pkg-config command
  btrfs-progs: tests: add test for send -p on 2 mont points
  btrfs-progs: tests: add helper to log pipe stdout
  btrfs-progs: ci: add python dependencies for libbtrfsutil
  libbtrfsutil: add stub for reallocarray
  btrfs-progs: ci: cache built dependencies
  libbtrfsutils: add python-devel detection
  btrfs-progs: ci: update test image packages - add clang and python
  btrfs-progs: ci: enable clang and python for musl build test
  btrfs-progs: docs: add section about filesystem limits to btrfs(5)
  btrfs-progs: tests: fix source path for testsuite
  btrfs-progs: tests: don't use fallocate in mkfs/014-rootdir-inline-extent
  btrfs-progs: tests: mkfs fills uuid and otime for FS_TREE
  btrfs-progs: tests: update README, images, coding style
  btrfs-progs: tests: convert/014 use shell builtin for generating content
  btrfs-progs: tests: add shell quoting to fuzz test scripts
  btrfs-progs: tests: remove trivial use of local variables
  btrfs-progs: tests: add shell quotes to mkfs test scripts
  btrfs-progs: tests: add shell quotes to misc test scripts
  btrfs-progs: add tool to edit super blocks
  btrfs-progs: mkfs: precreate the uuid tree
  btrfs-progs: docs: fix typos
  btrfs-progs: update CHANGES for v4.16
  Btrfs progs v4.16

Filipe Manana (2):
  Btrfs-progs: check, fix false error reports for shared prealloc extents
  Btrfs-progs: add fsck test for filesystem with shared prealloc extents

Gu Jinxiang (1):
  btrfs-progs: Remove unused parameter

Lu Fengqi (4):
  btrfs-progs: check/lowmem: Fix the incorrect error message of check_extent_data_item
  btrfs-progs: check/lowmem: Fix false alert of data extent backref lost for snapshot
  btrfs-progs: fsck-tests: Introduce test case with keyed data backref with the extent offset
  btrfs-progs: build: modify cscope/ctags rules to include directories such as check

Misono Tomohiro (2):
  btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of, FS_TREE
  btrfs-progs: mkfs rootdir: use lgetxattr() not to follow a symbolic link

Misono, Tomohiro (1):
  btrfs-progs: remove BTRFS_CRC32_SIZE definition

Nicholas D Steeves (1):
  btrfs-progs: Fix typos in docs and user-facing strings

Nikolay Borisov (1):
  btrfs-progs: Beautify owner when printing leaf/nodes

Omar Sandoval (30):
  Add libbtrfsutil
  libbtrfsutil: add Python bindings
  libbtrfsutil: add qgroup inheritance helpers
  libbtrfsutil: add filesystem sync helpers
  libbtrfsutil: fix Python tests
  libbtrfsutil: copy in Btrfs UAPI headers
  libbtrfsutil: add btrfs_util_is_subvolume() and btrfs_util_subvolume_id()
  libbtrfsutil: add btrfs_util_create_subvolume()
  libbtrfsutil: add btrfs_util_subvolume_path()
  libbtrfsutil: add btrfs_util_subvolume_info()
  libbtrfsutil: add btrfs_util_[gs]et_read_only()
  

Bad magic on superblock on /dev/sda at 65536

2018-04-06 Thread Ben Parsons
Hi,

I just had an unexpected restart and now my btrfs pool won't mount.
The error on mount is:

"ERROR: unsupported checksum algorithm 41700"

and when running

btrfs inspect-internal dump-super /dev/sda
ERROR: bad magic on superblock on /dev/sda at 65536
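
For readers following along, here is a minimal sketch of what "bad magic at 65536" means, assuming the usual btrfs on-disk layout: the primary superblock starts at byte 65536 and its magic field, the 8 bytes "_BHRfS_M", sits 64 bytes into it. The function and macro names are made up for this illustration and are not the btrfs-progs API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout: primary superblock at 64 KiB, magic 64 bytes into it. */
#define SB_PRIMARY_OFFSET 65536u  /* the "65536" in the error message */
#define SB_MAGIC_OFFSET   64u     /* after csum[32], fsid[16], bytenr, flags */
#define SB_MAGIC          "_BHRfS_M"
#define SB_MAGIC_LEN      8u

/*
 * Hypothetical helper: return 1 if the buffer, which holds the start of
 * one superblock copy, carries the btrfs magic. Return 0 otherwise,
 * including when the buffer is too short to contain the magic field.
 */
static int sb_has_btrfs_magic(const unsigned char *sb, size_t len)
{
	/* A NULL buffer is only acceptable together with len == 0. */
	assert(len == 0 || sb != NULL);

	if (len < SB_MAGIC_OFFSET + SB_MAGIC_LEN)
		return 0;
	return memcmp(sb + SB_MAGIC_OFFSET, SB_MAGIC, SB_MAGIC_LEN) == 0;
}
```

To try it on a real device, one would read at least 72 bytes starting at SB_PRIMARY_OFFSET and pass that buffer in. A superblock that has been zeroed fails this check the same way a corrupted one does.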

I saw a thread in the mailing list about it:
https://www.spinics.net/lists/linux-btrfs/msg75326.html
However I am told on IRC that Qu fixed it using magic.

Any help would be much appreciated.

Thanks,
Ben
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/3] btrfs: Allow rmdir(2) to delete a subvolume

2018-04-06 Thread David Sterba
On Fri, Mar 30, 2018 at 03:16:47PM +0900, Misono Tomohiro wrote:
> This patch changes the behavior of rmdir(2) to allow it to delete
> an empty subvolume by default, as long as it is not the default
> subvolume and no send is in progress.
> 
> New function btrfs_delete_subvolume() is almost equal to the second half
> of btrfs_ioctl_snap_destroy(). This function requires inode_lock for both
> @dir and inode of @dentry. For rmdir(2) it is already acquired in vfs
> layer before calling btrfs_rmdir().
> 
> Note that while a non-privileged user cannot delete a read-only subvolume
> by "btrfs subvolume delete" when user_subvol_rm_allowed mount option is
> enabled, rmdir(2) can delete an empty read-only subvolume.
> 
> Tested-by: Goffredo Baroncelli 
> Signed-off-by: Tomohiro Misono 
> ---
>  fs/btrfs/inode.c | 141 ++-
>  1 file changed, 140 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index db66fa4fede6..84dbb9cafd6b 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -4387,6 +4387,145 @@ noinline int may_destroy_subvol(struct btrfs_root *root)
>   return ret;
>  }
>  
> +static int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry)
> +{
> + struct btrfs_fs_info *fs_info = btrfs_sb(dentry->d_sb);
> + struct btrfs_root *root = BTRFS_I(dir)->root;
> + struct inode *inode = d_inode(dentry);
> + struct btrfs_root *dest = BTRFS_I(inode)->root;
> + struct btrfs_trans_handle *trans;
> + struct btrfs_block_rsv block_rsv;
> + u64 root_flags;
> + u64 qgroup_reserved;
> + int ret;
> + int err;
> +
> + /*
> +  * Don't allow to delete a subvolume with send in progress. This is
> +  * inside the i_mutex so the error handling that has to drop the bit
> +  * again is not run concurrently.
> +  */
> + spin_lock(&dest->root_item_lock);
> + root_flags = btrfs_root_flags(&dest->root_item);
> + if (dest->send_in_progress == 0) {
> + btrfs_set_root_flags(&dest->root_item,
> + root_flags | BTRFS_ROOT_SUBVOL_DEAD);
> + spin_unlock(&dest->root_item_lock);
> + } else {
> + spin_unlock(&dest->root_item_lock);
> + btrfs_warn(fs_info,
> +"Attempt to delete subvolume %llu during send",
> +dest->root_key.objectid);
> + err = -EPERM;
> + return err;
> + }
> +
> + down_write(&fs_info->subvol_sem);
> +
> + err = may_destroy_subvol(dest);
> + if (err)
> + goto out_up_write;
> +
> + btrfs_init_block_rsv(&block_rsv, BTRFS_BLOCK_RSV_TEMP);
> + /*
> +  * One for dir inode, two for dir entries, two for root
> +  * ref/backref.
> +  */
> + err = btrfs_subvolume_reserve_metadata(root, &block_rsv,
> +5, &qgroup_reserved, true);
> + if (err)
> + goto out_up_write;
> +
> + trans = btrfs_start_transaction(root, 0);
> + if (IS_ERR(trans)) {
> + err = PTR_ERR(trans);
> + goto out_release;
> + }
> + trans->block_rsv = &block_rsv;
> + trans->bytes_reserved = block_rsv.size;
> +
> + btrfs_record_snapshot_destroy(trans, BTRFS_I(dir));
> +
> + ret = btrfs_unlink_subvol(trans, root, dir,
> + dest->root_key.objectid,
> + dentry->d_name.name,
> + dentry->d_name.len);
> + if (ret) {
> + err = ret;
> + btrfs_abort_transaction(trans, ret);
> + goto out_end_trans;
> + }
> +
> + btrfs_record_root_in_trans(trans, dest);
> +
> + memset(&dest->root_item.drop_progress, 0,
> + sizeof(dest->root_item.drop_progress));
> + dest->root_item.drop_level = 0;
> + btrfs_set_root_refs(&dest->root_item, 0);
> +
> + if (!test_and_set_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &dest->state)) {
> + ret = btrfs_insert_orphan_item(trans,
> + fs_info->tree_root,
> + dest->root_key.objectid);
> + if (ret) {
> + btrfs_abort_transaction(trans, ret);
> + err = ret;
> + goto out_end_trans;
> + }
> + }
> +
> + ret = btrfs_uuid_tree_rem(trans, fs_info, dest->root_item.uuid,
> +   BTRFS_UUID_KEY_SUBVOL,
> +   dest->root_key.objectid);
> + if (ret && ret != -ENOENT) {
> + btrfs_abort_transaction(trans, ret);
> + err = ret;
> + goto out_end_trans;
> + }
> + if (!btrfs_is_empty_uuid(dest->root_item.received_uuid)) {
> + ret = btrfs_uuid_tree_rem(trans, fs_info,
> +   dest->root_item.received_uuid,
> +   

Re: [PATCH] Btrfs: fix loss of prealloc extents past i_size after fsync log replay

2018-04-06 Thread David Sterba
On Thu, Apr 05, 2018 at 10:55:12PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> Currently if we allocate extents beyond an inode's i_size (through the
> fallocate system call) and then fsync the file, we log the extents but
> after a power failure we replay them and then immediately drop them.
> This behaviour happens since about 2009, commit c71bf099abdd ("Btrfs:
> Avoid orphan inodes cleanup while replaying log"), because it marks
> the inode as an orphan instead of dropping any extents beyond i_size
> before replaying logged extents, so after the log replay, and while
> the mount operation is still ongoing, we find the inode marked as an
> orphan and then perform a truncation (drop extents beyond the inode's
> i_size). Because the processing of orphan inodes is still done
> right after replaying the log and before the mount operation finishes,
> the intention of that commit does not make any sense (at least as
> of today). However reverting that behaviour is not enough, because
> we can not simply discard all extents beyond i_size and then replay
> logged extents, because we risk dropping extents beyond i_size created
> in past transactions, for example:
> 
>   add prealloc extent beyond i_size
>   fsync - clears the flag BTRFS_INODE_NEEDS_FULL_SYNC from the inode
>   transaction commit
>   add another prealloc extent beyond i_size
>   fsync - triggers the fast fsync path
>   power failure
> 
> In that scenario, we would drop the first extent and then replay the
> second one. To fix this just make sure that all prealloc extents
> beyond i_size are logged, and if we find too many (which is far from
> a common case), fallback to a full transaction commit (like we do when
> logging regular extents in the fast fsync path).
> 
> Trivial reproducer:
> 
>  $ mkfs.btrfs -f /dev/sdb
>  $ mount /dev/sdb /mnt
>  $ xfs_io -f -c "pwrite -S 0xab 0 256K" /mnt/foo
>  $ sync
>  $ xfs_io -c "falloc -k 256K 1M" /mnt/foo
>  $ xfs_io -c "fsync" /mnt/foo
>  <power failure>
> 
>  # mount to replay log
>  $ mount /dev/sdb /mnt
>  # at this point the file only has one extent, at offset 0, size 256K
> 
> A test case for fstests follows soon, covering multiple scenarios that
> involve adding prealloc extents with previous shrinking truncates and
> without such truncates.
> 
> Fixes: c71bf099abdd ("Btrfs: Avoid orphan inodes cleanup while replaying log")
> Signed-off-by: Filipe Manana 

Added to next, thanks.


Re: [PATCH] Btrfs: do not abort transaction when failing to insert hole extent

2018-04-06 Thread David Sterba
On Thu, Apr 05, 2018 at 11:58:16AM -0700, Liu Bo wrote:
> On Thu, Apr 5, 2018 at 9:48 AM, David Sterba  wrote:
> > On Sat, Mar 31, 2018 at 06:11:55AM +0800, Liu Bo wrote:
> >> This is running in a typical write path, not inside a critical path
> >> where we have to abort the running transaction, so it's OK to return
> >> errors to callers and eventually to userspace.
> >
> > I'm not sure this is entirely correct, several other places do not abort
> > after btrfs_drop_extents as there's nothing that would leave the
> > structures in some half-state.
> >
> >> Signed-off-by: Liu Bo 
> >> ---
> >>  fs/btrfs/inode.c | 5 +
> >>  1 file changed, 1 insertion(+), 4 deletions(-)
> >>
> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> >> index c7b75dd..b9310f8 100644
> >> --- a/fs/btrfs/inode.c
> >> +++ b/fs/btrfs/inode.c
> >> @@ -4939,16 +4939,13 @@ static int maybe_insert_hole(struct btrfs_root 
> >> *root, struct inode *inode,
> >>
> >>   ret = btrfs_drop_extents(trans, root, inode, offset, offset + len, 
> >> 1);
> >>   if (ret) {
> >> - btrfs_abort_transaction(trans, ret);
> >>   btrfs_end_transaction(trans);
> >>   return ret;
> >>   }
> >>
> >>   ret = btrfs_insert_file_extent(trans, root, 
> >> btrfs_ino(BTRFS_I(inode)),
> >>   offset, 0, 0, len, 0, len, 0, 0, 0);
> >
> > But here the extents have been already dropped and missing to insert the
> > items does not seem to lead to a consistent state.
> >
> > It's possible that I'm missing something. In a call path that can be
> > safely rolled back even with a started transaction, we don't need to
> > abort in all cases. But if the rollback requires some non-trivial
> > modifications, I don't see options how to avoid the abort.
> >
> > __btrfs_drop_extents does a lot of state changes and can itself fail
> > in the middle of dropping the range, aborting looks like the safest
> > option.
> >
> 
> As maybe_insert_hole is only called by btrfs_cont_expand here, which
> means it's really a hole, I don't expect drop_extents to drop
> anything. We can remove this drop_extents and put an assert after
> btrfs_insert_file_extent to check for EEXIST.

Sounds good.

> It's different from punch hole where we need to explicitly drop an
> actual extent and replace it with a hole range.

Right, that's what I didn't see at first.


Re: [PATCH] Btrfs: clean up resources during umount after trans is aborted

2018-04-06 Thread David Sterba
On Thu, Apr 05, 2018 at 11:45:55AM -0700, Liu Bo wrote:
> On Thu, Apr 5, 2018 at 9:11 AM, David Sterba  wrote:
> > On Sat, Mar 31, 2018 at 06:11:56AM +0800, Liu Bo wrote:
> >> Currently if some fatal errors occur, like all IO get -EIO, resources
> >> would be cleaned up when
> >> a) transaction is being committed or
> >> b) BTRFS_FS_STATE_ERROR is set
> >>
> >> However, in some rare cases, resources may be left alone after transaction
> >> gets aborted and umount may run into some ASSERT(), e.g.
> >> ASSERT(list_empty(&block_group->dirty_list));
> >>
> >> For case a), in btrfs_commit_transaction(), there are several places at the
> >> beginning where we just call btrfs_end_transaction() without cleaning up
> >> resources.  For case b), it is possible that the trans handle doesn't have
> >> any dirty stuff, then only the trans handle is marked as aborted while
> >> BTRFS_FS_STATE_ERROR is not set, so resources remain in memory.
> >>
> >> This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that
> >> all resources won't stay in memory after umount.
> >>
> >> Signed-off-by: Liu Bo 
> >
> > Is it possible that the following stack trace could be caused by the
> > missing check? It roughly matches what you describe (ie. close_ctree and
> > unreleased resources). This is from generic/475, which does some error
> > injection:
> >
> > [16991.455178] WARNING: CPU: 6 PID: 23518 at fs/btrfs/extent-tree.c:9896 btrfs_free_block_groups+0x2c8/0x420 [btrfs]
> >
> 
> Hmm...I don't think so, while running 475, the one I got pretty stable is
> ASSERT(list_empty(&block_group->dirty_list));

There's a number of things that 475 catches so this might depend on
timing, memory, disks etc.

> And I did see this warning a few times, but I thought that was due to
> the new flag (ZERO) of fallocate for which we had fixes from Filipe,
> not sure if they've been merged?

Merged to 4.15:

* f27451f22996687 Btrfs: add support for fallocate's zero range operation
* 9f13ce743b1bd4e Btrfs: fix missing inode i_size update after zero range operation


Re: [PATCH] btrfs-progs: mkfs rootdir: use lgetxattr() not to follow a symbolic link

2018-04-06 Thread David Sterba
On Mon, Apr 02, 2018 at 10:59:31AM +0900, Misono Tomohiro wrote:
> mkfs-test 016 "rootdir-bad-symbolic-link" fails when selinux is enabled.
> This is because add_xattr_item() uses getxattr() and tries to follow a
> bad symbolic link for selinux item, which causes ENOENT error.
> 
> The line above already uses llistxattr() for getting list of xattr in
> order not to follow a symbolic link, so just use lgetxattr() too.
> 
> Signed-off-by: Tomohiro Misono 

Applied and added to 4.16, thanks.


Re: [PATCH] btrfs-progs: build: Do not use cp -a to install files

2018-04-06 Thread David Sterba
On Wed, Apr 04, 2018 at 04:04:59PM +0200, Peter Kjellerstedt wrote:
> Using cp -a to install files will preserve the ownership of the
> original files (if possible), which is typically not wanted. E.g., if
> the files were built by a normal user, but are being installed by
> root, then the installed files would maintain the UIDs/GIDs of the
> user that built the files rather than be owned by root.
> 
> Signed-off-by: Peter Kjellerstedt 

Applied and added to 4.16, thanks.


[PATCH] fstests: generic test for fsync after fallocate

2018-04-06 Thread fdmanana
From: Filipe Manana 

Test that fsync operations preserve extents allocated with fallocate(2)
that are placed beyond a file's size.

This test is motivated by a bug found in btrfs where unwritten extents
beyond the inode's i_size were not preserved after a fsync and power
failure. The btrfs bug is fixed by the following patch for the linux
kernel:

 "Btrfs: fix loss of prealloc extents past i_size after fsync log replay"

Signed-off-by: Filipe Manana 
---
 tests/generic/482 | 118 ++
 tests/generic/482.out |  10 +
 tests/generic/group   |   1 +
 3 files changed, 129 insertions(+)
 create mode 100755 tests/generic/482
 create mode 100644 tests/generic/482.out

diff --git a/tests/generic/482 b/tests/generic/482
new file mode 100755
index ..43bbc913
--- /dev/null
+++ b/tests/generic/482
@@ -0,0 +1,118 @@
+#! /bin/bash
+# FSQA Test No. 482
+#
+# Test that fsync operations preserve extents allocated with fallocate(2) that
+# are placed beyond a file's size.
+#
+#---
+#
+# Copyright (C) 2018 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana 
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   _cleanup_flakey
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/dmflakey
+. ./common/punch
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+_require_dm_target flakey
+_require_xfs_io_command "falloc" "-k"
+_require_xfs_io_command "fiemap"
+
+rm -f $seqres.full
+
+_scratch_mkfs >>$seqres.full 2>&1
+_require_metadata_journaling $SCRATCH_DEV
+_init_flakey
+_mount_flakey
+
+# Create our test files.
+$XFS_IO_PROG -f -c "pwrite -S 0xea 0 256K" $SCRATCH_MNT/foo >/dev/null
+
+# Create a file with many extents. We later want to shrink truncate it and
+# add a prealloc extent beyond its new size.
+for ((i = 1; i <= 500; i++)); do
+   offset=$(((i - 1) * 4 * 1024))
+   $XFS_IO_PROG -f -s -c "pwrite -S 0xcf $offset 4K" \
+   $SCRATCH_MNT/bar >/dev/null
+done
+
+# A file which already has a prealloc extent beyond its size.
+# The fsync done on it is motivated by differences in the btrfs implementation
+# of fsync (first fsync has different logic from subsequent fsyncs).
+$XFS_IO_PROG -f -c "pwrite -S 0xf1 0 256K" \
+-c "falloc -k 256K 768K" \
+-c "fsync" \
+$SCRATCH_MNT/baz >/dev/null
+
+# Make sure everything done so far is durably persisted.
+sync
+
+# Allocate an extent beyond the size of the first test file and fsync it.
+$XFS_IO_PROG -c "falloc -k 256K 1M"\
+-c "fsync" \
+$SCRATCH_MNT/foo
+
+# Do a shrinking truncate of our test file, add a prealloc extent to it after
+# its new size and fsync it.
+$XFS_IO_PROG -c "truncate 256K" \
+-c "falloc -k 256K 1M"\
+-c "fsync" \
+$SCRATCH_MNT/bar
+
+# Allocate another extent beyond the size of file baz.
+$XFS_IO_PROG -c "falloc -k 1M 2M"\
+-c "fsync" \
+$SCRATCH_MNT/baz
+
+# Simulate a power failure and mount the filesystem to check that the extents
+# previously allocated were not lost.
+_flakey_drop_and_remount
+
+echo "File foo fiemap:"
+$XFS_IO_PROG -c "fiemap -v" $SCRATCH_MNT/foo | _filter_fiemap
+
+echo "File bar fiemap:"
+$XFS_IO_PROG -c "fiemap -v" $SCRATCH_MNT/bar | _filter_fiemap
+
+echo "File baz fiemap:"
+$XFS_IO_PROG -c "fiemap -v" $SCRATCH_MNT/baz | _filter_fiemap
+
+_unmount_flakey
+_cleanup_flakey
+
+status=0
+exit
diff --git a/tests/generic/482.out b/tests/generic/482.out
new file mode 100644
index ..7e3ed139
--- /dev/null
+++ b/tests/generic/482.out
@@ -0,0 +1,10 @@
+QA output created by 482
+File foo fiemap:
+0: [0..511]: data
+1: [512..2559]: unwritten
+File bar fiemap:
+0: [0..511]: data
+1: [512..2559]: unwritten
+File baz fiemap:
+0: [0..511]: data
+1: [512..6143]: unwritten
diff --git a/tests/generic/group b/tests/generic/group
index 
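
The ranges in 482.out above can be cross-checked by hand. The sketch below assumes, as I understand xfs_io's fiemap output, that ranges are printed as inclusive spans of 512-byte blocks. The struct and function names are only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive range of 512-byte blocks, matching the "[first..last]" form. */
struct blk_range {
	uint64_t first;
	uint64_t last;
};

/*
 * Convert a byte range (offset, length) into an inclusive range of
 * 512-byte blocks. Both values are expected to be multiples of 512.
 */
static struct blk_range bytes_to_blocks(uint64_t offset, uint64_t len)
{
	struct blk_range r;

	assert(len > 0 && offset % 512 == 0 && len % 512 == 0);
	r.first = offset / 512;
	r.last = (offset + len) / 512 - 1;
	return r;
}
```

With these assumptions, the 256K of written data maps to [0..511], "falloc -k 256K 1M" maps to [512..2559], and for baz the two preallocations (256K+768K, then 1M+2M) cover [512..2047] and [2048..6143], which fiemap would report as the single unwritten range [512..6143] if the two are merged.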

[PATCH] Btrfs: fix loss of prealloc extents past i_size after fsync log replay

2018-04-06 Thread fdmanana
From: Filipe Manana 

Currently if we allocate extents beyond an inode's i_size (through the
fallocate system call) and then fsync the file, we log the extents but
after a power failure we replay them and then immediately drop them.
This behaviour happens since about 2009, commit c71bf099abdd ("Btrfs:
Avoid orphan inodes cleanup while replaying log"), because it marks
the inode as an orphan instead of dropping any extents beyond i_size
before replaying logged extents, so after the log replay, and while
the mount operation is still ongoing, we find the inode marked as an
orphan and then perform a truncation (drop extents beyond the inode's
i_size). Because the processing of orphan inodes is still done
right after replaying the log and before the mount operation finishes,
the intention of that commit does not make any sense (at least as
of today). However reverting that behaviour is not enough, because
we can not simply discard all extents beyond i_size and then replay
logged extents, because we risk dropping extents beyond i_size created
in past transactions, for example:

  add prealloc extent beyond i_size
  fsync - clears the flag BTRFS_INODE_NEEDS_FULL_SYNC from the inode
  transaction commit
  add another prealloc extent beyond i_size
  fsync - triggers the fast fsync path
  power failure

In that scenario, we would drop the first extent and then replay the
second one. To fix this just make sure that all prealloc extents
beyond i_size are logged, and if we find too many (which is far from
a common case), fallback to a full transaction commit (like we do when
logging regular extents in the fast fsync path).
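
The scenario above can be modelled in a few lines of userspace C. This is only a toy model under stated assumptions, not the kernel code: an inode is an array of extents, replay first drops everything at or beyond the sector-aligned i_size and then re-adds whatever the log contains. All names are invented for the sketch, and extents that straddle the cut-off are not trimmed as the real code would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct extent {
	uint64_t start;
	uint64_t len;
};

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Drop every extent that starts at or beyond `from`, compacting the
 * array in place. Returns the number of extents kept.
 */
static size_t drop_extents_from(struct extent *ext, size_t n, uint64_t from)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++)
		if (ext[i].start < from)
			ext[kept++] = ext[i];
	return kept;
}

/*
 * Toy replay of one inode: truncate to the aligned i_size first, then
 * append the logged extents. `fs` needs room for n_fs + n_logged
 * entries. Returns the number of extents after replay.
 */
static size_t replay_inode(struct extent *fs, size_t n_fs, uint64_t i_size,
			   uint64_t sectorsize,
			   const struct extent *logged, size_t n_logged)
{
	size_t n = drop_extents_from(fs, n_fs, align_up(i_size, sectorsize));

	for (size_t i = 0; i < n_logged; i++)
		fs[n++] = logged[i];
	return n;
}
```

If the committed inode has data at [0, 256K) plus a prealloc extent at 256K, and the log only carries a second prealloc extent added later, the replay ends with two extents and the first prealloc extent is gone. If the log carries both prealloc extents, which is what the fix arranges, all three survive.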

Trivial reproducer:

 $ mkfs.btrfs -f /dev/sdb
 $ mount /dev/sdb /mnt
 $ xfs_io -f -c "pwrite -S 0xab 0 256K" /mnt/foo
 $ sync
 $ xfs_io -c "falloc -k 256K 1M" /mnt/foo
 $ xfs_io -c "fsync" /mnt/foo
 <power failure>

 # mount to replay log
 $ mount /dev/sdb /mnt
 # at this point the file only has one extent, at offset 0, size 256K

A test case for fstests follows soon, covering multiple scenarios that
involve adding prealloc extents with previous shrinking truncates and
without such truncates.

Fixes: c71bf099abdd ("Btrfs: Avoid orphan inodes cleanup while replaying log")
Signed-off-by: Filipe Manana 
---
 fs/btrfs/tree-log.c | 63 -
 1 file changed, 58 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 70afd1085033..eb3a41269b0e 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2457,13 +2457,41 @@ static int replay_one_buffer(struct btrfs_root *log, 
struct extent_buffer *eb,
if (ret)
break;
 
-   /* for regular files, make sure corresponding
-* orphan item exist. extents past the new EOF
-* will be truncated later by orphan cleanup.
+   /*
+* Before replaying extents, truncate the inode to its
+* size. We need to do it now and not after log replay
+* because before an fsync we can have prealloc extents
+* added beyond the inode's i_size. If we did it after,
+* through orphan cleanup for example, we would drop
+* those prealloc extents just after replaying them.
 */
if (S_ISREG(mode)) {
-   ret = insert_orphan_item(wc->trans, root,
-key.objectid);
+   struct inode *inode;
+   u64 from;
+
+   inode = read_one_inode(root, key.objectid);
+   if (!inode) {
+   ret = -EIO;
+   break;
+   }
+   from = ALIGN(i_size_read(inode),
+root->fs_info->sectorsize);
+   ret = btrfs_drop_extents(wc->trans, root, inode,
+from, (u64)-1, 1);
+   /*
+* If the nlink count is zero here, the iput
+* will free the inode.  We bump it to make
+* sure it doesn't get freed until the link
+* count fixup is done.
+*/
+   if (!ret) {
+   if (inode->i_nlink == 0)
+   inc_nlink(inode);
+   /* Update link count and nbytes. */
+   ret = btrfs_update_inode(wc->trans,
+

[PATCH] btrfs-progs: Use more loose open ctree flags for dump-tree and restore

2018-04-06 Thread Qu Wenruo
A corrupted extent tree (either the root node or a leaf) can normally
block us from opening the fs, as open_ctree() has the following call chain:
__open_ctree_fd()
|- btrfs_setup_all_roots()
   |- btrfs_read_block_groups()
      And we will search block group items in extent tree.

Considering how block group items are scattered around the whole
extent tree, any error would block the fs from being mounted.

Fortunately, we already have the OPEN_CTREE_NO_BLOCK_GROUPS flag to disable
the block group item search, which will not only allow us to open some
fs, but also hugely speed up open time.

Currently dump-tree and btrfs-restore do not care about block group
items at all, so specify the OPEN_CTREE_NO_BLOCK_GROUPS flag by default.

Also fix a typo where dump-tree is using OPEN_CTREE_FS_PARTIAL, which
should be OPEN_CTREE_PARTIAL.
The wrong flag makes dump-tree do more checks, so it can sometimes fail
to open certain filesystems.

Reported-by: Christoph Anton Mitterer 
Fixes: 8698a2b9ba89 ("btrfs-progs: Allow inspect dump-tree to show specified tree block even some tree roots are corrupted")
Signed-off-by: Qu Wenruo 
---
 cmds-inspect-dump-tree.c | 4 +++-
 cmds-restore.c   | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c
index 7defb7164a49..8be976041543 100644
--- a/cmds-inspect-dump-tree.c
+++ b/cmds-inspect-dump-tree.c
@@ -303,7 +303,9 @@ int cmd_inspect_dump_tree(int argc, char **argv)
int uuid_tree_only = 0;
int roots_only = 0;
int root_backups = 0;
-   unsigned open_ctree_flags = OPEN_CTREE_FS_PARTIAL;
+   /* Speed up open_ctree() and continue if extent tree is corrupted */
+   unsigned open_ctree_flags = OPEN_CTREE_PARTIAL |
+   OPEN_CTREE_NO_BLOCK_GROUPS;
u64 block_bytenr;
struct btrfs_root *tree_root_scan;
u64 tree_id = 0;
diff --git a/cmds-restore.c b/cmds-restore.c
index ade35f0f880f..b43bd2ac6502 100644
--- a/cmds-restore.c
+++ b/cmds-restore.c
@@ -1282,7 +1282,8 @@ static struct btrfs_root *open_fs(const char *dev, u64 
root_location,
for (i = super_mirror; i < BTRFS_SUPER_MIRROR_MAX; i++) {
bytenr = btrfs_sb_offset(i);
fs_info = open_ctree_fs_info(dev, bytenr, root_location, 0,
-OPEN_CTREE_PARTIAL);
+OPEN_CTREE_PARTIAL |
+OPEN_CTREE_NO_BLOCK_GROUPS);
if (fs_info)
break;
fprintf(stderr, "Could not open root, trying backup super\n");
-- 
2.17.0
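
The open_fs() loop touched by the last hunk walks the superblock copies in order. As a rough sketch of where those copies live, assuming the usual btrfs layout of three copies at 64 KiB, 64 MiB and 256 GiB, the offsets can be computed as below. The macro and function names only mirror the btrfs-progs ones and are my own for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SUPER_INFO_OFFSET  65536ULL  /* primary copy at 64 KiB */
#define SUPER_MIRROR_MAX   3         /* assumed number of copies */
#define SUPER_MIRROR_SHIFT 12

/*
 * Byte offset of superblock copy `mirror` (0 is the primary copy).
 * Copies 1 and 2 are placed at 16 KiB shifted left by 12 bits per mirror.
 */
static uint64_t sb_mirror_offset(int mirror)
{
	assert(mirror >= 0 && mirror < SUPER_MIRROR_MAX);

	if (mirror == 0)
		return SUPER_INFO_OFFSET;
	return 16384ULL << (SUPER_MIRROR_SHIFT * mirror);
}
```

A device smaller than 256 GiB simply has no third copy, so a caller would also need to compare each offset against the device size before reading.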



[PATCH v2 1/1] btrfs-progs: inspect-dump-tree: Allow '-b|--block' to be specified multiple times

2018-04-06 Thread Qu Wenruo
Reuse extent-cache facility to record multiple bytenr so '-b|--block'
can be specified multiple times.

Besides that, add a sector size alignment check before we try to print a
tree block.
(Please note that a nodesize alignment check is not suitable here, as a
metadata chunk's start bytenr could be unaligned to nodesize.)
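
To illustrate why the check uses sectorsize rather than nodesize, here is a small sketch with made-up numbers: sectorsize 4096, nodesize 16384, and a tree block at 1 MiB + 4 KiB, which could happen if its chunk starts on a sector boundary that is not a nodesize boundary. The helper is a plain userspace stand-in for the IS_ALIGNED() macro used in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if `value` is a multiple of `alignment`, 0 otherwise.
 * `alignment` must be a non-zero power of two.
 */
static int is_aligned(uint64_t value, uint32_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return (value & ((uint64_t)alignment - 1)) == 0;
}
```

With the numbers above, bytenr 1052672 passes the sectorsize check but would be rejected by a nodesize check, even though it could be a perfectly valid tree block.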

Signed-off-by: Qu Wenruo 
---
changelog:
v2:
  Fix memory leak detected by asan.
  Fix NULL pointer dereference detected by asan.
---
 Documentation/btrfs-inspect-internal.asciidoc |   2 +-
 cmds-inspect-dump-tree.c  | 109 +++---
 2 files changed, 91 insertions(+), 20 deletions(-)

diff --git a/Documentation/btrfs-inspect-internal.asciidoc 
b/Documentation/btrfs-inspect-internal.asciidoc
index e2db64660b9a..ba8529f57660 100644
--- a/Documentation/btrfs-inspect-internal.asciidoc
+++ b/Documentation/btrfs-inspect-internal.asciidoc
@@ -86,7 +86,7 @@ the respective tree root block offset
 -u|--uuid
 print only the uuid tree information, empty output if the tree does not exist
 -b 
-print info of the specified block only
+print info of the specified block only, can be specified multiple times.
 --follow
 use with '-b', print all children tree blocks of ''
 -t 
diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c
index b0cd49b32664..7defb7164a49 100644
--- a/cmds-inspect-dump-tree.c
+++ b/cmds-inspect-dump-tree.c
@@ -198,11 +198,92 @@ const char * const cmd_inspect_dump_tree_usage[] = {
"-R|--backups   same as --roots plus print backup root info",
"-u|--uuid  print only the uuid tree",
"-b|--block  print info from the specified block only",
+   "   can be specified multiple times",
"-t|--tree print only tree with the given id (string or 
number)",
"--follow   use with -b, to show all children tree blocks 
of ",
NULL
 };
 
+/*
+ * Helper function to record all tree block bytenr so we don't need to put
+ * all code into deep indent.
+ *
+ * Return >0 if we hit a duplicated bytenr (already recorded)
+ * Return 0 if nothing went wrong
+ * Return <0 if error happens (ENOMEM)
+ *
+ * For != 0 return value, all warning/error will be outputted by this function.
+ */
+static int dump_add_tree_block(struct cache_tree *tree, u64 bytenr)
+{
+   int ret;
+
+   /*
+* We don't really care about the size and we don't have
+* nodesize before we open the fs, so just use 1 as size here.
+*/
+   ret = add_cache_extent(tree, bytenr, 1);
+   if (ret == -EEXIST) {
+   warning("tree block bytenr %llu is duplicated", bytenr);
+   return 1;
+   }
+   if (ret < 0) {
+   error("failed to record tree block bytenr %llu: %d(%s)",
+   bytenr, ret, strerror(-ret));
+   return ret;
+   }
+   return ret;
+}
+
+/*
+ * Print all tree blocks recorded.
+ * All tree block bytenr record will also be freed in this function.
+ *
+ * Return 0 if nothing wrong happened for *each* tree blocks
+ * Return <0 if anything wrong happened, and return value will be the last
+ * error.
+ */
+static int dump_print_tree_blocks(struct btrfs_fs_info *fs_info,
+ struct cache_tree *tree, bool follow)
+{
+   struct cache_extent *ce;
+   struct extent_buffer *eb;
+   u64 bytenr;
+   int ret = 0;
+
+   ce = first_cache_extent(tree);
+   while (ce) {
+   bytenr = ce->start;
+
+   /*
+* Please note that here we can't check it against nodesize,
+* as it's possible a chunk is just aligned to sectorsize but
+* not aligned to nodesize.
+*/
+   if (!IS_ALIGNED(bytenr, fs_info->sectorsize)) {
+   error(
+   "tree block bytenr %llu is not aligned to sectorsize %u",
+ bytenr, fs_info->sectorsize);
+   ret = -EINVAL;
+   goto next;
+   }
+
+   eb = read_tree_block(fs_info, bytenr, 0);
+   if (!extent_buffer_uptodate(eb)) {
+   error("failed to read tree block %llu", bytenr);
+   ret = -EIO;
+   goto next;
+   }
+   btrfs_print_tree(eb, follow);
+   free_extent_buffer(eb);
+next:
+   remove_cache_extent(tree, ce);
+   free(ce);
+   ce = first_cache_extent(tree);
+   }
+   return ret;
+}
+
 int cmd_inspect_dump_tree(int argc, char **argv)
 {
struct btrfs_root *root;
@@ -213,6 +294,7 @@ int cmd_inspect_dump_tree(int argc, char **argv)
struct extent_buffer *leaf;
struct btrfs_disk_key disk_key;
struct btrfs_key found_key;
+   struct cache_tree block_root;   /* for multiple --block parameters */
char 

[PATCH 0/1] btrfs-progs: dump-tree: allow -b multiple times

2018-04-06 Thread Qu Wenruo
Although just one patch, it needs the extent buffer cleanup code as
basis, so please fetch it from my github repo:
https://github.com/adam900710/btrfs-progs/tree/dump_tree_multi_blocks

This patch allows -b to be specified multiple times, and adds an extra
basic check for the given values.
For later enhancement (Issue: #113) it needs extra work to handle
special roots.

Qu Wenruo (1):
  btrfs-progs: inspect-dump-tree: Allow '-b|--block' to be specified
multiple times

 Documentation/btrfs-inspect-internal.asciidoc |   2 +-
 cmds-inspect-dump-tree.c  | 108 ++
 2 files changed, 89 insertions(+), 21 deletions(-)

-- 
2.17.0



[PATCH 1/1] btrfs-progs: inspect-dump-tree: Allow '-b|--block' to be specified multiple times

2018-04-06 Thread Qu Wenruo
Reuse extent-cache facility to record multiple bytenr so '-b|--block'
can be specified multiple times.

Besides that, add a sector size alignment check before we try to print a
tree block.
(Please note that a nodesize alignment check is not suitable here, as a
metadata chunk's start bytenr could be unaligned to nodesize.)

Signed-off-by: Qu Wenruo 
---
 Documentation/btrfs-inspect-internal.asciidoc |   2 +-
 cmds-inspect-dump-tree.c  | 108 ++
 2 files changed, 89 insertions(+), 21 deletions(-)

diff --git a/Documentation/btrfs-inspect-internal.asciidoc b/Documentation/btrfs-inspect-internal.asciidoc
index e2db64660b9a..ba8529f57660 100644
--- a/Documentation/btrfs-inspect-internal.asciidoc
+++ b/Documentation/btrfs-inspect-internal.asciidoc
@@ -86,7 +86,7 @@ the respective tree root block offset
 -u|--uuid
 print only the uuid tree information, empty output if the tree does not exist
 -b 
-print info of the specified block only
+print info of the specified block only, can be specified multiple times.
 --follow
 use with '-b', print all children tree blocks of ''
 -t 
diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c
index b0cd49b32664..fb3ccfc9d0ba 100644
--- a/cmds-inspect-dump-tree.c
+++ b/cmds-inspect-dump-tree.c
@@ -203,6 +203,85 @@ const char * const cmd_inspect_dump_tree_usage[] = {
NULL
 };
 
+/*
+ * Helper function to record all tree block bytenr so we don't need to put
+ * all code into deep indent.
+ *
+ * Return >0 if we hit a duplicated bytenr (already recorded)
+ * Return 0 if nothing went wrong
+ * Return <0 if error happens (ENOMEM)
+ *
+ * For != 0 return value, all warning/error will be outputted by this function.
+ */
+static int dump_add_tree_block(struct cache_tree *tree, u64 bytenr)
+{
+   int ret;
+
+   /*
+* We don't really care about the size and we don't have
+* nodesize before we open the fs, so just use 1 as size here.
+*/
+   ret = add_cache_extent(tree, bytenr, 1);
+   if (ret == -EEXIST) {
+   warning("tree block bytenr %llu is duplicated", bytenr);
+   return 1;
+   }
+   if (ret < 0) {
+   error("failed to record tree block bytenr %llu: %d(%s)",
+   bytenr, ret, strerror(-ret));
+   return ret;
+   }
+   return ret;
+}
+
+/*
+ * Print all tree blocks recorded.
+ * All tree block bytenr record will also be freed in this function.
+ *
+ * Return 0 if nothing wrong happened for *each* tree blocks
+ * Return <0 if anything wrong happened, and return value will be the last
+ * error.
+ */
+static int dump_print_tree_blocks(struct btrfs_fs_info *fs_info,
+ struct cache_tree *tree, bool follow)
+{
+   struct cache_extent *ce;
+   struct extent_buffer *eb;
+   u64 bytenr;
+   int ret = 0;
+
+   ce = first_cache_extent(tree);
+   while (ce) {
+   bytenr = ce->start;
+
+   /*
+* Please note that here we can't check it against nodesize,
+* as it's possible a chunk is just aligned to sectorsize but
+* not aligned to nodesize.
+*/
+   if (!IS_ALIGNED(bytenr, fs_info->sectorsize)) {
+   error(
+   "tree block bytenr %llu is not aligned to sectorsize %u",
+ bytenr, fs_info->sectorsize);
+   ret = -EINVAL;
+   goto next;
+   }
+
+   eb = read_tree_block(fs_info, bytenr, 0);
+   if (!extent_buffer_uptodate(eb)) {
+   error("failed to read tree block %llu", bytenr);
+   ret = -EIO;
+   goto next;
+   }
+   btrfs_print_tree(eb, follow);
+   free_extent_buffer(eb);
+next:
+   remove_cache_extent(tree, ce);
+   ce = first_cache_extent(tree);
+   }
+   return ret;
+}
+
 int cmd_inspect_dump_tree(int argc, char **argv)
 {
struct btrfs_root *root;
@@ -213,6 +292,7 @@ int cmd_inspect_dump_tree(int argc, char **argv)
struct extent_buffer *leaf;
struct btrfs_disk_key disk_key;
struct btrfs_key found_key;
+   struct cache_tree block_root;   /* for multiple --block parameters */
char uuidbuf[BTRFS_UUID_UNPARSED_SIZE];
int ret;
int slot;
@@ -222,11 +302,12 @@ int cmd_inspect_dump_tree(int argc, char **argv)
int roots_only = 0;
int root_backups = 0;
unsigned open_ctree_flags = OPEN_CTREE_FS_PARTIAL;
-   u64 block_only = 0;
+   u64 block_bytenr;
struct btrfs_root *tree_root_scan;
u64 tree_id = 0;
bool follow = false;
 
+   cache_tree_init(&block_root);
while (1) {
int c;
enum { GETOPT_VAL_FOLLOW = 256 };
@@ -268,7 +349,10 @@ int