Re: Bad magic on superblock on /dev/sda at 65536
The first superblock is zeroed, so this is not random corruption. Most
probably something other than btrfs used the disk while it was unmounted?
Or was the partition (if any) changed? Or, if it's SAN storage, I hope the
LUN wasn't recreated at the storage end.

Thanks, Anand

On 04/07/2018 08:56 AM, Qu Wenruo wrote:
> On 2018-04-07 08:35, Ben Parsons wrote:
>>> btrfs inspect-internal dump-super -Ffa /path
>>
>> superblock: bytenr=65536, device=/dev/sda
>> -
>> csum_type               0 (crc32c)
>> csum_size               4
>> csum                    0x [DON'T MATCH]
>> bytenr                  0
>> flags                   0x0
>> magic                   [DON'T MATCH]
>> fsid                    ----
>
> First super block is completely gone.
>
>> label
>> generation              0
>> root                    0
>> sys_array_size          0
>> chunk_root_generation   0
>> root_level              0
>> chunk_root              0
>> chunk_root_level        0
>> log_root                0
>> log_root_transid        0
>> log_root_level          0
>> total_bytes             0
>> bytes_used              0
>> sectorsize              0
>> nodesize                0
>> leafsize (deprecated)   0
>> stripesize              0
>> root_dir                0
>> num_devices             0
>> compat_flags            0x0
>> compat_ro_flags         0x0
>> incompat_flags          0x0
>> cache_generation        0
>> uuid_tree_generation    0
>> dev_item.uuid           ----
>> dev_item.fsid           ---- [match]
>> dev_item.type           0
>> dev_item.total_bytes    0
>> dev_item.bytes_used     0
>> dev_item.io_align       0
>> dev_item.io_width       0
>> dev_item.sector_size    0
>> dev_item.devid          0
>> dev_item.dev_group      0
>> dev_item.seek_speed     0
>> dev_item.bandwidth      0
>> dev_item.generation     0
>> sys_chunk_array[2048]:
>> backup_roots[4]:

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
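The distinction Anand draws (a zeroed superblock versus random corruption) can
be checked by hand. Below is a minimal Python sketch, assuming the standard
btrfs on-disk layout: each superblock copy is 4096 bytes and carries the 8-byte
magic "_BHRfS_M" at byte 0x40 of the copy. The names classify_super and
read_super are illustrative only and are not part of btrfs-progs. A matching
magic says nothing about the checksum or the other fields.

```python
BTRFS_MAGIC = b"_BHRfS_M"   # on-disk magic string of a btrfs superblock
SUPER_INFO_SIZE = 4096       # size of one superblock copy in bytes
MAGIC_OFFSET = 0x40          # after csum (32), fsid (16), bytenr (8), flags (8)


def classify_super(block):
    """Return 'zeroed', 'magic-ok' or 'garbage' for one superblock copy."""
    if len(block) != SUPER_INFO_SIZE:
        raise ValueError("expected exactly 4096 bytes")
    if not any(block):
        # Every byte is zero: the copy was wiped, not randomly damaged.
        return "zeroed"
    if block[MAGIC_OFFSET:MAGIC_OFFSET + len(BTRFS_MAGIC)] == BTRFS_MAGIC:
        return "magic-ok"
    return "garbage"


def read_super(path, offset=65536):
    """Read one superblock-sized block from a device or image, read-only."""
    with open(path, "rb") as dev:
        dev.seek(offset)
        return dev.read(SUPER_INFO_SIZE)
```

For the device in this thread the call would be
classify_super(read_super("/dev/sda")), which only reads and never writes.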
Re: Bad magic on superblock on /dev/sda at 65536
On 7 April 2018 at 10:56, Qu Wenruowrote: > > > On 2018年04月07日 08:35, Ben Parsons wrote: >>> btrfs inspect-internal dump-super -Ffa /path >> >> superblock: bytenr=65536, device=/dev/sda >> - >> csum_type0 (crc32c) >> csum_size4 >> csum0x [DON'T MATCH] >> bytenr0 >> flags0x0 >> magic [DON'T MATCH] >> fsid---- > > First super block is completely gone. > >> label >> generation0 >> root0 >> sys_array_size0 >> chunk_root_generation0 >> root_level0 >> chunk_root0 >> chunk_root_level0 >> log_root0 >> log_root_transid0 >> log_root_level0 >> total_bytes0 >> bytes_used0 >> sectorsize0 >> nodesize0 >> leafsize (deprecated)0 >> stripesize0 >> root_dir0 >> num_devices0 >> compat_flags0x0 >> compat_ro_flags0x0 >> incompat_flags0x0 >> cache_generation0 >> uuid_tree_generation0 >> dev_item.uuid---- >> dev_item.fsid---- [match] >> dev_item.type0 >> dev_item.total_bytes0 >> dev_item.bytes_used0 >> dev_item.io_align0 >> dev_item.io_width0 >> dev_item.sector_size0 >> dev_item.devid0 >> dev_item.dev_group0 >> dev_item.seek_speed0 >> dev_item.bandwidth0 >> dev_item.generation0 >> sys_chunk_array[2048]: >> backup_roots[4]: >> >> superblock: bytenr=67108864, device=/dev/sda >> - >> csum_type65178 (INVALID) >> csum_size32 >> csum >> 0x24f2057c939118ef8cf9c276a05ff294223c99ec79e0b5cfe8ed795fe0a96715 >> [DON'T MATCH] >> bytenr6481065229944367737 > > Neither this backup superblock is valid. > >> flags0x527ffc9117fc11 >> ( WRITTEN | >> CHANGING_FSID | >> METADUMP_V2 | >> unknown flag: 0x527ff09117fc10 ) >> magic...;)... [DON'T MATCH] >> fsid7011f2d5-0afe-5dc6-fce2-70e04b80939d >> label >> ;.."`8..8.x.?.N../zF..H...|h].i.C)j)...4d_..5.../...1.?.rr5.E.. >> generation7112314448606197494 > [snip] >> >> >> superblock: bytenr=274877906944, device=/dev/sda >> - >> csum_type63651 (INVALID) >> csum_size32 >> csum >> 0x39db30683b693c4ff05c0073a1fa00db390c32963bae3c37 >> [DON'T MATCH] >> bytenr6341162744368070656 > > 2nd backup is also gone. 
> >> flags0x9731d639d900f69a >> ( RELOC | >> CHANGING_FSID | >> SEEDING | >> unknown flag: 0x9731d630d900f698 ) >> magic;.<. [DON'T MATCH] >> fside25b006e-a0f9-00df-390a-32923bae3c00 >> label.9~2.;.< >> generation15852938566880484858 >> root17063576824041017 >> sys_array_size2956094009 >> chunk_root_generation9223372036858758203 >> root_level0 >> chunk_root2305935309654720512 >> chunk_root_level212 >> log_root12624074888383091845 >> log_root_transid9583660007048386619 >> log_root_level57 >> total_bytes14916195329329095163 >> bytes_used17051482582002489 > [snip] > > unfortunately, the filesystem seems to be totally corrupted. > >> >>> Despite that, any extra info on how this happened is also appreciated, >>> as similar problem happened twice, which means we need to pay attention >>> on this. >> >> I dont know exactly what happened but here is some background: >> >> i am running Arch Linux on mainline kernel (4.16.0-1) and mesa-git >> (101352.498d9d0f4d-1) as I have a rx vega 64 > > Vega is nice, however I would wait until mesa in extra/ repo get updated. > >> over the past few months I have been getting hard locks when opening >> certain programs (usually due to a bad versions of mesa-git / >> llvm-git, etc). >> >> i was at the time trying to open the program "cheese" and when I did, >> my machine hard locked and only alt+shift+sysrq+b got my screen to go >> black - and then did nothing else, so I held the power button for 3 >> seconds and then my machine rebooted. > > Pretty common hard power reset. > >> looking at journalctl, there is a large stacktrace from kernel: amdgpu >> (see attached). >> then when I booted back up the pool (2 disks, 1TB + 2TB) wouldn't mount. > > I'd say such corruption is pretty serious. > > And what's the profile of the btrfs? If metadata is raid1, we could at > least try to recovery the superblock from the remaining disk. 
I am not sure what the metadata profile was, but the two disks had no parity and just appeared as a single disk with the total space of the two disks how
Re: Bad magic on superblock on /dev/sda at 65536
On 2018年04月07日 08:35, Ben Parsons wrote: >> btrfs inspect-internal dump-super -Ffa /path > > superblock: bytenr=65536, device=/dev/sda > - > csum_type0 (crc32c) > csum_size4 > csum0x [DON'T MATCH] > bytenr0 > flags0x0 > magic [DON'T MATCH] > fsid---- First super block is completely gone. > label > generation0 > root0 > sys_array_size0 > chunk_root_generation0 > root_level0 > chunk_root0 > chunk_root_level0 > log_root0 > log_root_transid0 > log_root_level0 > total_bytes0 > bytes_used0 > sectorsize0 > nodesize0 > leafsize (deprecated)0 > stripesize0 > root_dir0 > num_devices0 > compat_flags0x0 > compat_ro_flags0x0 > incompat_flags0x0 > cache_generation0 > uuid_tree_generation0 > dev_item.uuid---- > dev_item.fsid---- [match] > dev_item.type0 > dev_item.total_bytes0 > dev_item.bytes_used0 > dev_item.io_align0 > dev_item.io_width0 > dev_item.sector_size0 > dev_item.devid0 > dev_item.dev_group0 > dev_item.seek_speed0 > dev_item.bandwidth0 > dev_item.generation0 > sys_chunk_array[2048]: > backup_roots[4]: > > superblock: bytenr=67108864, device=/dev/sda > - > csum_type65178 (INVALID) > csum_size32 > csum > 0x24f2057c939118ef8cf9c276a05ff294223c99ec79e0b5cfe8ed795fe0a96715 > [DON'T MATCH] > bytenr6481065229944367737 Neither this backup superblock is valid. > flags0x527ffc9117fc11 > ( WRITTEN | > CHANGING_FSID | > METADUMP_V2 | > unknown flag: 0x527ff09117fc10 ) > magic...;)... [DON'T MATCH] > fsid7011f2d5-0afe-5dc6-fce2-70e04b80939d > label > ;.."`8..8.x.?.N../zF..H...|h].i.C)j)...4d_..5.../...1.?.rr5.E.. > generation7112314448606197494 [snip] > > > superblock: bytenr=274877906944, device=/dev/sda > - > csum_type63651 (INVALID) > csum_size32 > csum > 0x39db30683b693c4ff05c0073a1fa00db390c32963bae3c37 > [DON'T MATCH] > bytenr6341162744368070656 2nd backup is also gone. > flags0x9731d639d900f69a > ( RELOC | > CHANGING_FSID | > SEEDING | > unknown flag: 0x9731d630d900f698 ) > magic;.<. 
[DON'T MATCH] > fside25b006e-a0f9-00df-390a-32923bae3c00 > label.9~2.;.< > generation15852938566880484858 > root17063576824041017 > sys_array_size2956094009 > chunk_root_generation9223372036858758203 > root_level0 > chunk_root2305935309654720512 > chunk_root_level212 > log_root12624074888383091845 > log_root_transid9583660007048386619 > log_root_level57 > total_bytes14916195329329095163 > bytes_used17051482582002489 [snip] unfortunately, the filesystem seems to be totally corrupted. > >> Despite that, any extra info on how this happened is also appreciated, >> as similar problem happened twice, which means we need to pay attention >> on this. > > I dont know exactly what happened but here is some background: > > i am running Arch Linux on mainline kernel (4.16.0-1) and mesa-git > (101352.498d9d0f4d-1) as I have a rx vega 64 Vega is nice, however I would wait until mesa in extra/ repo get updated. > over the past few months I have been getting hard locks when opening > certain programs (usually due to a bad versions of mesa-git / > llvm-git, etc). > > i was at the time trying to open the program "cheese" and when I did, > my machine hard locked and only alt+shift+sysrq+b got my screen to go > black - and then did nothing else, so I held the power button for 3 > seconds and then my machine rebooted. Pretty common hard power reset. > looking at journalctl, there is a large stacktrace from kernel: amdgpu > (see attached). > then when I booted back up the pool (2 disks, 1TB + 2TB) wouldn't mount. I'd say such corruption is pretty serious. And what's the profile of the btrfs? If metadata is raid1, we could at least try to recovery the superblock from the remaining disk. And is there special mount options used here like discard? 
Thanks, Qu > > Thanks, > Ben > > On 7 April 2018 at 09:44, Qu Wenruowrote: >> >> >> On 2018年04月07日 01:03, David Sterba wrote: >>> On Fri, Apr 06, 2018 at 11:32:34PM +1000, Ben Parsons wrote: Hi, I just had an unexpected restart and now my btrfs pool wont
Re: Bad magic on superblock on /dev/sda at 65536
>btrfs inspect-internal dump-super -Ffa /path superblock: bytenr=65536, device=/dev/sda - csum_type0 (crc32c) csum_size4 csum0x [DON'T MATCH] bytenr0 flags0x0 magic [DON'T MATCH] fsid---- label generation0 root0 sys_array_size0 chunk_root_generation0 root_level0 chunk_root0 chunk_root_level0 log_root0 log_root_transid0 log_root_level0 total_bytes0 bytes_used0 sectorsize0 nodesize0 leafsize (deprecated)0 stripesize0 root_dir0 num_devices0 compat_flags0x0 compat_ro_flags0x0 incompat_flags0x0 cache_generation0 uuid_tree_generation0 dev_item.uuid---- dev_item.fsid---- [match] dev_item.type0 dev_item.total_bytes0 dev_item.bytes_used0 dev_item.io_align0 dev_item.io_width0 dev_item.sector_size0 dev_item.devid0 dev_item.dev_group0 dev_item.seek_speed0 dev_item.bandwidth0 dev_item.generation0 sys_chunk_array[2048]: backup_roots[4]: superblock: bytenr=67108864, device=/dev/sda - csum_type65178 (INVALID) csum_size32 csum 0x24f2057c939118ef8cf9c276a05ff294223c99ec79e0b5cfe8ed795fe0a96715 [DON'T MATCH] bytenr6481065229944367737 flags0x527ffc9117fc11 ( WRITTEN | CHANGING_FSID | METADUMP_V2 | unknown flag: 0x527ff09117fc10 ) magic...;)... [DON'T MATCH] fsid7011f2d5-0afe-5dc6-fce2-70e04b80939d label ;.."`8..8.x.?.N../zF..H...|h].i.C)j)...4d_..5.../...1.?.rr5.E.. 
generation7112314448606197494 root10814850762639476856 sys_array_size774240540 chunk_root_generation17716845740647334363 root_level123 chunk_root9039947042838677183 chunk_root_level7 log_root11588818316475425470 log_root_transid1970336570145243359 log_root_level255 total_bytes5626579194689281529 bytes_used10936644453437477355 sectorsize2711280660 nodesize2105571139 leafsize (deprecated)2624302184 stripesize3748622636 root_dir12031892002480545941 num_devices1887426113366288834 compat_flags0x986e28a7d6a0eedf compat_ro_flags0x67bf5c50764fabec ( unknown flag: 0x67bf5c50764fabec ) incompat_flags0xb6351e01f2cbb867 ( MIXED_BACKREF | DEFAULT_SUBVOL | MIXED_GROUPS | BIG_METADATA | EXTENDED_IREF | unknown flag: 0xb6351e01f2cbb800 ) cache_generation16803576046500197625 uuid_tree_generation9151978410922426283 dev_item.uuid3eff5038-c5ed-7c44-b841-bfcaefa127ff dev_item.fsidae705a3f-dcee-f7b0-9331-410c837e0ce8 [DON'T MATCH] dev_item.type20862153328580 dev_item.total_bytes4033499057947390500 dev_item.bytes_used14123877185665736413 dev_item.io_align356589416 dev_item.io_width2238618352 dev_item.sector_size33234003 dev_item.devid4647837691355179893 dev_item.dev_group3237710941 dev_item.seek_speed159 dev_item.bandwidth35 dev_item.generation13692119449717181535 sys_chunk_array[2048]: ERROR: sys_array_size 774240540 shouldn't exceed 2048 bytes backup_roots[4]: backup 0: backup_tree_root:9098106006959284508gen: 3422959743402530751level: 13 backup_chunk_root:8653729137999036921gen: 9805354230117732311level: 13 backup_extent_root:2227142819947659262gen: 16710030944005764576level: 250 backup_fs_root:17250344053212875712gen: 11109972073411492560level: 195 backup_dev_root:10813366787773230487gen: 4733558095364468453level: 64 backup_csum_root:15995327235362395775gen: 17585993390550392957level: 223 backup_total_bytes:187327539044806356 backup_bytes_used:11088092626626268919 backup_num_devices:1646767651564978160 backup 1: backup_tree_root:6132816855654833723gen: 7933636135630997331level: 175 
backup_chunk_root:4500476885298477552gen: 17588667198258184431level: 49 backup_extent_root:17341284452428219997gen: 6122825786466476477level: 27 backup_fs_root:4578178975399312410gen: 4558088662074948842level: 229 backup_dev_root:17378404189136548866gen: 8942807062595821441level: 3 backup_csum_root:13954259417814538534gen: 17582753360836298151level: 135 backup_total_bytes:
Re: Bad magic on superblock on /dev/sda at 65536
On 2018-04-07 01:03, David Sterba wrote:
> On Fri, Apr 06, 2018 at 11:32:34PM +1000, Ben Parsons wrote:
>> Hi,
>>
>> I just had an unexpected restart and now my btrfs pool won't mount.
>> The error on mount is:
>>
>> "ERROR: unsupported checksum algorithm 41700"
>>
>> and when running
>>
>> btrfs inspect-internal dump-super /dev/sda
>> ERROR: bad magic on superblock on /dev/sda at 65536
>>
>> I saw a thread in the mailing list about it:
>> https://www.spinics.net/lists/linux-btrfs/msg75326.html
>> However I am told on IRC that Qu fixed it using magic.
>>
>> Any help would be much appreciated.
>
> In the previous report, there were 2 isolated areas of superblock
> damaged. Please post output of
>
> btrfs inspect dump-super /path

And don't forget the -Ffa options.

-F forces btrfs-progs to treat the device as btrfs no matter what the
   magic is.
-f shows all data, so we can find all the corruption and fix it if
   possible.
-a shows all backup superblocks. If some backup is good, "btrfs rescue
   super-recover" mentioned by Nikolay would be the best solution.

Despite that, any extra info on how this happened is also appreciated.
A similar problem has now happened twice, which means we need to pay
attention to this.

Thanks,
Qu

> so we can see if it's a similar issue.
>
> In case it is, there's a tool in the btrfs-progs repo that can fix the
> individual values.
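The copies that -a prints sit at fixed offsets, which is why the dump-super
output in this thread shows bytenr values 65536, 67108864 and 274877906944.
Below is a minimal Python sketch of that arithmetic, assuming the usual btrfs
layout (primary copy at 64 KiB, mirror n at 16 KiB << (12 * n), at most three
copies, and a mirror only used when it fits on the device). The helper names
super_offset and mirrors_on_device are hypothetical.

```python
SUPER_INFO_SIZE = 4096    # size of one superblock copy in bytes
SUPER_MIRROR_MAX = 3      # primary copy plus two backups


def super_offset(mirror):
    """Byte offset of superblock copy number `mirror` (0 is the primary)."""
    if mirror == 0:
        return 64 * 1024
    return (16 * 1024) << (12 * mirror)


def mirrors_on_device(device_size):
    """Offsets of all superblock copies that fit on a device of this size."""
    return [
        super_offset(m)
        for m in range(SUPER_MIRROR_MAX)
        if super_offset(m) + SUPER_INFO_SIZE <= device_size
    ]
```

On a 1 TB disk all three offsets are returned. On a device smaller than
256 GiB only the first two are, so such a device has one backup copy fewer
to recover from.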
Re: [PATCH v2 2/2] btrfs: Validate child tree block's level and first key
On 2018-04-07 01:07, David Sterba wrote:
> On Mon, Apr 02, 2018 at 06:47:32PM +0800, Qu Wenruo wrote:
>> On 2018-03-28 23:49, David Sterba wrote:
>>> On Tue, Mar 27, 2018 at 08:44:19PM +0800, Qu Wenruo wrote:
>>>> We have several reports about node pointer points to incorrect child
>>>> tree blocks, which could have even wrong owner and level but still with
>>>> valid generation and checksum.
>>>>
>>>> Although btrfs check could handle it and print error message like:
>>>> leaf parent key incorrect 60670574592
>>>>
>>>> Kernel doesn't have enough check on this type of corruption correctly.
>>>> At least add such check to read_tree_block() and btrfs_read_buffer(),
>>>> where we need two new parameters @level and @first_key to verify the
>>>> child tree block.
>>>>
>>>> The new @level check is mandatory and all call sites are already
>>>> modified to extract expected level from its call chain.
>>>>
>>>> While @first_key is optional, the following call sites are skipping such
>>>> check:
>>>> 1) Root node/leaf
>>>>    As ROOT_ITEM doesn't contain the first key, skip @first_key check.
>>>> 2) Direct backref
>>>>    Only parent bytenr and level is known and we need to resolve the key
>>>>    all by ourselves, skip @first_key check.
>>>>
>>>> Another note of this verification is, it needs extra info from nodeptr
>>>> or ROOT_ITEM, so it can't fit into current tree-checker framework, which
>>>> is limited to node/leaf boundary.
>>>>
>>>> Signed-off-by: Qu Wenruo
>>>> ---
>>>> changelog:
>>>> v2:
>>>>   Make @level check mandatory, suggested by Jeff and Nikolay.
>>>>   Change parameter order as @level is now mandatory, put it in front of
>>>>   @first_key.
>>>>   Change verify_parent_level() to verify_key_level() to avoid confusion
>>>>   on the @level parameter.
>>>>   Add btrfs_error() output for CONFIG_BTRFS_DEBUG to help debugging.
>>>
>>> That's much better overall, thanks. Adding it to next.
>>
>> Nikolay reported a case where @first_key check seems to cause false alert.
>> (Although my xfstests check hasn't exposed it yet)
>>
>> Please discard this patch since it has the possibility to cause false
>> alert for btrfs core functionality.
>
> Too late, the patch is in master now, so we need to fix it.

Seems to be a very rare race in tree operations, still under investigation.

Thanks,
Qu
Re: [PATCH 08/16] btrfs: add sanity check when resuming balance after mount
Hi,

[This is an automated email]

This commit has been processed by the -stable helper bot and determined
to be a high probability candidate for -stable trees. (score: 16.7330)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, v4.4.126.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Failed to apply! Possible dependencies:
    509cdd5c938a ("btrfs: add sanity check when resuming balance after mount")

v4.4.126: Failed to apply! Possible dependencies:
    509cdd5c938a ("btrfs: add sanity check when resuming balance after mount")

Please let us know if you'd like to have this patch included in a stable tree.

--
Thanks,
Sasha
Re: [PATCH] bitmap: fix memset optimization on big-endian systems
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 2a98dc028f91 include/linux/bitmap.h: turn bitmap_set and bitmap_clear into memset when possible.

The bot has also determined it's probably a bug fixing patch. (score: 65.4067)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!

--
Thanks,
Sasha
Re: [PATCH 07/16] btrfs: add proper safety check before resuming dev-replace
Hi,

[This is an automated email]

This commit has been processed by the -stable helper bot and determined
to be a high probability candidate for -stable trees. (score: 34.4419)

The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, v4.4.126.

v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Failed to apply! Possible dependencies:
    2799d90f3887 ("btrfs: add proper safety check before resuming dev-replace")

v4.4.126: Failed to apply! Possible dependencies:
    2799d90f3887 ("btrfs: add proper safety check before resuming dev-replace")

Please let us know if you'd like to have this patch included in a stable tree.

--
Thanks,
Sasha
Re: [PATCH] Btrfs: do not abort transaction when failing to insert hole extent
On Fri, Apr 6, 2018 at 6:21 AM, David Sterba wrote:
> On Thu, Apr 05, 2018 at 11:58:16AM -0700, Liu Bo wrote:
>> On Thu, Apr 5, 2018 at 9:48 AM, David Sterba wrote:
>> > On Sat, Mar 31, 2018 at 06:11:55AM +0800, Liu Bo wrote:
>> >> This is running in a typical write path, not inside a critical path
>> >> where we have to abort the running transaction, so it's OK to return
>> >> errors to callers and eventually to userspace.
>> >
>> > I'm not sure this is entirely correct, several other places do not abort
>> > after btrfs_drop_extents as there's nothing that would leave the
>> > structures in some half-state.
>> >
>> >> Signed-off-by: Liu Bo
>> >> ---
>> >> fs/btrfs/inode.c | 5 +
>> >> 1 file changed, 1 insertion(+), 4 deletions(-)
>> >>
>> >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> >> index c7b75dd..b9310f8 100644
>> >> --- a/fs/btrfs/inode.c
>> >> +++ b/fs/btrfs/inode.c
>> >> @@ -4939,16 +4939,13 @@ static int maybe_insert_hole(struct btrfs_root *root, struct inode *inode,
>> >>
>> >>       ret = btrfs_drop_extents(trans, root, inode, offset, offset + len, 1);
>> >>       if (ret) {
>> >> -             btrfs_abort_transaction(trans, ret);
>> >>               btrfs_end_transaction(trans);
>> >>               return ret;
>> >>       }
>> >>
>> >>       ret = btrfs_insert_file_extent(trans, root, btrfs_ino(BTRFS_I(inode)),
>> >>                       offset, 0, 0, len, 0, len, 0, 0, 0);
>> >
>> > But here the extents have been already dropped and missing to insert the
>> > items does not seem to lead to a consistent state.
>> >
>> > It's possible that I'm missing something. In a call path that can be
>> > safely rolled back even with a started transaction, we don't need to
>> > abort in all cases. But if the rollback requires some non-trivial
>> > modifications, I don't see options how to avoid the abort.
>> >
>> > __btrfs_drop_extents does a lot of state changes and can itself fail
>> > in the middle of dropping the range, aborting looks like the safest
>> > option.
>> >
>>
>> As maybe_insert_hole is only called by btrfs_cont_expand here, which
>> means it's really a hole, I don't expect drop_extents would drop
>> anything, we can remove this drop_extents and put an assert after
>> btrfs_insert_file_extent for checking EEXIST.
>
> Sounds good.
>

Let me make a v2 and have a fstests run.

thanks,
liubo

>> It's different from punch hole where we need to explicitly drop an
>> actual extent and replace it with a hole range.
>
> Right, that's what I didn't see at first.
Re: [PATCH v2 2/2] btrfs: Validate child tree block's level and first key
On Mon, Apr 02, 2018 at 06:47:32PM +0800, Qu Wenruo wrote:
> On 2018-03-28 23:49, David Sterba wrote:
> > On Tue, Mar 27, 2018 at 08:44:19PM +0800, Qu Wenruo wrote:
> >> We have several reports about node pointer points to incorrect child
> >> tree blocks, which could have even wrong owner and level but still with
> >> valid generation and checksum.
> >>
> >> Although btrfs check could handle it and print error message like:
> >> leaf parent key incorrect 60670574592
> >>
> >> Kernel doesn't have enough check on this type of corruption correctly.
> >> At least add such check to read_tree_block() and btrfs_read_buffer(),
> >> where we need two new parameters @level and @first_key to verify the
> >> child tree block.
> >>
> >> The new @level check is mandatory and all call sites are already
> >> modified to extract expected level from its call chain.
> >>
> >> While @first_key is optional, the following call sites are skipping such
> >> check:
> >> 1) Root node/leaf
> >>    As ROOT_ITEM doesn't contain the first key, skip @first_key check.
> >> 2) Direct backref
> >>    Only parent bytenr and level is known and we need to resolve the key
> >>    all by ourselves, skip @first_key check.
> >>
> >> Another note of this verification is, it needs extra info from nodeptr
> >> or ROOT_ITEM, so it can't fit into current tree-checker framework, which
> >> is limited to node/leaf boundary.
> >>
> >> Signed-off-by: Qu Wenruo
> >> ---
> >> changelog:
> >> v2:
> >>   Make @level check mandatory, suggested by Jeff and Nikolay.
> >>   Change parameter order as @level is now mandatory, put it in front of
> >>   @first_key.
> >>   Change verify_parent_level() to verify_key_level() to avoid confusion
> >>   on the @level parameter.
> >>   Add btrfs_error() output for CONFIG_BTRFS_DEBUG to help debugging.
> >
> > That's much better overall, thanks. Adding it to next.
>
> Nikolay reported a case where @first_key check seems to cause false alert.
> (Although my xfstests check hasn't exposed it yet)
>
> Please discard this patch since it has the possibility to cause false
> alert for btrfs core functionality.

Too late, the patch is in master now, so we need to fix it.
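The check described in the patch can be pictured as follows. This is only a
Python sketch of the idea (a mandatory level comparison plus an optional
first-key comparison against what the parent pointer promised), not the kernel
code. TreeBlock and check_child_block are hypothetical names, and the key
values in the usage lines are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# A btrfs key is the triple (objectid, type, offset).
Key = Tuple[int, int, int]


@dataclass
class TreeBlock:
    """Hypothetical stand-in for a tree block that was just read from disk."""
    bytenr: int
    level: int
    first_key: Key


def check_child_block(block: TreeBlock, expected_level: int,
                      expected_first_key: Optional[Key] = None) -> bool:
    """Return True if the block matches what its parent pointer promised."""
    # The level check is mandatory for every caller.
    if block.level != expected_level:
        return False
    # The first-key check is skipped when the caller cannot know the key,
    # e.g. for a root node or a direct backref.
    if expected_first_key is None:
        return True
    return block.first_key == expected_first_key


blk = TreeBlock(bytenr=60670574592, level=0, first_key=(256, 1, 0))
print(check_child_block(blk, 0, (256, 1, 0)))   # parent and child agree
print(check_child_block(blk, 1, (256, 1, 0)))   # wrong level is rejected
```

A block can carry a valid checksum and generation and still fail this test,
which is the case the patch description is concerned with.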
Re: Bad magic on superblock on /dev/sda at 65536
On Fri, Apr 06, 2018 at 11:32:34PM +1000, Ben Parsons wrote:
> Hi,
>
> I just had an unexpected restart and now my btrfs pool won't mount.
> The error on mount is:
>
> "ERROR: unsupported checksum algorithm 41700"
>
> and when running
>
> btrfs inspect-internal dump-super /dev/sda
> ERROR: bad magic on superblock on /dev/sda at 65536
>
> I saw a thread in the mailing list about it:
> https://www.spinics.net/lists/linux-btrfs/msg75326.html
> However I am told on IRC that Qu fixed it using magic.
>
> Any help would be much appreciated.

In the previous report, there were 2 isolated areas of the superblock
damaged. Please post the output of

  btrfs inspect dump-super /path

so we can see if it's a similar issue.

In case it is, there's a tool in the btrfs-progs repo that can fix the
individual values.
Re: Bad magic on superblock on /dev/sda at 65536
On 6.04.2018 16:32, Ben Parsons wrote:
> Hi,
>
> I just had an unexpected restart and now my btrfs pool won't mount.
> The error on mount is:
>
> "ERROR: unsupported checksum algorithm 41700"
>
> and when running
>
> btrfs inspect-internal dump-super /dev/sda
> ERROR: bad magic on superblock on /dev/sda at 65536
>
> I saw a thread in the mailing list about it:
> https://www.spinics.net/lists/linux-btrfs/msg75326.html
> However I am told on IRC that Qu fixed it using magic.
>
> Any help would be much appreciated.

Try recovering the super block from one of the backup copies via
"btrfs rescue super-recover /dev/sda"

>
> Thanks,
> Ben
Btrfs progs release 4.16
Hi,

btrfs-progs version 4.16 has been released. This version brings the new
library that should help applications use the btrfs functionality in a
more convenient way than plain ioctls. It also has Python bindings. The
rest are bugfixes and small enhancements.

The library is hosted in the progs git because of the close dependency and
is maintained primarily by Omar Sandoval.

Changes:

  * libbtrfsutil - new LGPL library to wrap userspace functionality
    * several 'btrfs' commands converted to use it:
      * properties
      * filesystem sync
      * subvolume set-default/get-default/delete/show/sync
    * python bindings, tests
  * build
    * use configured pkg-config path
    * CI: add python, musl/clang, built dependencies caching
    * convert: build fix for e2fsprogs 1.44+
    * don't install library links with wrong permissions
  * fixes
    * prevent incorrect use of subvol_strip_mountpoint
    * dump-super: don't verify csum for unknown type
    * convert: fix inline extent creation condition
  * check:
    * lowmem: fix false alert for 'data extent backref lost for snapshot'
    * lowmem: fix false alert for orphan inode
    * lowmem: fix false alert for shared prealloc extents
  * mkfs:
    * add UUID and otime to root of FS_TREE - with the uuid, snapshots will be
      now linked to the toplevel subvol by the parent UUID
    * don't follow symlinks when calculating size
    * pre-create the UUID tree
    * fix --rootdir with selinux enabled
  * dump-tree: add option to print only children nodes of a given block
  * image: handle missing device for RAID1
  * other
    * new tests
    * test script cleanups (quoting, helpers)
    * tool to edit superblocks
    * updated docs

Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/
Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git

Shortlog:

Axel Burri (1):
      btrfs-progs: prevent incorrect use of subvol_strip_mountpoint

David Sterba (24):
      btrfs-progs: build: configure.ac hard-codes the pkg-config command
      btrfs-progs: tests: add test for send -p on 2 mont points
      btrfs-progs: tests: add helper to log pipe stdout
      btrfs-progs: ci: add python dependencies for libbtrfsutil
      libbtrfsutil: add stub for reallocarray
      btrfs-progs: ci: cache built dependencies
      libbtrfsutils: add python-devel detection
      btrfs-progs: ci: update test image packages - add clang and python
      btrfs-progs: ci: enable clang and python for musl build test
      btrfs-progs: docs: add section about filesystem limits to btrfs(5)
      btrfs-progs: tests: fix source path for testsuite
      btrfs-progs: tests: don't use fallocate in mkfs/014-rootdir-inline-extent
      btrfs-progs: tests: mkfs fills uuid and otime for FS_TREE
      btrfs-progs: tests: update README, images, coding style
      btrfs-progs: tests: convert/014 use shell builtin for generating content
      btrfs-progs: tests: add shell quoting to fuzz test scripts
      btrfs-progs: tests: remove trivial use of local variables
      btrfs-progs: tests: add shell quotes to mkfs test scripts
      btrfs-progs: tests: add shell quotes to misc test scripts
      btrfs-progs: add tool to edit super blocks
      btrfs-progs: mkfs: precreate the uuid tree
      btrfs-progs: docs: fix typos
      btrfs-progs: update CHANGES for v4.16
      Btrfs progs v4.16

Filipe Manana (2):
      Btrfs-progs: check, fix false error reports for shared prealloc extents
      Btrfs-progs: add fsck test for filesystem with shared prealloc extents

Gu Jinxiang (1):
      btrfs-progs: Remove unused parameter

Lu Fengqi (4):
      btrfs-progs: check/lowmem: Fix the incorrect error message of check_extent_data_item
      btrfs-progs: check/lowmem: Fix false alert of data extent backref lost for snapshot
      btrfs-progs: fsck-tests: Introduce test case with keyed data backref with the extent offset
      btrfs-progs: build: modify cscope/ctags rules to include directories such as check

Misono Tomohiro (2):
      btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of, FS_TREE
      btrfs-progs: mkfs rootdir: use lgetxattr() not to follow a symbolic link

Misono, Tomohiro (1):
      btrfs-progs: remove BTRFS_CRC32_SIZE definition

Nicholas D Steeves (1):
      btrfs-progs: Fix typos in docs and user-facing strings

Nikolay Borisov (1):
      btrfs-progs: Beautify owner when printing leaf/nodes

Omar Sandoval (30):
      Add libbtrfsutil
      libbtrfsutil: add Python bindings
      libbtrfsutil: add qgroup inheritance helpers
      libbtrfsutil: add filesystem sync helpers
      libbtrfsutil: fix Python tests
      libbtrfsutil: copy in Btrfs UAPI headers
      libbtrfsutil: add btrfs_util_is_subvolume() and btrfs_util_subvolume_id()
      libbtrfsutil: add btrfs_util_create_subvolume()
      libbtrfsutil: add btrfs_util_subvolume_path()
      libbtrfsutil: add btrfs_util_subvolume_info()
      libbtrfsutil: add btrfs_util_[gs]et_read_only()
Bad magic on superblock on /dev/sda at 65536
Hi, I just had an unexpected restart and now my btrfs pool won't mount. The error on mount is: "ERROR: unsupported checksum algorithm 41700" and when running btrfs inspect-internal dump-super /dev/sda I get: ERROR: bad magic on superblock on /dev/sda at 65536 I saw a thread in the mailing list about it: https://www.spinics.net/lists/linux-btrfs/msg75326.html However I am told on IRC that Qu fixed it using magic. Any help would be much appreciated. Thanks, Ben
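For anyone reading along, the check behind the "bad magic" message can be sketched roughly as follows. This is an illustration only, not the btrfs-progs code: the helper name sb_has_btrfs_magic() and the SB_* macros are made up for this example. It assumes the on-disk layout in which the 8-byte magic "_BHRfS_M" sits at byte 64 of the superblock (after the 32-byte checksum, the 16-byte fsid, bytenr and flags), and that the primary superblock starts at byte 65536 of the device, which is the offset in the error above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SB_PRIMARY_OFFSET 65536u     /* device offset named in the error message */
#define SB_SIZE           4096u      /* size of one superblock copy */
#define SB_MAGIC_OFFSET   64u        /* csum[32] + fsid[16] + bytenr(8) + flags(8) */
#define SB_MAGIC          "_BHRfS_M" /* 8 bytes, no trailing NUL on disk */

/*
 * Hypothetical helper: return 1 if the buffer holding one superblock copy
 * carries the btrfs magic, 0 otherwise (including when it is too short).
 */
static int sb_has_btrfs_magic(const unsigned char *sb, size_t len)
{
	assert(sb != NULL);
	if (len < SB_MAGIC_OFFSET + 8)
		return 0;
	return memcmp(sb + SB_MAGIC_OFFSET, SB_MAGIC, 8) == 0;
}
```

A caller would read SB_SIZE bytes at SB_PRIMARY_OFFSET from the device (for example with pread()) and pass the buffer in. An all-zero buffer, as in the dump quoted earlier in the thread, fails the check.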
Re: [PATCH v3 2/3] btrfs: Allow rmdir(2) to delete a subvolume
On Fri, Mar 30, 2018 at 03:16:47PM +0900, Misono Tomohiro wrote: > This patch changes the behavior of rmdir(2) to allow it to delete > an empty subvolume by default, provided that it is not the default subvolume > and send is not in progress. > > New function btrfs_delete_subvolume() is almost equal to the second half > of btrfs_ioctl_snap_destroy(). This function requires inode_lock for both > @dir and inode of @dentry. For rmdir(2) it is already acquired in vfs > layer before calling btrfs_rmdir(). > > Note that while a non-privileged user cannot delete a read-only subvolume > by "btrfs subvolume delete" when user_subvol_rm_allowed mount option is > enabled, rmdir(2) can delete an empty read-only subvolume. > > Tested-by: Goffredo Baroncelli > Signed-off-by: Tomohiro Misono > --- > fs/btrfs/inode.c | 141 > ++- > 1 file changed, 140 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index db66fa4fede6..84dbb9cafd6b 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -4387,6 +4387,145 @@ noinline int may_destroy_subvol(struct btrfs_root > *root) > return ret; > } > > +static int btrfs_delete_subvolume(struct inode *dir, struct dentry *dentry) > +{ > + struct btrfs_fs_info *fs_info = btrfs_sb(dentry->d_sb); > + struct btrfs_root *root = BTRFS_I(dir)->root; > + struct inode *inode = d_inode(dentry); > + struct btrfs_root *dest = BTRFS_I(inode)->root; > + struct btrfs_trans_handle *trans; > + struct btrfs_block_rsv block_rsv; > + u64 root_flags; > + u64 qgroup_reserved; > + int ret; > + int err; > + > + /* > + * Don't allow to delete a subvolume with send in progress. This is > + * inside the i_mutex so the error handling that has to drop the bit > + * again is not run concurrently. 
> + */ > + spin_lock(&dest->root_item_lock); > + root_flags = btrfs_root_flags(&dest->root_item); > + if (dest->send_in_progress == 0) { > + btrfs_set_root_flags(&dest->root_item, > + root_flags | BTRFS_ROOT_SUBVOL_DEAD); > + spin_unlock(&dest->root_item_lock); > + } else { > + spin_unlock(&dest->root_item_lock); > + btrfs_warn(fs_info, > +"Attempt to delete subvolume %llu during send", > +dest->root_key.objectid); > + err = -EPERM; > + return err; > + } > + > + down_write(&fs_info->subvol_sem); > + > + err = may_destroy_subvol(dest); > + if (err) > + goto out_up_write; > + > + btrfs_init_block_rsv(&block_rsv, BTRFS_BLOCK_RSV_TEMP); > + /* > + * One for dir inode, two for dir entries, two for root > + * ref/backref. > + */ > + err = btrfs_subvolume_reserve_metadata(root, &block_rsv, > +5, &qgroup_reserved, true); > + if (err) > + goto out_up_write; > + > + trans = btrfs_start_transaction(root, 0); > + if (IS_ERR(trans)) { > + err = PTR_ERR(trans); > + goto out_release; > + } > + trans->block_rsv = &block_rsv; > + trans->bytes_reserved = block_rsv.size; > + > + btrfs_record_snapshot_destroy(trans, BTRFS_I(dir)); > + > + ret = btrfs_unlink_subvol(trans, root, dir, > + dest->root_key.objectid, > + dentry->d_name.name, > + dentry->d_name.len); > + if (ret) { > + err = ret; > + btrfs_abort_transaction(trans, ret); > + goto out_end_trans; > + } > + > + btrfs_record_root_in_trans(trans, dest); > + > + memset(&dest->root_item.drop_progress, 0, > + sizeof(dest->root_item.drop_progress)); > + dest->root_item.drop_level = 0; > + btrfs_set_root_refs(&dest->root_item, 0); > + > + if (!test_and_set_bit(BTRFS_ROOT_ORPHAN_ITEM_INSERTED, &dest->state)) { > + ret = btrfs_insert_orphan_item(trans, > + fs_info->tree_root, > + dest->root_key.objectid); > + if (ret) { > + btrfs_abort_transaction(trans, ret); > + err = ret; > + goto out_end_trans; > + } > + } > + > + ret = btrfs_uuid_tree_rem(trans, fs_info, dest->root_item.uuid, > + BTRFS_UUID_KEY_SUBVOL, > + dest->root_key.objectid); > + 
if (ret && ret != -ENOENT) { > + btrfs_abort_transaction(trans, ret); > + 
err = ret; > + goto out_end_trans; > + } > + if (!btrfs_is_empty_uuid(dest->root_item.received_uuid)) { > + ret = btrfs_uuid_tree_rem(trans, fs_info, > + dest->root_item.received_uuid, > +
Re: [PATCH] Btrfs: fix loss of prealloc extents past i_size after fsync log replay
On Thu, Apr 05, 2018 at 10:55:12PM +0100, fdman...@kernel.org wrote: > From: Filipe Manana > > Currently if we allocate extents beyond an inode's i_size (through the > fallocate system call) and then fsync the file, we log the extents but > after a power failure we replay them and then immediately drop them. > This behaviour has existed since about 2009, commit c71bf099abdd ("Btrfs: > Avoid orphan inodes cleanup while replaying log"), because it marks > the inode as an orphan instead of dropping any extents beyond i_size > before replaying logged extents, so after the log replay, and while > the mount operation is still ongoing, we find the inode marked as an > orphan and then perform a truncation (drop extents beyond the inode's > i_size). Because the processing of orphan inodes is still done > right after replaying the log and before the mount operation finishes, > the intention of that commit does not make any sense (at least as > of today). However reverting that behaviour is not enough, because > we can not simply discard all extents beyond i_size and then replay > logged extents, because we risk dropping extents beyond i_size created > in past transactions, for example: > > add prealloc extent beyond i_size > fsync - clears the flag BTRFS_INODE_NEEDS_FULL_SYNC from the inode > transaction commit > add another prealloc extent beyond i_size > fsync - triggers the fast fsync path > power failure > > In that scenario, we would drop the first extent and then replay the > second one. To fix this just make sure that all prealloc extents > beyond i_size are logged, and if we find too many (which is far from > a common case), fall back to a full transaction commit (like we do when > logging regular extents in the fast fsync path). 
> > Trivial reproducer: > > $ mkfs.btrfs -f /dev/sdb > $ mount /dev/sdb /mnt > $ xfs_io -f -c "pwrite -S 0xab 0 256K" /mnt/foo > $ sync > $ xfs_io -c "falloc -k 256K 1M" /mnt/foo > $ xfs_io -c "fsync" /mnt/foo > <power fail> > > # mount to replay log > $ mount /dev/sdb /mnt > # at this point the file only has one extent, at offset 0, size 256K > > A test case for fstests follows soon, covering multiple scenarios that > involve adding prealloc extents with previous shrinking truncates and > without such truncates. > > Fixes: c71bf099abdd ("Btrfs: Avoid orphan inodes cleanup while replaying log") > Signed-off-by: Filipe Manana Added to next, thanks.
Re: [PATCH] Btrfs: do not abort transaction when failing to insert hole extent
On Thu, Apr 05, 2018 at 11:58:16AM -0700, Liu Bo wrote: > On Thu, Apr 5, 2018 at 9:48 AM, David Sterba wrote: > > On Sat, Mar 31, 2018 at 06:11:55AM +0800, Liu Bo wrote: > >> This is running in a typical write path, not inside a critical path > >> where we have to abort the running transaction, so it's OK to return > >> errors to callers and eventually to userspace. > > > > I'm not sure this is entirely correct, several other places do not abort > > after btrfs_drop_extents as there's nothing that would leave the > > structures in some half-state. > > > >> Signed-off-by: Liu Bo > >> --- > >> fs/btrfs/inode.c | 5 + > >> 1 file changed, 1 insertion(+), 4 deletions(-) > >> > >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > >> index c7b75dd..b9310f8 100644 > >> --- a/fs/btrfs/inode.c > >> +++ b/fs/btrfs/inode.c > >> @@ -4939,16 +4939,13 @@ static int maybe_insert_hole(struct btrfs_root > >> *root, struct inode *inode, > >> > >> ret = btrfs_drop_extents(trans, root, inode, offset, offset + len, > >> 1); > >> if (ret) { > >> - btrfs_abort_transaction(trans, ret); > >> btrfs_end_transaction(trans); > >> return ret; > >> } > >> > >> ret = btrfs_insert_file_extent(trans, root, > >> btrfs_ino(BTRFS_I(inode)), > >> offset, 0, 0, len, 0, len, 0, 0, 0); > > > > But here the extents have been already dropped and missing to insert the > > items does not seem to lead to a consistent state. > > > > It's possible that I'm missing something. In a call path that can be > > safely rolled back even with a started transaction, we don't need to > > abort in all cases. But if the rollback requires some non-trivial > > modifications, I don't see options how to avoid the abort. > > > > __btrfs_drop_extents does a lot of state changes and can itself fail > > in the middle of dropping the range, aborting looks like the safest > > option. 
> > > > As maybe_insert_hole is only called by btrfs_cont_expand here, which > means it's really a hole, I don't expect drop_extents would drop > anything, we can remove this drop_extents and put an assert after > btrfs_insert_file_extent for checking EEXIST. Sounds good. > It's different from punch hole where we need to explicitly drop an > actual extent and replace it with a hole range. Right, that's what I didn't see at first.
Re: [PATCH] Btrfs: clean up resources during umount after trans is aborted
On Thu, Apr 05, 2018 at 11:45:55AM -0700, Liu Bo wrote: > On Thu, Apr 5, 2018 at 9:11 AM, David Sterba wrote: > > On Sat, Mar 31, 2018 at 06:11:56AM +0800, Liu Bo wrote: > >> Currently if some fatal errors occur, like all IO get -EIO, resources > >> would be cleaned up when > >> a) transaction is being committed or > >> b) BTRFS_FS_STATE_ERROR is set > >> > >> However, in some rare cases, resources may be left alone after transaction > >> gets aborted and umount may run into some ASSERT(), e.g. > >> ASSERT(list_empty(&block_group->dirty_list)); > >> > >> For case a), in btrfs_commit_transaction(), there're several places at the > >> beginning where we just call btrfs_end_transaction() without cleaning up > >> resources. For case b), it is possible that the trans handle doesn't have > >> any dirty stuff, then only trans handle is marked as aborted while > >> BTRFS_FS_STATE_ERROR is not set, so resources remain in memory. > >> > >> This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that > >> all resources won't stay in memory after umount. > >> > >> Signed-off-by: Liu Bo > > > > Is it possible that the following stacktrace could be caused by the > > missing check? It roughly matches what you describe (ie. close_ctree and > > unreleased resources). This is from generic/475, that does some error > > injection: > > > > [16991.455178] WARNING: CPU: 6 PID: 23518 at fs/btrfs/extent-tree.c:9896 > > btrfs_free_block_groups+0x2c8/0x420 [btrfs] > > > > Hmm...I don't think so, while running 475, the one I got pretty stable is > ASSERT(list_empty(&block_group->dirty_list)); There's a number of things that 475 catches so this might depend on timing, memory, disks etc. > And I did see this warning a few times, but I thought that was due to > the new flag (ZERO) of fallocate for which we had fixes from Filipe, > not sure if they've been merged? 
Merged to 4.15: * f27451f22996687 Btrfs: add support for fallocate's zero range operation * 9f13ce743b1bd4e Btrfs: fix missing inode i_size update after zero range operation
Re: [PATCH] btrfs-progs: mkfs rootdir: use lgetxattr() not to follow a symbolic link
On Mon, Apr 02, 2018 at 10:59:31AM +0900, Misono Tomohiro wrote: > mkfs-test 016 "rootdir-bad-symbolic-link" fails when selinux is enabled. > This is because add_xattr_item() uses getxattr() and tries to follow a > bad symbolic link for selinux item, which causes ENOENT error. > > The line above already uses llistxattr() for getting list of xattr in > order not to follow a symbolic link, so just use lgetxattr() too. > > Signed-off-by: Tomohiro Misono Applied and added to 4.16, thanks.
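The difference described above can be reproduced outside of mkfs with a small probe. This is only a sketch: struct xattr_probe, probe_xattr() and make_dangling_symlink() are names invented for the example, and only getxattr(2), lgetxattr(2), symlink(2) and unlink(2) are real interfaces. On a dangling symlink the following call is expected to fail with ENOENT, because path resolution of the target fails. The non-following call looks at the link itself, so it either succeeds or fails with an errno that depends on the filesystem and on the attribute.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

struct xattr_probe {
	int follow_errno;   /* errno from getxattr(), 0 if the call succeeded */
	int nofollow_errno; /* errno from lgetxattr(), 0 if the call succeeded */
};

/* Demo helper: (re)create a symlink whose target does not exist. */
static int make_dangling_symlink(const char *linkpath)
{
	unlink(linkpath); /* ignore failure, the link may not exist yet */
	return symlink("/nonexistent/target", linkpath);
}

/* Query the same attribute with and without following the final symlink. */
static struct xattr_probe probe_xattr(const char *path, const char *name)
{
	struct xattr_probe p = { 0, 0 };
	char buf[256];

	assert(path != NULL && name != NULL);
	if (getxattr(path, name, buf, sizeof(buf)) < 0)
		p.follow_errno = errno;
	if (lgetxattr(path, name, buf, sizeof(buf)) < 0)
		p.nofollow_errno = errno;
	return p;
}
```

Probing "security.selinux" on such a link should show the ENOENT from the mkfs test failure only in follow_errno.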
Re: [PATCH] btrfs-progs: build: Do not use cp -a to install files
On Wed, Apr 04, 2018 at 04:04:59PM +0200, Peter Kjellerstedt wrote: > Using cp -a to install files will preserve the ownership of the > original files (if possible), which is typically not wanted. E.g., if > the files were built by a normal user, but are being installed by > root, then the installed files would maintain the UIDs/GIDs of the > user that built the files rather than be owned by root. > > Signed-off-by: Peter Kjellerstedt Applied and added to 4.16, thanks.
[PATCH] fstests: generic test for fsync after fallocate
From: Filipe MananaTest that fsync operations preserve extents allocated with fallocate(2) that are placed beyond a file's size. This test is motivated by a bug found in btrfs where unwritten extents beyond the inode's i_size were not preserved after a fsync and power failure. The btrfs bug is fixed by the following patch for the linux kernel: "Btrfs: fix loss of prealloc extents past i_size after fsync log replay" Signed-off-by: Filipe Manana --- tests/generic/482 | 118 ++ tests/generic/482.out | 10 + tests/generic/group | 1 + 3 files changed, 129 insertions(+) create mode 100755 tests/generic/482 create mode 100644 tests/generic/482.out diff --git a/tests/generic/482 b/tests/generic/482 new file mode 100755 index ..43bbc913 --- /dev/null +++ b/tests/generic/482 @@ -0,0 +1,118 @@ +#! /bin/bash +# FSQA Test No. 482 +# +# Test that fsync operations preserve extents allocated with fallocate(2) that +# are placed beyond a file's size. +# +#--- +# +# Copyright (C) 2018 SUSE Linux Products GmbH. All Rights Reserved. +# Author: Filipe Manana +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#--- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + _cleanup_flakey + cd / + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. 
./common/filter +. ./common/dmflakey +. ./common/punch + +# real QA test starts here +_supported_fs generic +_supported_os Linux +_require_scratch +_require_dm_target flakey +_require_xfs_io_command "falloc" "-k" +_require_xfs_io_command "fiemap" + +rm -f $seqres.full + +_scratch_mkfs >>$seqres.full 2>&1 +_require_metadata_journaling $SCRATCH_DEV +_init_flakey +_mount_flakey + +# Create our test files. +$XFS_IO_PROG -f -c "pwrite -S 0xea 0 256K" $SCRATCH_MNT/foo >/dev/null + +# Create a file with many extents. We later want to shrink truncate it and +# add a prealloc extent beyond its new size. +for ((i = 1; i <= 500; i++)); do + offset=$(((i - 1) * 4 * 1024)) + $XFS_IO_PROG -f -s -c "pwrite -S 0xcf $offset 4K" \ + $SCRATCH_MNT/bar >/dev/null +done + +# A file which already has a prealloc extent beyond its size. +# The fsync done on it is motivated by differences in the btrfs implementation +# of fsync (first fsync has different logic from subsequent fsyncs). +$XFS_IO_PROG -f -c "pwrite -S 0xf1 0 256K" \ +-c "falloc -k 256K 768K" \ +-c "fsync" \ +$SCRATCH_MNT/baz >/dev/null + +# Make sure everything done so far is durably persisted. +sync + +# Allocate an extent beyond the size of the first test file and fsync it. +$XFS_IO_PROG -c "falloc -k 256K 1M"\ +-c "fsync" \ +$SCRATCH_MNT/foo + +# Do a shrinking truncate of our test file, add a prealloc extent to it after +# its new size and fsync it. +$XFS_IO_PROG -c "truncate 256K" \ +-c "falloc -k 256K 1M"\ +-c "fsync" \ +$SCRATCH_MNT/bar + +# Allocate another extent beyond the size of file baz. +$XFS_IO_PROG -c "falloc -k 1M 2M"\ +-c "fsync" \ +$SCRATCH_MNT/baz + +# Simulate a power failure and mount the filesystem to check that the extents +# previously allocated were not lost. 
+_flakey_drop_and_remount + +echo "File foo fiemap:" +$XFS_IO_PROG -c "fiemap -v" $SCRATCH_MNT/foo | _filter_fiemap + +echo "File bar fiemap:" +$XFS_IO_PROG -c "fiemap -v" $SCRATCH_MNT/bar | _filter_fiemap + +echo "File baz fiemap:" +$XFS_IO_PROG -c "fiemap -v" $SCRATCH_MNT/baz | _filter_fiemap + +_unmount_flakey +_cleanup_flakey + +status=0 +exit diff --git a/tests/generic/482.out b/tests/generic/482.out new file mode 100644 index ..7e3ed139 --- /dev/null +++ b/tests/generic/482.out @@ -0,0 +1,10 @@ +QA output created by 482 +File foo fiemap: +0: [0..511]: data +1: [512..2559]: unwritten +File bar fiemap: +0: [0..511]: data +1: [512..2559]: unwritten +File baz fiemap: +0: [0..511]: data +1: [512..6143]: unwritten diff --git a/tests/generic/group b/tests/generic/group index
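The block ranges in the expected output follow from the falloc calls in the test, assuming (as the "[0..511]: data" line for a 256K file suggests) that the fiemap output is printed in 512-byte units. A small sketch of the arithmetic, where struct blk_range and bytes_to_blocks() are names made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define FIEMAP_BLOCK 512u /* assumed unit of the ranges printed by fiemap */

struct blk_range {
	uint64_t first; /* first block of the range */
	uint64_t last;  /* last block of the range, inclusive */
};

/* Convert the byte range [offset, offset + len) to an inclusive block range. */
static struct blk_range bytes_to_blocks(uint64_t offset, uint64_t len)
{
	struct blk_range r;

	assert(len > 0 && offset % FIEMAP_BLOCK == 0 && len % FIEMAP_BLOCK == 0);
	r.first = offset / FIEMAP_BLOCK;
	r.last = (offset + len) / FIEMAP_BLOCK - 1;
	return r;
}
```

For foo and bar, "falloc -k 256K 1M" covers bytes 256K to 1.25M, which gives [512..2559]. For baz, the two preallocations (256K + 768K, then 1M + 2M) are adjacent and cover 256K to 3M, which gives [512..6143].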
[PATCH] Btrfs: fix loss of prealloc extents past i_size after fsync log replay
From: Filipe Manana Currently if we allocate extents beyond an inode's i_size (through the fallocate system call) and then fsync the file, we log the extents but after a power failure we replay them and then immediately drop them. This behaviour has existed since about 2009, commit c71bf099abdd ("Btrfs: Avoid orphan inodes cleanup while replaying log"), because it marks the inode as an orphan instead of dropping any extents beyond i_size before replaying logged extents, so after the log replay, and while the mount operation is still ongoing, we find the inode marked as an orphan and then perform a truncation (drop extents beyond the inode's i_size). Because the processing of orphan inodes is still done right after replaying the log and before the mount operation finishes, the intention of that commit does not make any sense (at least as of today). However reverting that behaviour is not enough, because we can not simply discard all extents beyond i_size and then replay logged extents, because we risk dropping extents beyond i_size created in past transactions, for example: add prealloc extent beyond i_size fsync - clears the flag BTRFS_INODE_NEEDS_FULL_SYNC from the inode transaction commit add another prealloc extent beyond i_size fsync - triggers the fast fsync path power failure In that scenario, we would drop the first extent and then replay the second one. To fix this just make sure that all prealloc extents beyond i_size are logged, and if we find too many (which is far from a common case), fall back to a full transaction commit (like we do when logging regular extents in the fast fsync path). 
Trivial reproducer: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ xfs_io -f -c "pwrite -S 0xab 0 256K" /mnt/foo $ sync $ xfs_io -c "falloc -k 256K 1M" /mnt/foo $ xfs_io -c "fsync" /mnt/foo <power fail> # mount to replay log $ mount /dev/sdb /mnt # at this point the file only has one extent, at offset 0, size 256K A test case for fstests follows soon, covering multiple scenarios that involve adding prealloc extents with previous shrinking truncates and without such truncates. Fixes: c71bf099abdd ("Btrfs: Avoid orphan inodes cleanup while replaying log") Signed-off-by: Filipe Manana --- fs/btrfs/tree-log.c | 63 - 1 file changed, 58 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c index 70afd1085033..eb3a41269b0e 100644 --- a/fs/btrfs/tree-log.c +++ b/fs/btrfs/tree-log.c @@ -2457,13 +2457,41 @@ static int replay_one_buffer(struct btrfs_root *log, struct extent_buffer *eb, if (ret) break; - /* for regular files, make sure corresponding -* orphan item exist. extents past the new EOF -* will be truncated later by orphan cleanup. + /* +* Before replaying extents, truncate the inode to its +* size. We need to do it now and not after log replay +* because before an fsync we can have prealloc extents +* added beyond the inode's i_size. If we did it after, +* through orphan cleanup for example, we would drop +* those prealloc extents just after replaying them. */ if (S_ISREG(mode)) { - ret = insert_orphan_item(wc->trans, root, -key.objectid); + struct inode *inode; + u64 from; + + inode = read_one_inode(root, key.objectid); + if (!inode) { + ret = -EIO; + break; + } + from = ALIGN(i_size_read(inode), +root->fs_info->sectorsize); + ret = btrfs_drop_extents(wc->trans, root, inode, +from, (u64)-1, 1); + /* +* If the nlink count is zero here, the iput +* will free the inode. We bump it to make +* sure it doesn't get freed until the link +* count fixup is done. 
+*/ + if (!ret) { + if (inode->i_nlink == 0) + inc_nlink(inode); + /* Update link count and nbytes. */ + ret = btrfs_update_inode(wc->trans, +
[PATCH] btrfs-progs: Use more loose open ctree flags for dump-tree and restore
A corrupted extent tree (either the root node or a leaf) can normally block us from opening the fs, as open_ctree() normally has the following call chain: __open_ctree_fd() |- btrfs_setup_all_roots() |- btrfs_read_block_groups() and we search for block group items in the extent tree. Considering how block group items are scattered around the whole extent tree, any error would block the fs from being opened. Fortunately, we already have the OPEN_CTREE_NO_BLOCK_GROUPS flag to disable the block group item search, which will not only allow us to open some filesystems, but also hugely speed up open time. Currently dump-tree and btrfs-restore are known not to care about block group items, so specify the OPEN_CTREE_NO_BLOCK_GROUPS flag by default. Also fix a typo where dump-tree is using OPEN_CTREE_FS_PARTIAL, which should be OPEN_CTREE_PARTIAL. The wrong flag makes dump-tree do more checks, so it can sometimes fail to open certain filesystems. Reported-by: Christoph Anton Mitterer Fixes: 8698a2b9ba89 ("btrfs-progs: Allow inspect dump-tree to show specified tree block even some tree roots are corrupted") Signed-off-by: Qu Wenruo --- cmds-inspect-dump-tree.c | 4 +++- cmds-restore.c | 3 ++- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c index 7defb7164a49..8be976041543 100644 --- a/cmds-inspect-dump-tree.c +++ b/cmds-inspect-dump-tree.c @@ -303,7 +303,9 @@ int cmd_inspect_dump_tree(int argc, char **argv) int uuid_tree_only = 0; int roots_only = 0; int root_backups = 0; - unsigned open_ctree_flags = OPEN_CTREE_FS_PARTIAL; + /* Speed up open_ctree() and continue if extent tree is corrupted */ + unsigned open_ctree_flags = OPEN_CTREE_PARTIAL | + OPEN_CTREE_NO_BLOCK_GROUPS; u64 block_bytenr; struct btrfs_root *tree_root_scan; u64 tree_id = 0; diff --git a/cmds-restore.c b/cmds-restore.c index ade35f0f880f..b43bd2ac6502 100644 --- a/cmds-restore.c +++ b/cmds-restore.c @@ -1282,7 +1282,8 @@ static struct 
btrfs_root *open_fs(const char *dev, 
u64 root_location, for (i = super_mirror; i < BTRFS_SUPER_MIRROR_MAX; i++) { bytenr = btrfs_sb_offset(i); fs_info = open_ctree_fs_info(dev, bytenr, root_location, 0, -OPEN_CTREE_PARTIAL); +OPEN_CTREE_PARTIAL | +OPEN_CTREE_NO_BLOCK_GROUPS); if (fs_info) break; fprintf(stderr, "Could not open root, trying backup super\n"); -- 2.17.0
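The restore loop above walks the superblock copies by index. As a rough sketch of where those copies live, the helper below follows the same arithmetic as btrfs_sb_offset() as far as I understand it. The name sb_mirror_offset() and the SUPER_* macros are made up for the example, and the results match the three bytenr values shown by dump-super earlier in this archive (65536, 67108864, 274877906944).

```c
#include <assert.h>
#include <stdint.h>

#define SUPER_INFO_OFFSET  65536ULL /* primary superblock at 64KiB */
#define SUPER_MIRROR_MAX   3        /* primary plus two backup copies */
#define SUPER_MIRROR_SHIFT 12

/* Byte offset on the device of superblock copy number `mirror` (0, 1 or 2). */
static uint64_t sb_mirror_offset(int mirror)
{
	const uint64_t start = 16384ULL; /* 16KiB base for the backup copies */

	assert(mirror >= 0 && mirror < SUPER_MIRROR_MAX);
	if (mirror)
		return start << (SUPER_MIRROR_SHIFT * mirror);
	return SUPER_INFO_OFFSET;
}
```

So copy 1 sits at 64MiB and copy 2 at 256GiB. A device smaller than 256GiB simply has no third copy to fall back to.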
[PATCH v2 1/1] btrfs-progs: inspect-dump-tree: Allow '-b|--block' to be specified multiple times
Reuse extent-cache facility to record multiple bytenr so '-b|--block' can be specified multiple times. Besides that, add a sector size alignment check before we try to print a tree block. (Please note that a nodesize alignment check is not suitable here, as a meta chunk start bytenr could be unaligned to nodesize) Signed-off-by: Qu Wenruo --- changelog: v2: Fix memory leak detected by asan. Fix NULL pointer dereference detected by asan. --- Documentation/btrfs-inspect-internal.asciidoc | 2 +- cmds-inspect-dump-tree.c | 109 +++--- 2 files changed, 91 insertions(+), 20 deletions(-) diff --git a/Documentation/btrfs-inspect-internal.asciidoc b/Documentation/btrfs-inspect-internal.asciidoc index e2db64660b9a..ba8529f57660 100644 --- a/Documentation/btrfs-inspect-internal.asciidoc +++ b/Documentation/btrfs-inspect-internal.asciidoc @@ -86,7 +86,7 @@ the respective tree root block offset -u|--uuid print only the uuid tree information, empty output if the tree does not exist -b -print info of the specified block only +print info of the specified block only, can be specified multiple times. --follow use with '-b', print all children tree blocks of '' -t diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c index b0cd49b32664..7defb7164a49 100644 --- a/cmds-inspect-dump-tree.c +++ b/cmds-inspect-dump-tree.c @@ -198,11 +198,92 @@ const char * const cmd_inspect_dump_tree_usage[] = { "-R|--backups same as --roots plus print backup root info", "-u|--uuid print only the uuid tree", "-b|--block print info from the specified block only", + " can be specified multiple times", "-t|--tree print only tree with the given id (string or number)", "--follow use with -b, to show all children tree blocks of ", NULL }; +/* + * Helper function to record all tree block bytenr so we don't need to put + * all code into deep indent. 
+ * + * Return >0 if we hit a duplicated bytenr (already recorded) + * Return 0 if nothing went wrong + * Return <0 if error happens (ENOMEM) + * + * For != 0 return value, all warning/error will be outputted by this function. + */ +static int dump_add_tree_block(struct cache_tree *tree, u64 bytenr) +{ + int ret; + + /* +* We don't really care about the size and we don't have +* nodesize before we open the fs, so just use 1 as size here. +*/ + ret = add_cache_extent(tree, bytenr, 1); + if (ret == -EEXIST) { + warning("tree block bytenr %llu is duplicated", bytenr); + return 1; + } + if (ret < 0) { + error("failed to record tree block bytenr %llu: %d(%s)", + bytenr, ret, strerror(-ret)); + return ret; + } + return ret; +} + +/* + * Print all tree blocks recorded. + * All tree block bytenr record will also be freed in this function. + * + * Return 0 if nothing wrong happened for *each* tree blocks + * Return <0 if anything wrong happened, and return value will be the last + * error. + */ +static int dump_print_tree_blocks(struct btrfs_fs_info *fs_info, + struct cache_tree *tree, bool follow) +{ + struct cache_extent *ce; + struct extent_buffer *eb; + u64 bytenr; + int ret = 0; + + ce = first_cache_extent(tree); + while (ce) { + bytenr = ce->start; + + /* +* Please note that here we can't check it against nodesize, +* as it's possible a chunk is just aligned to sectorsize but +* not aligned to nodesize. 
+*/ + if (!IS_ALIGNED(bytenr, fs_info->sectorsize)) { + error( + "tree block bytenr %llu is not aligned to sectorsize %u", + bytenr, fs_info->sectorsize); + ret = -EINVAL; + goto next; + } + + eb = read_tree_block(fs_info, bytenr, 0); + if (!extent_buffer_uptodate(eb)) { + error("failed to read tree block %llu", bytenr); + ret = -EIO; + goto next; + } + btrfs_print_tree(eb, follow); + free_extent_buffer(eb); +next: + remove_cache_extent(tree, ce); + free(ce); + ce = first_cache_extent(tree); + } + return ret; +} + int cmd_inspect_dump_tree(int argc, char **argv) { struct btrfs_root *root; @@ -213,6 +294,7 @@ int cmd_inspect_dump_tree(int argc, char **argv) struct extent_buffer *leaf; struct btrfs_disk_key disk_key; struct btrfs_key found_key; + struct cache_tree block_root; /* for multiple --block parameters */ char
[PATCH 0/1] btrfs-progs: dump-tree: allow -b multiple times
Although just one patch, it needs the extent buffer cleanup code as basis, so please fetch it from my github repo: https://github.com/adam900710/btrfs-progs/tree/dump_tree_multi_blocks This patch allows -b to be specified multiple times, and adds an extra basic check for each given block. For later enhancement (Issue: #113) it needs extra work to handle special roots. Qu Wenruo (1): btrfs-progs: inspect-dump-tree: Allow '-b|--block' to be specified multiple times Documentation/btrfs-inspect-internal.asciidoc | 2 +- cmds-inspect-dump-tree.c | 108 ++ 2 files changed, 89 insertions(+), 21 deletions(-) -- 2.17.0
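The "extra basic check" is the sector size alignment test in the patch. A standalone sketch of that kind of check is below: is_aligned() is a made-up helper in the spirit of the IS_ALIGNED() macro the patch uses, and the 4096/16384 values in the note are just common sectorsize/nodesize examples, not values taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if `value` is a multiple of `align`, 0 otherwise.
 * `align` must be a power of two, as sectorsize and nodesize are.
 */
static int is_aligned(uint64_t value, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (value & ((uint64_t)align - 1)) == 0;
}
```

With a 4096-byte sectorsize and a 16384-byte nodesize, a tree block at bytenr 20480 passes the sectorsize check but would fail a nodesize check. That is the case the patch description warns about when it says a nodesize check is not suitable.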
[PATCH 1/1] btrfs-progs: inspect-dump-tree: Allow '-b|--block' to be specified multiple times
Reuse extent-cache facility to record multiple bytenr so '-b|--block' can be specified multiple times. Besides that, add a sector size alignment check before we try to print a tree block. (Please note that a nodesize alignment check is not suitable here, as a meta chunk start bytenr could be unaligned to nodesize) Signed-off-by: Qu Wenruo --- Documentation/btrfs-inspect-internal.asciidoc | 2 +- cmds-inspect-dump-tree.c | 108 ++ 2 files changed, 89 insertions(+), 21 deletions(-) diff --git a/Documentation/btrfs-inspect-internal.asciidoc b/Documentation/btrfs-inspect-internal.asciidoc index e2db64660b9a..ba8529f57660 100644 --- a/Documentation/btrfs-inspect-internal.asciidoc +++ b/Documentation/btrfs-inspect-internal.asciidoc @@ -86,7 +86,7 @@ the respective tree root block offset -u|--uuid print only the uuid tree information, empty output if the tree does not exist -b -print info of the specified block only +print info of the specified block only, can be specified multiple times. --follow use with '-b', print all children tree blocks of '' -t diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c index b0cd49b32664..fb3ccfc9d0ba 100644 --- a/cmds-inspect-dump-tree.c +++ b/cmds-inspect-dump-tree.c @@ -203,6 +203,85 @@ const char * const cmd_inspect_dump_tree_usage[] = { NULL }; +/* + * Helper function to record all tree block bytenr so we don't need to put + * all code into deep indent. + * + * Return >0 if we hit a duplicated bytenr (already recorded) + * Return 0 if nothing went wrong + * Return <0 if error happens (ENOMEM) + * + * For != 0 return value, all warning/error will be outputted by this function. + */ +static int dump_add_tree_block(struct cache_tree *tree, u64 bytenr) +{ + int ret; + + /* +* We don't really care about the size and we don't have +* nodesize before we open the fs, so just use 1 as size here. 
+*/ + ret = add_cache_extent(tree, bytenr, 1); + if (ret == -EEXIST) { + warning("tree block bytenr %llu is duplicated", bytenr); + return 1; + } + if (ret < 0) { + error("failed to record tree block bytenr %llu: %d(%s)", + bytenr, ret, strerror(-ret)); + return ret; + } + return ret; +} + +/* + * Print all tree blocks recorded. + * All tree block bytenr record will also be freed in this function. + * + * Return 0 if nothing wrong happened for *each* tree blocks + * Return <0 if anything wrong happened, and return value will be the last + * error. + */ +static int dump_print_tree_blocks(struct btrfs_fs_info *fs_info, + struct cache_tree *tree, bool follow) +{ + struct cache_extent *ce; + struct extent_buffer *eb; + u64 bytenr; + int ret = 0; + + ce = first_cache_extent(tree); + while (ce) { + bytenr = ce->start; + + /* +* Please note that here we can't check it against nodesize, +* as it's possible a chunk is just aligned to sectorsize but +* not aligned to nodesize. +*/ + if (!IS_ALIGNED(bytenr, fs_info->sectorsize)) { + error( + "tree block bytenr %llu is not aligned to sectorsize %u", + bytenr, fs_info->sectorsize); + ret = -EINVAL; + goto next; + } + + eb = read_tree_block(fs_info, bytenr, 0); + if (!extent_buffer_uptodate(eb)) { + error("failed to read tree block %llu", bytenr); + ret = -EIO; + goto next; + } + btrfs_print_tree(eb, follow); + free_extent_buffer(eb); +next: + remove_cache_extent(tree, ce); + ce = first_cache_extent(tree); + } + return ret; +} + int cmd_inspect_dump_tree(int argc, char **argv) { struct btrfs_root *root; @@ -213,6 +292,7 @@ int cmd_inspect_dump_tree(int argc, char **argv) struct extent_buffer *leaf; struct btrfs_disk_key disk_key; struct btrfs_key found_key; + struct cache_tree block_root; /* for multiple --block parameters */ char uuidbuf[BTRFS_UUID_UNPARSED_SIZE]; int ret; int slot; @@ -222,11 +302,12 @@ int cmd_inspect_dump_tree(int argc, char **argv) int roots_only = 0; int root_backups = 0; unsigned open_ctree_flags = 
OPEN_CTREE_FS_PARTIAL; - u64 block_only = 0; + u64 block_bytenr; struct btrfs_root *tree_root_scan; u64 tree_id = 0; bool follow = false; + cache_tree_init(&block_root); while (1) { int c; enum { GETOPT_VAL_FOLLOW = 256 }; @@ -268,7 +349,10 @@ int