Re: Crash when unraring large archives on btrfs-filesystem
On 7.02.2018 21:57, Stefan Malte Schumacher wrote:
> Hello,
>
> I have encountered what I think is a problem with btrfs, which causes
> my file server to become unresponsive. But let's start with the basic
> information:
>
> uname -a = Linux mars 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2
> (2018-01-04) x86_64 GNU/Linux
>
> btrfs --version = btrfs-progs v4.7.3
>
> Label: none uuid: 1609e4e1-4037-4d31-bf12-f84a691db5d8
> Total devices 5 FS bytes used 7.15TiB
> devid 1 size 3.64TiB used 2.90TiB path /dev/sda
> devid 2 size 3.64TiB used 2.90TiB path /dev/sdb
> devid 3 size 3.64TiB used 2.90TiB path /dev/sdc
> devid 4 size 3.64TiB used 2.90TiB path /dev/sdd
> devid 5 size 3.64TiB used 2.90TiB path /dev/sde
>
> Data, RAID1: total=7.25TiB, used=7.14TiB
> System, RAID1: total=40.00MiB, used=1.02MiB
> Metadata, RAID1: total=9.00GiB, used=7.75GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> The following entry in kern.log seems to be the point where it all
> started and which causes me to believe that the problem is related to
> btrfs. At that time the server was unraring
> a large archive stored on the btrfs filesystem.
>
> Feb 5 21:22:42 mars kernel: [249979.829318] BTRFS info (device sda):
> The free space cache file (4701944807424) is invalid. skip it

This tells you that your free space cache is likely corrupted. That is not critical, but it is highly recommended that you rebuild it. You can do that by mounting your file system with the 'clear_cache' mount option. For more information check https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)

> Feb 5 21:22:42 mars kernel: [249979.829318]
> Feb 5 21:25:12 mars kernel: [250090.149452] unrar: page allocation
> stalls for 12104ms, order:0, mode:0x24200ca(GFP_HIGHUSER_MOVABLE)
> Feb 5 21:25:12 mars kernel: [250116.605420] [] ?
> alloc_pages_vma+0xae/0x260
> Feb 5 21:25:12 mars kernel: [250116.605422] [] ?
> __read_swap_cache_async+0x118/0x1c0
> Feb 5 21:25:12 mars kernel: [250116.605423] [] ?
> read_swap_cache_async+0x24/0x60
> Feb 5 21:25:12 mars kernel: [250116.605425] [] ?
> swapin_readahead+0x1a9/0x210
> Feb 5 21:25:12 mars kernel: [250116.605427] [] ?
> radix_tree_lookup_slot+0x1e/0x50
> Feb 5 21:25:12 mars kernel: [250116.605429] [] ?
> find_get_entry+0x1b/0x100
> Feb 5 21:25:12 mars kernel: [250116.605431] [] ?
> pagecache_get_page+0x30/0x2b0
> Feb 5 21:25:12 mars kernel: [250116.605434] [] ?
> do_swap_page+0x2a3/0x750
> Feb 5 21:25:12 mars kernel: [250116.605436] [] ?
> handle_mm_fault+0x892/0x12d0
> Feb 5 21:25:12 mars kernel: [250116.605438] [] ?
> __do_page_fault+0x25c/0x500
> Feb 5 21:25:12 mars kernel: [250116.605440] [] ?
> page_fault+0x28/0x30
> Feb 5 21:25:12 mars kernel: [250116.605442] [] ?
> __get_user_8+0x1b/0x25
> Feb 5 21:25:12 mars kernel: [250116.605445] [] ?
> exit_robust_list+0x30/0x110
> Feb 5 21:25:12 mars kernel: [250116.605447] [] ?
> mm_release+0xf8/0x130
> Feb 5 21:25:12 mars kernel: [250116.605449] [] ?
> do_exit+0x150/0xae0
> Feb 5 21:25:12 mars kernel: [250116.605450] [] ?
> do_group_exit+0x3a/0xa0
> Feb 5 21:25:12 mars kernel: [250116.605452] [] ?
> get_signal+0x297/0x640
> Feb 5 21:25:12 mars kernel: [250116.605454] [] ?
> do_signal+0x36/0x6a0
> Feb 5 21:25:12 mars kernel: [250116.605457] [] ?
> exit_to_usermode_loop+0x71/0xb0
> Feb 5 21:25:12 mars kernel: [250116.605459] [] ?
> syscall_return_slowpath+0x54/0x60
> Feb 5 21:25:12 mars kernel: [250116.605461] [] ?
> system_call_fast_compare_end+0xb5/0xb7

This call trace essentially tells you that your server more or less ran out of memory and started to swap in, i.e. to read pages back from disk, and that this took a rather long time (12 s). btrfs is not involved here at all.
> > Feb 5 21:25:12 mars kernel: [250116.605462] Mem-Info: > > Feb 5 21:25:12 mars kernel: [250116.605466] active_anon:44 > inactive_anon:69 isolated_anon:0 > > Feb 5 21:25:12 mars kernel: [250116.605466] active_file:3557188 > inactive_file:407932 isolated_file:1024 > > Feb 5 21:25:12 mars kernel: [250116.605466] unevictable:0 dirty:409214 > writeback:62 unstable:0 > > Feb 5 21:25:12 mars kernel: [250116.605466] slab_reclaimable:37022 > slab_unreclaimable:10475 > > Feb 5 21:25:12 mars kernel: [250116.605466] mapped:2329 shmem:21 > pagetables:3522 bounce:0 > > Feb 5 21:25:12 mars kernel: [250116.605466] free:34036 free_pcp:291 free_cma:0 > > Feb 5 21:25:12 mars kernel: [250116.605471] Node 0 active_anon:176kB > inactive_anon:276kB active_file:14228752kB inactive_file:1631728kB > unevictable:0kB isolated(anon):0kB isolated(file):4096kB mapped:9316kB > dirty:1636856kB writeback:248kB shmem:84kB shmem_thp: 0kB > shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB > pages_scanned:13631918 all_unreclaimable? no > > > Searching for "btrfs" in kern.log shows a lot of entries for kern.log > and kern.log.1 but but none before that point of time. I think that > there is a relation
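The cache rebuild suggested in the reply above could look roughly like the following sketch. It only prints the commands instead of running them, since they need root and an unmounted filesystem. The device /dev/sda is taken from the log; the mount point /mnt/data and the helper name clear_cache_cmd are assumptions made for illustration. Unmounting first is a cautious choice here, not something the reply prescribes.

```shell
#!/bin/bash
# Sketch: rebuild the btrfs free space cache with a one-time clear_cache mount.
# DEV and MNT are placeholders (assumptions); adjust them to the real system.

# Print the mount command that asks btrfs to rebuild the free space cache.
# clear_cache_cmd is a hypothetical helper name.
clear_cache_cmd() {
	local dev=$1 mnt=$2
	printf 'mount -o clear_cache %s %s\n' "$dev" "$mnt"
}

# Dry run: show the commands so they can be reviewed and run by hand as root.
DEV=${DEV:-/dev/sda}
MNT=${MNT:-/mnt/data}
echo "umount $MNT"
clear_cache_cmd "$DEV" "$MNT"
```

The option is only needed for one mount; later mounts can use the normal options again.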
Re: [PATCH v2] btrfs-progs: ctree: Add extra level check for read_node_slot()
On 8.02.2018 02:59, Qu Wenruo wrote: > Strangely, we have level check in btrfs_print_tree() while we don't have > the same check in read_node_slot(). > > That's to say, for the following corruption, btrfs_search_slot() or > btrfs_next_leaf() can return invalid leaf: > > Parent eb: > node XX level 1 > ^^^ > Child should be leaf (level 0) > ... > key (XXX XXX XXX) block YY > > Child eb: > leaf YY level 1 > ^^^ > Something went wrong now > > And for the corrupted leaf returned, later caller can be screwed up > easily. > > Although the root cause (powerloss, but still something wrong breaking > metadata CoW of btrfs) is still unknown, at least enhance btrfs-progs to > avoid SEGV. > > Reported-by: Ralph Gauges> Signed-off-by: Qu Wenruo Reviewed-by: Nikolay Borisov > --- > changlog: > v2: > Check if the extent buffer is up-to-date before checking its level to > avoid possible NULL pointer access. > --- > ctree.c | 16 +++- > 1 file changed, 15 insertions(+), 1 deletion(-) > > diff --git a/ctree.c b/ctree.c > index 4fc33b14000a..430805e3043f 100644 > --- a/ctree.c > +++ b/ctree.c > @@ -22,6 +22,7 @@ > #include "repair.h" > #include "internal.h" > #include "sizes.h" > +#include "messages.h" > > static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root > *root, struct btrfs_path *path, int level); > @@ -640,7 +641,9 @@ static int bin_search(struct extent_buffer *eb, struct > btrfs_key *key, > struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info, > struct extent_buffer *parent, int slot) > { > + struct extent_buffer *ret; > int level = btrfs_header_level(parent); > + > if (slot < 0) > return NULL; > if (slot >= btrfs_header_nritems(parent)) > @@ -649,8 +652,19 @@ struct extent_buffer *read_node_slot(struct > btrfs_fs_info *fs_info, > if (level == 0) > return NULL; > > - return read_tree_block(fs_info, btrfs_node_blockptr(parent, slot), > + ret = read_tree_block(fs_info, btrfs_node_blockptr(parent, slot), > btrfs_node_ptr_generation(parent, 
slot)); > + if (!extent_buffer_uptodate(ret)) > + return ERR_PTR(-EIO); > + > + if (btrfs_header_level(ret) != level - 1) { > + error("child eb corrupted: parent bytenr=%llu item=%d parent > level=%d child level=%d", > + btrfs_header_bytenr(parent), slot, > + btrfs_header_level(parent), btrfs_header_level(ret)); > + free_extent_buffer(ret); > + return ERR_PTR(-EIO); > + } > + return ret; > } > > static int balance_level(struct btrfs_trans_handle *trans, > -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: delete function btrfs_close_extra_devices()
ping ? Thanks, Anand
[PATCH v5 3/3] btrfs-progs: Add readme for export testsuits
Document the command that exports the testsuite, and describe how to run the exported tests.

Signed-off-by: Gu Jinxiang
---
 tests/README.md | 13 +
 1 file changed, 13 insertions(+)

diff --git a/tests/README.md b/tests/README.md
index 04d2ce2a..23f35cfc 100644
--- a/tests/README.md
+++ b/tests/README.md
@@ -48,6 +48,19 @@ $ TEST=001\* ./fsck-tests.sh
 will run the first test in fsck-tests subdirectory.
 
+## Package testsuit
+
+The tests can be export as a btrfs-progs-tests.tar.gz current path. Use:
+
+```shell
+$ make testsuite
+```
+
+
+And, after decompress btrfs-progs-tests.tar.gz, test can be run selectively
+from `tests/` directory introduced above.
+
+
 ## Test structure
 
 *tests/fsck-tests/:*
--
2.14.3
[PATCH v5 1/3] btrfs-progs: Add make testsuite command for export tests
Export the testsuite files to a separate tarball.

The fsck tests depend on btrfs-corrupt-block, and the misc tests depend on both btrfs-corrupt-block and fssum, so make both prerequisites of the export command. fssum can be built from sources that all live in the tests directory and does not rely on btrfs's structures, but btrfs-corrupt-block relies on them heavily. For consistency, at the present stage, build both when creating the test tarball.

Signed-off-by: Gu Jinxiang
---
 .gitignore            |  1 +
 Makefile              |  4
 tests/export-tests.sh | 37 +
 testsuites-list       | 22 ++
 4 files changed, 64 insertions(+)
 create mode 100755 tests/export-tests.sh
 create mode 100644 testsuites-list

diff --git a/.gitignore b/.gitignore
index 8e607f6e..a41ad8ce 100644
--- a/.gitignore
+++ b/.gitignore
@@ -43,6 +43,7 @@ libbtrfs.so.0.1
 library-test
 library-test-static
 /fssum
+testsuites-id
 
 /tests/*-tests-results.txt
 /tests/test-console.txt
diff --git a/Makefile b/Makefile
index 6369e8f4..7eab0f4f 100644
--- a/Makefile
+++ b/Makefile
@@ -333,6 +333,10 @@ test-inst: all
 
 test: test-fsck test-mkfs test-convert test-misc test-fuzz test-cli
 
+testsuite: btrfs-corrupt-block fssum
+	@echo "Export tests as a package"
+	$(Q)bash tests/export-tests.sh
+
 #
 # NOTE: For static compiles, you need to have all the required libs
 # static equivalent available
diff --git a/tests/export-tests.sh b/tests/export-tests.sh
new file mode 100755
index ..0ed7dd99
--- /dev/null
+++ b/tests/export-tests.sh
@@ -0,0 +1,37 @@
+#!/bin/bash
+# export the testsuite files to a separate tar
+
+TESTSUITES_LIST_FILE=$PWD/testsuites-list
+if ! [ -f $TESTSUITES_LIST_FILE ];then
+	echo "testsuites list file is not exsit."
+	exit 1
+fi
+
+TESTSUITES_LIST=$(cat $TESTSUITES_LIST_FILE)
+if [ -z "$TESTSUITES_LIST" ]; then
+	echo "no file be list in testsuites-list"
+	exit 1
+fi
+
+DEST="btrfs-progs-tests.tar.gz"
+if [ -f $DEST ];then
+	echo "remove exsit package: " $DEST
+	rm $DEST
+fi
+
+TEST_ID=$PWD/testsuites-id
+if [ -f $TEST_ID ];then
+	rm $TEST_ID
+fi
+VERSION=`./version.sh`
+TIMESTAMP=`date -u "+%Y-%m-%d %T %Z"`
+
+echo "git version: " $VERSION > $TEST_ID
+echo "this tar is created in: " $TIMESTAMP >> $TEST_ID
+
+echo "begin create tar: " $DEST
+tar --exclude-vcs-ignores -zScf $DEST -C ../ $TESTSUITES_LIST
+if [ $? -eq 0 ]; then
+	echo "create tar successfully."
+fi
+rm $TEST_ID
diff --git a/testsuites-list b/testsuites-list
new file mode 100644
index ..a24591f5
--- /dev/null
+++ b/testsuites-list
@@ -0,0 +1,22 @@
+btrfs-progs/testsuites-id
+btrfs-progs/fssum
+btrfs-progs/btrfs-corrupt-block
+btrfs-progs/Documentation/
+btrfs-progs/tests/cli-tests
+btrfs-progs/tests/cli-tests.sh
+btrfs-progs/tests/common
+btrfs-progs/tests/common.convert
+btrfs-progs/tests/common.local
+btrfs-progs/tests/convert-tests
+btrfs-progs/tests/convert-tests.sh
+btrfs-progs/tests/fsck-tests
+btrfs-progs/tests/fsck-tests.sh
+btrfs-progs/tests/fuzz-tests/
+btrfs-progs/tests/fuzz-tests.sh
+btrfs-progs/tests/misc-tests/
+btrfs-progs/tests/misc-tests.sh
+btrfs-progs/tests/mkfs-tests/
+btrfs-progs/tests/mkfs-tests.sh
+btrfs-progs/tests/README.md
+btrfs-progs/tests/scan-results.sh
+btrfs-progs/tests/test-console.sh
--
2.14.3
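The export script above passes the list to tar through an unquoted `$(cat ...)`, which depends on shell word splitting. As a hedged alternative, not the patch's implementation, the sketch below reads the list line by line into an array so that paths containing spaces survive. The function name export_from_list and its three arguments are hypothetical, and the `--exclude-vcs-ignores` and `-S` options of the original are left out for brevity.

```shell
#!/bin/bash
# Sketch of a list-driven export. export_from_list and its arguments are
# hypothetical names used for illustration; they are not part of the patch.

# export_from_list LIST_FILE BASE_DIR DEST
# Pack every path named in LIST_FILE (one per line, relative to BASE_DIR)
# into the gzip-compressed tarball DEST.
export_from_list() {
	local list_file=$1 base_dir=$2 dest=$3
	local -a files=()
	local line

	if ! [ -f "$list_file" ]; then
		echo "list file does not exist: $list_file" >&2
		return 1
	fi

	# Read one path per line, skipping empty lines; the extra test keeps
	# a final line that lacks a trailing newline.
	while IFS= read -r line || [ -n "$line" ]; do
		if [ -n "$line" ]; then
			files+=("$line")
		fi
	done < "$list_file"

	if [ "${#files[@]}" -eq 0 ]; then
		echo "no files listed in $list_file" >&2
		return 1
	fi

	# Replace any tarball left over from an earlier run.
	rm -f "$dest"
	tar -czf "$dest" -C "$base_dir" "${files[@]}"
}
```

With the layout from the patch this would be called as `export_from_list "$PWD/testsuites-list" ../ btrfs-progs-tests.tar.gz`.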
[PATCH v5 0/3] Add support for export testsuits
Achieved:
1. Export the testsuite with:
   $ make testsuite
   The files listed in testsuites-list are added to the tarball
   btrfs-progs-tests.tar.gz.
2. After decompressing btrfs-progs-tests.tar.gz, run the tests with:
   $ TEST=`MASK` ./tests/mkfs-tests.sh
   Running without MASK also works.

Additional notes:
$ tar -xzvf ./btrfs-progs-tests.tar.gz
$ ls
btrfs-progs
The tests directory and the other files are in btrfs-progs.

Changelog:
v5->v4:
  Modify patch 2: make TEST_TOP represent the tests directory and
  introduce INTERNAL_BIN for internal binaries.
v4->v3:
  Modify patch 2:
  1. Keep TOP for binaries and introduce TEST_TOP for other resources.
v3->v2:
  patch 1:
  1. Change the command from `make package` to `make testsuite`.
  2. Create btrfs-progs-tests.tar.gz in the current directory, so remove
     the EXPORT variable.
  3. Add a list file naming the files to be added to the tarball, and add
     Documentation to the list. Also revert patch 3 of v2.
  4. Add some identification info to the tarball.
  5. Add the temporary file testsuites-id to .gitignore.
  patch 3:
  Update the readme to match the changes in patch 1.
v2->v1:
  Big change of approach: instead of running the testsuite through a given
  EXEC parameter, export the testsuite files to a separate tar and run them
  from a script.

Gu Jinxiang (3): btrfs-progs: Add make testsuite command for export tests btrfs-progs: introduce TEST_TOP and INTERNAL_BIN for tests directory and internal binaries btrfs-progs: Add readme for export testsuits .gitignore | 1 + Makefile | 4 +++ tests/README.md| 13 tests/cli-tests.sh | 15 ++--- tests/cli-tests/001-btrfs/test.sh | 2 +- .../cli-tests/002-balance-full-no-filters/test.sh | 2 +- tests/cli-tests/003-fi-resize-args/test.sh | 2 +- .../cli-tests/004-send-parent-multi-subvol/test.sh | 2 +- tests/cli-tests/005-qgroup-show/test.sh| 2 +- tests/cli-tests/006-qgroup-show-sync/test.sh | 2 +- tests/cli-tests/007-check-force/test.sh| 2 +- .../008-subvolume-get-set-default/test.sh | 2 +- tests/common | 16 ++ tests/convert-tests.sh | 15 ++--- tests/convert-tests/001-ext2-basic/test.sh | 4 +-- tests/convert-tests/002-ext3-basic/test.sh | 4 +-- tests/convert-tests/003-ext4-basic/test.sh | 4 +-- .../004-ext2-backup-superblock-ranges/test.sh | 2 +- .../convert-tests/005-delete-all-rollback/test.sh | 4 +-- tests/convert-tests/006-large-hole-extent/test.sh | 4 +-- .../007-unsupported-block-sizes/test.sh| 4 +-- tests/convert-tests/008-readonly-image/test.sh | 4 +-- tests/convert-tests/009-common-inode-flags/test.sh | 4 +-- tests/convert-tests/010-reiserfs-basic/test.sh | 4 +-- .../011-reiserfs-delete-all-rollback/test.sh | 4 +-- .../012-reiserfs-large-hole-extent/test.sh | 4 +-- .../013-reiserfs-common-inode-flags/test.sh| 4 +-- .../014-reiserfs-tail-handling/test.sh | 4 +-- .../015-no-rollback-after-balance/test.sh | 4 +-- tests/export-tests.sh | 37 ++ tests/fsck-tests.sh| 17 +++--- tests/fsck-tests/006-bad-root-items/test.sh| 2 +- tests/fsck-tests/012-leaf-corruption/test.sh | 2 +- tests/fsck-tests/013-extent-tree-rebuild/test.sh | 4 +-- tests/fsck-tests/018-leaf-crossing-stripes/test.sh | 2 +- .../fsck-tests/019-non-skinny-false-alert/test.sh | 2 +- tests/fsck-tests/020-extent-ref-cases/test.sh | 2 +- .../021-partially-dropped-snapshot-case/test.sh| 2 +- 
tests/fsck-tests/022-qgroup-rescan-halfway/test.sh | 2 +- tests/fsck-tests/023-qgroup-stack-overflow/test.sh | 2 +- tests/fsck-tests/024-clear-space-cache/test.sh | 2 +- tests/fsck-tests/025-file-extents/test.sh | 2 +- tests/fsck-tests/026-bad-dir-item-name/test.sh | 2 +- tests/fsck-tests/027-tree-reloc-tree/test.sh | 2 +- .../028-unaligned-super-dev-sizes/test.sh | 2 +- tests/fuzz-tests.sh| 15 ++--- .../fuzz-tests/001-simple-check-unmounted/test.sh | 4 +-- tests/fuzz-tests/002-simple-image/test.sh | 4 +-- tests/fuzz-tests/003-multi-check-unmounted/test.sh | 4 +-- tests/fuzz-tests/004-simple-dump-tree/test.sh | 4 +-- tests/fuzz-tests/005-simple-dump-super/test.sh | 4 +-- tests/fuzz-tests/006-simple-tree-stats/test.sh | 4 +-- tests/fuzz-tests/007-simple-super-recover/test.sh | 4 +-- tests/fuzz-tests/008-simple-chunk-recover/test.sh | 4 +-- tests/fuzz-tests/009-simple-zero-log/test.sh | 4 +-- tests/misc-tests.sh| 17 +++---
[PATCH v5 2/3] btrfs-progs: introduce TEST_TOP and INTERNAL_BIN for tests directory and internal binaries
Use TEST_TOP for tests directory. And INTERNAL_BIN for internal binaries. Signed-off-by: Gu Jinxiang--- tests/cli-tests.sh | 15 ++- tests/cli-tests/001-btrfs/test.sh | 2 +- tests/cli-tests/002-balance-full-no-filters/test.sh | 2 +- tests/cli-tests/003-fi-resize-args/test.sh | 2 +- tests/cli-tests/004-send-parent-multi-subvol/test.sh| 2 +- tests/cli-tests/005-qgroup-show/test.sh | 2 +- tests/cli-tests/006-qgroup-show-sync/test.sh| 2 +- tests/cli-tests/007-check-force/test.sh | 2 +- tests/cli-tests/008-subvolume-get-set-default/test.sh | 2 +- tests/common| 16 ++-- tests/convert-tests.sh | 15 ++- tests/convert-tests/001-ext2-basic/test.sh | 4 ++-- tests/convert-tests/002-ext3-basic/test.sh | 4 ++-- tests/convert-tests/003-ext4-basic/test.sh | 4 ++-- .../004-ext2-backup-superblock-ranges/test.sh | 2 +- tests/convert-tests/005-delete-all-rollback/test.sh | 4 ++-- tests/convert-tests/006-large-hole-extent/test.sh | 4 ++-- tests/convert-tests/007-unsupported-block-sizes/test.sh | 4 ++-- tests/convert-tests/008-readonly-image/test.sh | 4 ++-- tests/convert-tests/009-common-inode-flags/test.sh | 4 ++-- tests/convert-tests/010-reiserfs-basic/test.sh | 4 ++-- .../011-reiserfs-delete-all-rollback/test.sh| 4 ++-- .../012-reiserfs-large-hole-extent/test.sh | 4 ++-- .../013-reiserfs-common-inode-flags/test.sh | 4 ++-- tests/convert-tests/014-reiserfs-tail-handling/test.sh | 4 ++-- .../convert-tests/015-no-rollback-after-balance/test.sh | 4 ++-- tests/fsck-tests.sh | 17 - tests/fsck-tests/006-bad-root-items/test.sh | 2 +- tests/fsck-tests/012-leaf-corruption/test.sh| 2 +- tests/fsck-tests/013-extent-tree-rebuild/test.sh| 4 ++-- tests/fsck-tests/018-leaf-crossing-stripes/test.sh | 2 +- tests/fsck-tests/019-non-skinny-false-alert/test.sh | 2 +- tests/fsck-tests/020-extent-ref-cases/test.sh | 2 +- .../021-partially-dropped-snapshot-case/test.sh | 2 +- tests/fsck-tests/022-qgroup-rescan-halfway/test.sh | 2 +- tests/fsck-tests/023-qgroup-stack-overflow/test.sh | 2 +- 
tests/fsck-tests/024-clear-space-cache/test.sh | 2 +- tests/fsck-tests/025-file-extents/test.sh | 2 +- tests/fsck-tests/026-bad-dir-item-name/test.sh | 2 +- tests/fsck-tests/027-tree-reloc-tree/test.sh| 2 +- tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh | 2 +- tests/fuzz-tests.sh | 15 ++- tests/fuzz-tests/001-simple-check-unmounted/test.sh | 4 ++-- tests/fuzz-tests/002-simple-image/test.sh | 4 ++-- tests/fuzz-tests/003-multi-check-unmounted/test.sh | 4 ++-- tests/fuzz-tests/004-simple-dump-tree/test.sh | 4 ++-- tests/fuzz-tests/005-simple-dump-super/test.sh | 4 ++-- tests/fuzz-tests/006-simple-tree-stats/test.sh | 4 ++-- tests/fuzz-tests/007-simple-super-recover/test.sh | 4 ++-- tests/fuzz-tests/008-simple-chunk-recover/test.sh | 4 ++-- tests/fuzz-tests/009-simple-zero-log/test.sh| 4 ++-- tests/misc-tests.sh | 17 - tests/misc-tests/001-btrfstune-features/test.sh | 2 +- tests/misc-tests/002-uuid-rewrite/test.sh | 6 +++--- tests/misc-tests/003-zero-log/test.sh | 4 ++-- tests/misc-tests/004-shrink-fs/test.sh | 2 +- .../005-convert-progress-thread-crash/test.sh | 2 +- tests/misc-tests/006-image-on-missing-device/test.sh| 2 +- tests/misc-tests/007-subvolume-sync/test.sh | 2 +- tests/misc-tests/008-leaf-crossing-stripes/test.sh | 2 +- tests/misc-tests/009-subvolume-sync-must-wait/test.sh | 2 +- tests/misc-tests/010-convert-delete-ext2-subvol/test.sh | 2 +- tests/misc-tests/011-delete-missing-device/test.sh | 2 +- tests/misc-tests/012-find-root-no-result/test.sh| 2 +- tests/misc-tests/013-subvolume-sync-crash/test.sh | 2 +- tests/misc-tests/014-filesystem-label/test.sh | 2 +- tests/misc-tests/015-dump-super-garbage/test.sh | 2 +- tests/misc-tests/016-send-clone-src/test.sh | 2 +- tests/misc-tests/017-recv-stream-malformatted/test.sh | 2 +- tests/misc-tests/018-recv-end-of-stream/test.sh | 2 +- .../019-receive-clones-on-mounted-subvol/test.sh| 4 ++-- tests/misc-tests/020-fix-superblock-corruption/test.sh | 2 +- tests/misc-tests/021-image-multi-devices/test.sh
[PATCH 1/3] btrfs-progs: add prerequisite mkfs.btrfs for test-cli
tests/cli-tests/002-balance-full-no-filters/test.sh requires mkfs.btrfs, so add it as a prerequisite of test-cli in the Makefile.

Signed-off-by: Gu Jinxiang
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 00e2137..034c943 100644
--- a/Makefile
+++ b/Makefile
@@ -315,7 +315,7 @@ test-fuzz: btrfs
 	@echo "[TEST] fuzz-tests.sh"
 	$(Q)bash tests/fuzz-tests.sh
 
-test-cli: btrfs
+test-cli: btrfs mkfs.btrfs
 	@echo "[TEST] cli-tests.sh"
 	$(Q)bash tests/cli-tests.sh
 
--
1.9.1
[PATCH 3/3] btrfs-progs: add prerequisite btrfs-convert for test-misc
tests/misc-tests/005-convert-progress-thread-crash/test.sh requires btrfs-convert, so add it as a prerequisite of test-misc in the Makefile.

Signed-off-by: Gu Jinxiang
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 9299411..7ccba62 100644
--- a/Makefile
+++ b/Makefile
@@ -303,7 +303,7 @@ test-fsck: btrfs btrfs-image btrfs-corrupt-block mkfs.btrfs btrfstune
 	$(Q)bash tests/fsck-tests.sh
 
 test-misc: btrfs btrfs-image btrfs-corrupt-block mkfs.btrfs btrfstune fssum \
-		btrfs-zero-log btrfs-find-root btrfs-select-super
+		btrfs-zero-log btrfs-find-root btrfs-select-super btrfs-convert
 	@echo "[TEST] misc-tests.sh"
 	$(Q)bash tests/misc-tests.sh
 
--
1.9.1
[PATCH 2/3] btrfs-progs: add prerequisite btrfs-image for test-fuzz
tests/fuzz-tests/002-simple-image/test.sh requires btrfs-image, so add it as a prerequisite of test-fuzz in the Makefile.

Signed-off-by: Gu Jinxiang
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 034c943..9299411 100644
--- a/Makefile
+++ b/Makefile
@@ -311,7 +311,7 @@ test-mkfs: btrfs mkfs.btrfs
 	@echo "[TEST] mkfs-tests.sh"
 	$(Q)bash tests/mkfs-tests.sh
 
-test-fuzz: btrfs
+test-fuzz: btrfs btrfs-image
 	@echo "[TEST] fuzz-tests.sh"
 	$(Q)bash tests/fuzz-tests.sh
 
--
1.9.1
Re: IO Error (.snapshots is not a btrfs subvolume)
08.02.2018 06:03, Chris Murphy wrote:
> On Wed, Feb 7, 2018 at 6:26 PM, Nick Gilmour wrote:
>> Hi all,
>>
>> I have successfully restored a snapshot of root but now when I try to

How exactly was it done?

>> make a new snapshot I get this error:
>> IO Error (.snapshots is not a btrfs subvolume).
>> My snapshots were within @ which I renamed to @_old.
>> What can I do now? How can I move the snapshots from @_old/ into @ and
>> be able to make snapshots again?
>>
>> This is an excerpt of my subvolumes list:
>>
>> # btrfs subvolume list /
>> ID 257 gen 175397 top level 5 path @_old
>> ID 258 gen 175392 top level 5 path @pkg
>> ID 260 gen 175447 top level 5 path @tmp
>> ID 262 gen 19 top level 257 path @_old/var/lib/machines
>> ID 268 gen 175441 top level 5 path @test
>> ID 291 gen 175394 top level 257 path @_old/.snapshots
>> ID 292 gen 1705 top level 291 path @_old/.snapshots/1/snapshot
>> ...
>>
>> ID 3538 gen 175398 top level 291 path @_old/.snapshots/1594/snapshot
>> ID 3540 gen 175447 top level 5 path @
>>
>
> This is a snapper behavior. It creates .snapshots as a subvolume and
> then puts snapshots into that subvolume. If you snapshot a subvolume
> that contains another subvolume, the nested subvolume is not snapshot,
> instead a plain directory placeholder is created instead. So your
> restored snapshot contains a .snapshot directory rather than a
> .snapshot subvolume. Possibly if you delete the directory and create a
> new subvolume .snapshot, the problem will be fixed.
>

No, you should create the subvolume @/.snapshots and mount it as /.snapshots (and list it in /etc/fstab). Snapshots should always be available in the running system under a fixed path, and this is only possible when the subvolume is mounted; otherwise /.snapshots is lost after a rollback, just as happened now. The exact subvolume name probably does not matter that much, but it is better to stick with what the installer does by default. It may matter for grub2 snapshot handling.

Also, openSUSE expects the actual root to be a subvolume under /.snapshots that is a valid snapper snapshot (i.e. it has valid metadata). Again, not having this may confuse snapper.

It may be possible to move @_old/.snapshots into @/.snapshots, although this breaks parent-child relationships; those old snapshots cannot be cleaned up without removing the old root completely.

> I can't tell you how this will confuse snapper though, or how to
> unconfuse it. It pretty much expects to be in control of all
> snapshots, creation, deletion, and rollbacks. So if you do it manually
> for whatever reason, I think it can confuse snapper.
>

There was a blog post recently outlining how to restore the openSUSE root. You may want to search the opensuse or opensuse-factory mailing lists. Ah, found it: https://rootco.de/2018-01-19-opensuse-btrfs-subvolumes/
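The advice in this reply (create @/.snapshots as a real subvolume, mount it at /.snapshots, and list it in /etc/fstab) might translate into steps like the ones below. This is a sketch under assumptions: the device name /dev/sdX, the temporary mount point /mnt/top, the fstab options, and both helper names are placeholders, and the commands are printed for review rather than executed. It does not cover the snapper metadata the reply mentions.

```shell
#!/bin/bash
# Sketch only: DEV, TOP and the fstab options are assumptions, and both
# function names are hypothetical.

# Print an /etc/fstab line that mounts the @/.snapshots subvolume at /.snapshots.
snapshots_fstab_line() {
	local dev=$1
	printf '%s /.snapshots btrfs subvol=@/.snapshots 0 0\n' "$dev"
}

# Print the commands for review instead of running them (they need root).
plan_snapshots_subvolume() {
	local dev=$1 top=$2
	# Mount the top-level subvolume (ID 5) so that @ is visible as a directory.
	echo "mount -o subvolid=5 $dev $top"
	# The restored root only holds a plain directory placeholder; remove it.
	echo "rmdir $top/@/.snapshots"
	echo "btrfs subvolume create $top/@/.snapshots"
	echo "umount $top"
	echo "mount -o subvol=@/.snapshots $dev /.snapshots"
}

plan_snapshots_subvolume "${DEV:-/dev/sdX}" "${TOP:-/mnt/top}"
snapshots_fstab_line "${DEV:-/dev/sdX}"
```

Printing the plan first keeps the sketch safe to run on any machine; each command still has to be checked against the actual subvolume layout before it is executed.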
[PATCH v4 17/18] btrfs-progs: lowmem check: end of removing parameters @trans in lowmem
Remove @trans in check_chunks_and_extents(). This patch let lowmem repair work again. Signed-off-by: Su Yue--- check/mode-lowmem.c | 13 - 1 file changed, 13 deletions(-) diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index 40a179f75319..4aad69fc9eb1 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -4872,7 +4872,6 @@ out: */ int check_chunks_and_extents_lowmem(struct btrfs_fs_info *fs_info) { - struct btrfs_trans_handle *trans = NULL; struct btrfs_path path; struct btrfs_key old_key; struct btrfs_key key; @@ -4884,14 +4883,6 @@ int check_chunks_and_extents_lowmem(struct btrfs_fs_info *fs_info) root = fs_info->fs_root; - if (repair) { - trans = btrfs_start_transaction(fs_info->extent_root, 1); - if (IS_ERR(trans)) { - error("failed to start transaction before check"); - return PTR_ERR(trans); - } - } - root1 = root->fs_info->chunk_root; ret = check_btrfs_root(root1, 0, 1); err |= ret; @@ -4961,10 +4952,6 @@ out: err &= ~BG_ACCOUNTING_ERROR; } - if (trans) - btrfs_commit_transaction(trans, root->fs_info->extent_root); - btrfs_release_path(); - return err; } -- 2.16.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 15/18] btrfs-progs: lowmem check: remove parameter @trans of check_btrfs_root()
Remove parameters @trans of delete_extent_item() and walk_down_tree_v2(). Note: This patch and next patches cause error in lowmem repair like: "Error: Commit_root already set when starting transaction". This error will disappear after removing @trans finished. Signed-off-by: Su Yue--- check/mode-lowmem.c | 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index d4c8de4e69af..d92278d2993c 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -4271,8 +4271,7 @@ out: * Returns <0 Fatal error, must exit the whole check * Returns 0 No errors found */ -static int walk_down_tree(struct btrfs_trans_handle *trans, - struct btrfs_root *root, struct btrfs_path *path, +static int walk_down_tree(struct btrfs_root *root, struct btrfs_path *path, int *level, struct node_refs *nrefs, int ext_ref, int check_all) { @@ -4585,8 +4584,7 @@ out: * Returns 0 represents OK. * Returns >0 represents error bits. */ -static int check_btrfs_root(struct btrfs_trans_handle *trans, - struct btrfs_root *root, unsigned int ext_ref, +static int check_btrfs_root(struct btrfs_root *root, unsigned int ext_ref, int check_all) { struct btrfs_path path; @@ -4631,7 +4629,7 @@ static int check_btrfs_root(struct btrfs_trans_handle *trans, } while (1) { - ret = walk_down_tree(trans, root, , , , + ret = walk_down_tree(root, , , , ext_ref, check_all); if (ret > 0) @@ -4667,7 +4665,7 @@ out: static int check_fs_root(struct btrfs_root *root, unsigned int ext_ref) { reset_cached_block_groups(root->fs_info); - return check_btrfs_root(NULL, root, ext_ref, 0); + return check_btrfs_root(root, ext_ref, 0); } /* @@ -4871,11 +4869,11 @@ int check_chunks_and_extents_lowmem(struct btrfs_fs_info *fs_info) } root1 = root->fs_info->chunk_root; - ret = check_btrfs_root(trans, root1, 0, 1); + ret = check_btrfs_root(root1, 0, 1); err |= ret; root1 = root->fs_info->tree_root; - ret = check_btrfs_root(trans, root1, 0, 1); + ret = check_btrfs_root(root1, 0, 1); 
err |= ret; btrfs_init_path(); @@ -4906,7 +4904,7 @@ int check_chunks_and_extents_lowmem(struct btrfs_fs_info *fs_info) goto next; } - ret = check_btrfs_root(trans, cur_root, 0, 1); + ret = check_btrfs_root(cur_root, 0, 1); err |= ret; if (key.objectid == BTRFS_TREE_RELOC_OBJECTID) -- 2.16.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 05/18] btrfs-progs: lowmem check: introduce mark/clear_block_groups_full()
Excluding or pinning all metadata blocks is not time-efficient on large filesystems. Another approach is to mark all metadata block groups full and allocate a new chunk for CoW, so that newly reserved extents never overwrite existing extents.

Introduce modify_block_groups_cache() to modify the cache state of all block groups and mark all extents in those block groups as not free in the free space cache. mark/clear_block_groups_full() wrap this function.

Suggested-by: Qu Wenruo
Signed-off-by: Su Yue
---
 check/mode-lowmem.c | 93 +
 1 file changed, 93 insertions(+)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 1fc84f1e8c44..a200c28a9cf7 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -233,6 +233,99 @@ static int update_nodes_refs(struct btrfs_root *root, u64 bytenr,
 	return 0;
 }
 
+/*
+ * Mark all extents unfree in the block group. And set @block_group->cached
+ * according to @cache.
+ */
+static int modify_block_group_cache(struct btrfs_fs_info *fs_info,
+		struct btrfs_block_group_cache *block_group, int cache)
+{
+	struct extent_io_tree *free_space_cache = &fs_info->free_space_cache;
+	u64 start = block_group->key.objectid;
+	u64 end = start + block_group->key.offset;
+
+	if (cache && !block_group->cached) {
+		block_group->cached = 1;
+		clear_extent_dirty(free_space_cache, start, end - 1);
+	}
+
+	if (!cache && block_group->cached) {
+		block_group->cached = 0;
+		clear_extent_dirty(free_space_cache, start, end - 1);
+	}
+	return 0;
+}
+
+/*
+ * Modify block groups which have @flags unfree in free space cache.
+ *
+ * @cache: if 0, clear block groups cache state;
+ *         not 0, mark blocks groups cached.
+ */
+static int modify_block_groups_cache(struct btrfs_fs_info *fs_info, u64 flags,
+				     int cache)
+{
+	struct btrfs_root *root = fs_info->extent_root;
+	struct btrfs_key key;
+	struct btrfs_path path;
+	struct btrfs_block_group_cache *bg_cache;
+	struct btrfs_block_group_item *bi;
+	struct btrfs_block_group_item bg_item;
+	struct extent_buffer *eb;
+	int slot;
+	int ret;
+
+	key.objectid = 0;
+	key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
+	key.offset = 0;
+
+	btrfs_init_path(&path);
+	ret = btrfs_search_slot(NULL, root, &key, &path, 0, 0);
+	if (ret < 0) {
+		error("fail to search block groups due to %s", strerror(-ret));
+		goto out;
+	}
+
+	while (1) {
+		eb = path.nodes[0];
+		slot = path.slots[0];
+		btrfs_item_key_to_cpu(eb, &key, slot);
+		bg_cache = btrfs_lookup_block_group(fs_info, key.objectid);
+		if (!bg_cache) {
+			ret = -ENOENT;
+			goto out;
+		}
+
+		bi = btrfs_item_ptr(eb, slot, struct btrfs_block_group_item);
+		read_extent_buffer(eb, &bg_item, (unsigned long)bi,
+				   sizeof(bg_item));
+		if (btrfs_block_group_flags(&bg_item) & flags)
+			modify_block_group_cache(fs_info, bg_cache, cache);
+
+		ret = btrfs_next_item(root, &path);
+		if (ret > 0) {
+			ret = 0;
+			goto out;
+		}
+		if (ret < 0)
+			goto out;
+	}
+
+out:
+	btrfs_release_path(&path);
+	return ret;
+}
+
+static int mark_block_groups_full(struct btrfs_fs_info *fs_info, u64 flags)
+{
+	return modify_block_groups_cache(fs_info, flags, 1);
+}
+
+static int clear_block_groups_full(struct btrfs_fs_info *fs_info, u64 flags)
+{
+	return modify_block_groups_cache(fs_info, flags, 0);
+}
+
 /*
  * This function only handles BACKREF_MISSING,
  * If corresponding extent item exists, increase the ref, else insert an extent
--
2.16.1
[PATCH v4 16/18] btrfs-progs: lowmem check: introduce repair_block_accounting()
Introduce repair_block_accounting(), which calls
btrfs_fix_block_accounting() to repair block group accounting.

Replace the call to btrfs_fix_block_accounting() with the new function.

Note: This patch and the following patches cause an error in lowmem
repair like:
"Error: Commit_root already set when starting transaction".
This error will disappear once the removal of @trans is finished.

Signed-off-by: Su Yue
---
 check/mode-lowmem.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index d92278d2993c..40a179f75319 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -537,6 +537,30 @@ static int end_avoid_extents_overwrite(struct btrfs_fs_info *fs_info)
 	return ret;
 }
 
+/*
+ * Wrapper function for btrfs_fix_block_accounting().
+ *
+ * Returns 0 on success.
+ * Returns != 0 on error.
+ */
+static int repair_block_accounting(struct btrfs_fs_info *fs_info)
+{
+	struct btrfs_trans_handle *trans = NULL;
+	struct btrfs_root *root = fs_info->extent_root;
+	int ret;
+
+	trans = btrfs_start_transaction(root, 1);
+	if (IS_ERR(trans)) {
+		ret = PTR_ERR(trans);
+		error("fail to start transaction %s", strerror(-ret));
+		return ret;
+	}
+
+	ret = btrfs_fix_block_accounting(trans, root);
+	btrfs_commit_transaction(trans, root);
+	return ret;
+}
+
 /*
  * This function only handles BACKREF_MISSING,
  * If corresponding extent item exists, increase the ref, else insert an extent
@@ -4930,7 +4954,7 @@ out:
 
 		reset_cached_block_groups(fs_info);
 		/* update block accounting */
-		ret = btrfs_fix_block_accounting(trans, root);
+		ret = repair_block_accounting(fs_info);
 		if (ret)
 			err |= ret;
 		else
--
2.16.1
[PATCH v4 12/18] btrfs-progs: lowmem check: remove parameter @trans of repair_extent_item()
This patch removes parameter @trans of repair_extent_item(). It calls try_avoid_extents_overwrite() and starts a transaction by itself. Note: This patch and next patches cause error in lowmem repair like: "Error: Commit_root already set when starting transaction". This error will disappear after removing @trans finished. Signed-off-by: Su Yue--- check/mode-lowmem.c | 54 + 1 file changed, 34 insertions(+), 20 deletions(-) diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index 53377848f361..443fa513a13e 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -3588,40 +3588,55 @@ out: * means error after repair * Returns 0 nothing happened */ -static int repair_extent_item(struct btrfs_trans_handle *trans, - struct btrfs_root *root, struct btrfs_path *path, +static int repair_extent_item(struct btrfs_root *root, struct btrfs_path *path, u64 bytenr, u64 num_bytes, u64 parent, u64 root_objectid, u64 owner, u64 offset, int err) { + struct btrfs_trans_handle *trans; + struct btrfs_root *extent_root = root->fs_info->extent_root; struct btrfs_key old_key; int freed = 0; int ret; btrfs_item_key_to_cpu(path->nodes[0], _key, path->slots[0]); - if (err & (REFERENCER_MISSING | REFERENCER_MISMATCH)) { - /* delete the backref */ - ret = btrfs_free_extent(trans, root->fs_info->fs_root, bytenr, - num_bytes, parent, root_objectid, owner, offset); - if (!ret) { - freed = 1; - err &= ~REFERENCER_MISSING; - printf("Delete backref in extent [%llu %llu]\n", - bytenr, num_bytes); - } else { - error("fail to delete backref in extent [%llu %llu]", - bytenr, num_bytes); - } + if ((err & (REFERENCER_MISSING | REFERENCER_MISMATCH)) == 0) + return err; + + ret = avoid_extents_overwrite(root->fs_info); + if (ret) + return err; + + trans = btrfs_start_transaction(extent_root, 1); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + error("fail to start transaction %s", strerror(-ret)); + /* nothing happened */ + ret = 0; + goto out; } + /* delete the backref */ + ret = 
btrfs_free_extent(trans, root->fs_info->fs_root, bytenr, + num_bytes, parent, root_objectid, owner, offset); + if (!ret) { + freed = 1; + err &= ~REFERENCER_MISSING; + printf("Delete backref in extent [%llu %llu]\n", + bytenr, num_bytes); + } else { + error("fail to delete backref in extent [%llu %llu]", + bytenr, num_bytes); + } + btrfs_commit_transaction(trans, extent_root); /* btrfs_free_extent may delete the extent */ btrfs_release_path(path); ret = btrfs_search_slot(NULL, root, _key, path, 0, 0); - if (ret) ret = -ENOENT; else if (freed) ret = err; +out: return ret; } @@ -3631,8 +3646,7 @@ static int repair_extent_item(struct btrfs_trans_handle *trans, * * Since we don't use extent_record anymore, introduce new error bit */ -static int check_extent_item(struct btrfs_trans_handle *trans, -struct btrfs_fs_info *fs_info, +static int check_extent_item(struct btrfs_fs_info *fs_info, struct btrfs_path *path) { struct btrfs_extent_item *ei; @@ -3763,7 +3777,7 @@ next: } if (err && repair) { - ret = repair_extent_item(trans, fs_info->extent_root, path, + ret = repair_extent_item(fs_info->extent_root, path, key.objectid, num_bytes, parent, root_objectid, owner, owner_offset, ret); if (ret < 0) @@ -4183,7 +4197,7 @@ again: break; case BTRFS_EXTENT_ITEM_KEY: case BTRFS_METADATA_ITEM_KEY: - ret = check_extent_item(trans, fs_info, path); + ret = check_extent_item(fs_info, path); err |= ret; break; case BTRFS_EXTENT_CSUM_KEY: -- 2.16.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 13/18] btrfs-progs: lowmem check: remove parameter @trans of check_leaf_items()
This patch removes the parameter @trans of check_leaf_items().

Note: This patch and the following patches cause an error in lowmem
repair like:
"Error: Commit_root already set when starting transaction".
This error will disappear once the removal of @trans is finished.

Signed-off-by: Su Yue
---
 check/mode-lowmem.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 443fa513a13e..a7660a25b844 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -4139,8 +4139,7 @@ out:
 /*
  * Main entry function to check known items and update related accounting info
  */
-static int check_leaf_items(struct btrfs_trans_handle *trans,
-			    struct btrfs_root *root, struct btrfs_path *path,
+static int check_leaf_items(struct btrfs_root *root, struct btrfs_path *path,
 			    struct node_refs *nrefs, int account_bytes)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
@@ -4336,7 +4335,7 @@ static int walk_down_tree(struct btrfs_trans_handle *trans,
 				ret = process_one_leaf(root, path, nrefs,
 						       level, ext_ref);
 			else
-				ret = check_leaf_items(trans, root, path,
+				ret = check_leaf_items(root, path,
 						nrefs, account_file_data);
 			err |= ret;
 			break;
--
2.16.1
[PATCH v4 11/18] btrfs-progs: lowmem check: remove parameter @trans of repair_chunk_item()
This patch removes parameter @trans of repair_chunk_item(). It calls try_avoid_extents_overwrite() and starts a transaction by itself. Note: This patch and next patches cause error in lowmem repair like: "Error: Commit_root already set when starting transaction". This error will disappear after removing @trans finished. Signed-off-by: Su Yue--- check/mode-lowmem.c | 48 1 file changed, 32 insertions(+), 16 deletions(-) diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index 272e658296e7..53377848f361 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -4026,13 +4026,14 @@ out: * * Returns error after repair. */ -static int repair_chunk_item(struct btrfs_trans_handle *trans, -struct btrfs_root *chunk_root, +static int repair_chunk_item(struct btrfs_root *chunk_root, struct btrfs_path *path, int err) { struct btrfs_chunk *chunk; struct btrfs_key chunk_key; struct extent_buffer *eb = path->nodes[0]; + struct btrfs_root *extent_root = chunk_root->fs_info->extent_root; + struct btrfs_trans_handle *trans; u64 length; int slot = path->slots[0]; u64 type; @@ -4045,21 +4046,36 @@ static int repair_chunk_item(struct btrfs_trans_handle *trans, type = btrfs_chunk_type(path->nodes[0], chunk); length = btrfs_chunk_length(eb, chunk); - if (err & REFERENCER_MISSING) { - ret = btrfs_make_block_group(trans, chunk_root->fs_info, 0, -type, chunk_key.offset, length); - if (ret) { - error("fail to add block group item[%llu %llu]", - chunk_key.offset, length); - goto out; - } else { - err &= ~REFERENCER_MISSING; - printf("Added block group item[%llu %llu]\n", - chunk_key.offset, length); - } + /* now repair only adds block group */ + if ((err & REFERENCER_MISSING) == 0) + return err; + + ret = avoid_extents_overwrite(chunk_root->fs_info); + if (ret) + return ret; + + trans = btrfs_start_transaction(extent_root, 1); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + error("fail to start transaction %s", strerror(-ret)); + return ret; } -out: + ret = 
btrfs_make_block_group(trans, chunk_root->fs_info, 0, type, +chunk_key.offset, length); + if (ret) { + error("fail to add block group item[%llu %llu]", + chunk_key.offset, length); + } else { + err &= ~REFERENCER_MISSING; + printf("Added block group item[%llu %llu]\n", chunk_key.offset, + length); + } + + btrfs_commit_transaction(trans, extent_root); + if (ret) + error("fail to repair item(s) related to chunk item[%llu %llu]", + chunk_key.objectid, chunk_key.offset); return err; } @@ -4158,7 +4174,7 @@ again: case BTRFS_CHUNK_ITEM_KEY: ret = check_chunk_item(fs_info, eb, slot); if (repair && ret) - ret = repair_chunk_item(trans, root, path, ret); + ret = repair_chunk_item(root, path, ret); err |= ret; break; case BTRFS_DEV_EXTENT_KEY: -- 2.16.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 06/18] btrfs-progs: lowmem check: introduce try_force_cow_in_new_chunk()
Introduce create_chunk_and_block_block_group() to allocate new chunk and corresponding block group. The new function force_cow_in_new_chunk() first allocates new chunk and records its start. Then it modifies all metadata block groups cached and full. Finally it marks the new block group uncached and unfree. In the next CoW, extents states will be updated automatically by cache_block_group(). New function try_to_force_cow_in_new_chunk() will try to mark block groups full, allocate a new chunk and records the start. If the last allocated chunk is almost full, a new chunk will be allocated. Suggested-by: Qu WenruoSigned-off-by: Su Yue --- check/mode-lowmem.c | 165 1 file changed, 165 insertions(+) diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index a200c28a9cf7..3649d570e11c 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -326,6 +326,171 @@ static int clear_block_groups_full(struct btrfs_fs_info *fs_info, u64 flags) return modify_block_groups_cache(fs_info, flags, 0); } +static int create_chunk_and_block_group(struct btrfs_fs_info *fs_info, + u64 flags, u64 *start, u64 *nbytes) +{ + struct btrfs_trans_handle *trans; + struct btrfs_root *root = fs_info->extent_root; + int ret; + + if ((flags & BTRFS_BLOCK_GROUP_TYPE_MASK) == 0) + return -EINVAL; + + trans = btrfs_start_transaction(root, 1); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + error("error starting transaction %s", strerror(-ret)); + return ret; + } + ret = btrfs_alloc_chunk(trans, fs_info, start, nbytes, flags); + if (ret) { + error("fail to allocate new chunk %s", strerror(-ret)); + goto out; + } + ret = btrfs_make_block_group(trans, fs_info, 0, flags, *start, +*nbytes); + if (ret) { + error("fail to make block group for chunk %llu %llu %s", + *start, *nbytes, strerror(-ret)); + goto out; + } +out: + btrfs_commit_transaction(trans, root); + return ret; +} + +static int force_cow_in_new_chunk(struct btrfs_fs_info *fs_info, + u64 *start_ret) +{ + struct btrfs_block_group_cache *bg; + 
u64 start; + u64 nbytes; + u64 alloc_profile; + u64 flags; + int ret; + + alloc_profile = (fs_info->avail_metadata_alloc_bits & +fs_info->metadata_alloc_profile); + flags = BTRFS_BLOCK_GROUP_METADATA | alloc_profile; + if (btrfs_fs_incompat(fs_info, MIXED_GROUPS)) + flags |= BTRFS_BLOCK_GROUP_DATA; + + ret = create_chunk_and_block_group(fs_info, flags, , ); + if (ret) + goto err; + printf("Created new chunk [%llu %llu]\n", start, nbytes); + + flags = BTRFS_BLOCK_GROUP_METADATA; + /* Mark all metadata block groups cached and full in free space*/ + ret = mark_block_groups_full(fs_info, flags); + if (ret) + goto clear_bgs_full; + + bg = btrfs_lookup_block_group(fs_info, start); + if (!bg) { + ret = -ENOENT; + error("fail to look up block group %llu %llu", start, nbytes); + goto clear_bgs_full; + } + + /* Clear block group cache just allocated */ + ret = modify_block_group_cache(fs_info, bg, 0); + if (ret) + goto clear_bgs_full; + if (start_ret) + *start_ret = start; + return 0; + +clear_bgs_full: + clear_block_groups_full(fs_info, flags); +err: + return ret; +} + +/* + * Returns 0 means not almost full. + * Returns >0 means almost full. + * Returns <0 means fatal error. 
+ */ +static int is_chunk_almost_full(struct btrfs_fs_info *fs_info, u64 start) +{ + struct btrfs_path path; + struct btrfs_key key; + struct btrfs_root *root = fs_info->extent_root; + struct btrfs_block_group_item *bi; + struct btrfs_block_group_item bg_item; + struct extent_buffer *eb; + u64 used; + u64 total; + u64 min_free; + int ret; + int slot; + + key.objectid = start; + key.type = BTRFS_BLOCK_GROUP_ITEM_KEY; + key.offset = (u64)-1; + + btrfs_init_path(); + ret = btrfs_search_slot(NULL, root, , , 0, 0); + if (!ret) + ret = -EIO; + if (ret < 0) + goto out; + ret = btrfs_previous_item(root, , start, + BTRFS_BLOCK_GROUP_ITEM_KEY); + if (ret) { + error("failed to find block group %llu", start); + ret = -ENOENT; + goto out; + } + + eb = path.nodes[0]; + slot = path.slots[0]; + btrfs_item_key_to_cpu(eb, , slot); +
[PATCH v4 14/18] btrfs-progs: lowmem check: remove parameter @trans of repair_tree_back_ref()
This patch removes parameter @trans of repair_tree_back_ref(). It calls try_avoid_extents_overwrite() and starts a transaction by itself. Note: This patch and next patches cause error in lowmem repair like: "Error: Commit_root already set when starting transaction". This error will disappear after removing @trans finished. Signed-off-by: Su Yue--- check/mode-lowmem.c | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index a7660a25b844..d4c8de4e69af 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -544,11 +544,11 @@ static int end_avoid_extents_overwrite(struct btrfs_fs_info *fs_info) * * Returns error bits after repair. */ -static int repair_tree_block_ref(struct btrfs_trans_handle *trans, -struct btrfs_root *root, +static int repair_tree_block_ref(struct btrfs_root *root, struct extent_buffer *node, struct node_refs *nrefs, int level, int err) { + struct btrfs_trans_handle *trans = NULL; struct btrfs_fs_info *fs_info = root->fs_info; struct btrfs_root *extent_root = fs_info->extent_root; struct btrfs_path path; @@ -598,6 +598,16 @@ static int repair_tree_block_ref(struct btrfs_trans_handle *trans, if (nrefs->full_backref[level] != 0) flags |= BTRFS_BLOCK_FLAG_FULL_BACKREF; + ret = avoid_extents_overwrite(root->fs_info); + if (ret) + goto out; + trans = btrfs_start_transaction(extent_root, 1); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + trans = NULL; + error("fail to start transaction %s", strerror(-ret)); + goto out; + } /* insert an extent item */ if (insert_extent) { struct btrfs_disk_key copy_key; @@ -663,6 +673,8 @@ static int repair_tree_block_ref(struct btrfs_trans_handle *trans, nrefs->refs[level]++; out: + if (trans) + btrfs_commit_transaction(trans, extent_root); btrfs_release_path(); if (ret) { error( @@ -4304,7 +4316,7 @@ static int walk_down_tree(struct btrfs_trans_handle *trans, btrfs_header_owner(cur), nrefs); if (repair && ret) - ret = repair_tree_block_ref(trans, 
root, + ret = repair_tree_block_ref(root, path->nodes[*level], nrefs, *level, ret); err |= ret; -- 2.16.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 07/18] btrfs-progs: lowmem check: introduce avoid_extents_overwrite()
Another global u64, last_allocated_chunk, records the start of the last
chunk allocated by lowmem repair.
Although a global variable is not so graceful, it simplifies the code
considerably.

avoid_extents_overwrite() prefers to allocate a new chunk first.
If that fails because of no space or wrong used bytes (fsck-tests/004),
it tries to exclude all metadata blocks instead, which costs a lot of
time on large filesystems.

Signed-off-by: Su Yue
---
 check/mode-lowmem.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 3649d570e11c..ea4019c32a3f 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -28,6 +28,8 @@
 #include "check/mode-common.h"
 #include "check/mode-lowmem.h"
 
+static u64 last_allocated_chunk;
+
 static int calc_extent_flag(struct btrfs_root *root, struct extent_buffer *eb,
 			    u64 *flags_ret)
 {
@@ -491,6 +493,50 @@ static int try_to_force_cow_in_new_chunk(struct btrfs_fs_info *fs_info,
 	return ret;
 }
 
+static int avoid_extents_overwrite(struct btrfs_fs_info *fs_info)
+{
+	int ret;
+	int mixed = btrfs_fs_incompat(fs_info, MIXED_GROUPS);
+
+	if (fs_info->excluded_extents)
+		return 0;
+
+	if (last_allocated_chunk != (u64)-1) {
+		ret = try_to_force_cow_in_new_chunk(fs_info,
+			last_allocated_chunk, &last_allocated_chunk);
+		if (!ret)
+			goto out;
+		/*
+		 * If failed, do not try to allocate chunk again in
+		 * next call.
+		 * If there is no space left to allocate, try to exclude all
+		 * metadata blocks. Mixed filesystem is unsupported.
+		 */
+		last_allocated_chunk = (u64)-1;
+		if (ret != -ENOSPC || mixed)
+			goto out;
+	}
+
+	printf(
+	"Try to exclude all metadata blocks and extents, it may be slow\n");
+	ret = exclude_metadata_blocks(fs_info);
+out:
+	if (ret)
+		error("failed to avoid extents overwrite %s", strerror(-ret));
+	return ret;
+}
+
+static int end_avoid_extents_overwrite(struct btrfs_fs_info *fs_info)
+{
+	int ret = 0;
+
+	cleanup_excluded_extents(fs_info);
+	if (last_allocated_chunk)
+		ret = clear_block_groups_full(fs_info,
+					BTRFS_BLOCK_GROUP_METADATA);
+	return ret;
+}
+
 /*
  * This function only handles BACKREF_MISSING,
  * If corresponding extent item exists, increase the ref, else insert an extent
--
2.16.1
[PATCH v4 18/18] btrfs-progs: fsck-tests: add image no extent with normal device size
This new image only misses one extent which leads lowmem mode to allocate new chunk in repair. Original image renamed to no_extent_bad_dev.img should let lowmem mode exclude blocks in repair. Due to problems of btrfs-image, choose xz as compression tool. Signed-off-by: Su Yue--- tests/fsck-tests/014-no-extent-info/.lowmem_repairable | 0 tests/fsck-tests/014-no-extent-info/no_extent.raw.xz| Bin 0 -> 28084 bytes .../{default_case.img => no_extent_bad_dev.img} | Bin 3 files changed, 0 insertions(+), 0 deletions(-) create mode 100644 tests/fsck-tests/014-no-extent-info/.lowmem_repairable create mode 100644 tests/fsck-tests/014-no-extent-info/no_extent.raw.xz rename tests/fsck-tests/014-no-extent-info/{default_case.img => no_extent_bad_dev.img} (100%) diff --git a/tests/fsck-tests/014-no-extent-info/.lowmem_repairable b/tests/fsck-tests/014-no-extent-info/.lowmem_repairable new file mode 100644 index ..e69de29bb2d1 diff --git a/tests/fsck-tests/014-no-extent-info/no_extent.raw.xz b/tests/fsck-tests/014-no-extent-info/no_extent.raw.xz new file mode 100644 index ..6e568a9cf1f0a1d1bcd00222b07cf14d3c09afc5 GIT binary patch literal 28084 zcmeHwRdglUlAV~Dr4lnUGcz+YlHTGm`hY*W@ct)W@gDf9*;jfYmHxzU(agW z_wTOzlbMlmc0}y6&(3#_ADY@gKwt+8b>bjEM8LQ}KtM>7nq!}zeqgG4KtQ(dpP%`S zpA!6%=nh;)N=@;U2l>H}|6{<94I+`Pat)!xJ5;c^W5@XJ0$yOAWwDqC7u{m$V# zu1^h#K5*Z
[PATCH v4 08/18] btrfs-progs: lowmem check: exclude extents if init-extent-tree in lowmem
If options '--init-extent-tree' and '--mode=lowmem' are both input, all metadata blocks will be traversed twice. First one is done by pin_metadata_blocks() in reinit_extent_tree(). Second one is in check_chunks_and_extents_v2(). Excluding instead of pining metadata blocks before reinit extent tree in lowmem can save some time. Signed-off-by: Su Yue--- check/mode-common.c | 27 --- check/mode-common.h | 2 +- check/mode-lowmem.c | 8 +++- cmds-check.c| 3 ++- 4 files changed, 30 insertions(+), 10 deletions(-) diff --git a/check/mode-common.c b/check/mode-common.c index acceb24b9597..afe5f04d1deb 100644 --- a/check/mode-common.c +++ b/check/mode-common.c @@ -706,7 +706,7 @@ out: * Using fs and other trees to rebuild extent tree. */ int reinit_extent_tree(struct btrfs_trans_handle *trans, - struct btrfs_fs_info *fs_info) + struct btrfs_fs_info *fs_info, bool pin) { u64 start = 0; int ret; @@ -728,13 +728,26 @@ int reinit_extent_tree(struct btrfs_trans_handle *trans, /* * first we need to walk all of the trees except the extent tree and pin -* down the bytes that are in use so we don't overwrite any existing -* metadata. +* down/exclude the bytes that are in use so we don't overwrite any +* existing metadata. +* If pin, unpin will be done in end of transaction. +* If exclude, cleanup will be done in check_chunks_and_extents_lowmem. 
*/ - ret = pin_metadata_blocks(fs_info); - if (ret) { - fprintf(stderr, "error pinning down used bytes\n"); - return ret; +again: + if (pin) { + ret = pin_metadata_blocks(fs_info); + if (ret) { + fprintf(stderr, "error pinning down used bytes\n"); + return ret; + } + } else { + ret = exclude_metadata_blocks(fs_info); + if (ret) { + fprintf(stderr, "error excluding used bytes\n"); + printf("try to pin down used bytes\n"); + pin = true; + goto again; + } } /* diff --git a/check/mode-common.h b/check/mode-common.h index e2a824a318c1..8af7dd3066ff 100644 --- a/check/mode-common.h +++ b/check/mode-common.h @@ -122,7 +122,7 @@ int check_child_node(struct extent_buffer *parent, int slot, void reset_cached_block_groups(struct btrfs_fs_info *fs_info); int zero_log_tree(struct btrfs_root *root); int reinit_extent_tree(struct btrfs_trans_handle *trans, - struct btrfs_fs_info *fs_info); + struct btrfs_fs_info *fs_info, bool pin); int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans, struct btrfs_root *root, int overwrite); int fill_csum_tree(struct btrfs_trans_handle *trans, diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index ea4019c32a3f..1e0545e6249d 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -4860,8 +4860,14 @@ next: } out: - /* if repair, update block accounting */ if (repair) { + ret = end_avoid_extents_overwrite(fs_info); + if (ret < 0) + ret = FATAL_ERROR; + err |= ret; + + reset_cached_block_groups(fs_info); + /* update block accounting */ ret = btrfs_fix_block_accounting(trans, root); if (ret) err |= ret; diff --git a/cmds-check.c b/cmds-check.c index 28746712fac1..ed81fd3c22b4 100644 --- a/cmds-check.c +++ b/cmds-check.c @@ -453,7 +453,8 @@ int cmd_check(int argc, char **argv) if (init_extent_tree) { printf("Creating a new extent tree\n"); - ret = reinit_extent_tree(trans, info); + ret = reinit_extent_tree(trans, info, +check_mode == CHECK_MODE_ORIGINAL); err |= !!ret; if (ret) goto close_out; -- 2.16.1 -- To unsubscribe from this 
list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 09/18] btrfs-progs: lowmem check: start to remove parameters @trans in lowmem
Since extents can be avoid overwrite by excluding or new chunk allocation. It's unnessesary to do all repairs in one transaction. This patch removes parameter @trans of repair_extent_data_item(). repair_extent_data_item() calls try_avoid_extents_overwrite() and starts a transaction by itself. Note: This patch and next patches cause error in lowmem repair like: "Error: Commit_root already set when starting transaction". This error will disappear after removing @trans finished. Signed-off-by: Su Yue--- check/mode-lowmem.c | 23 ++- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index 1e0545e6249d..446ea4a21bfa 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -2739,12 +2739,12 @@ out: * * Returns error bits after reapir. */ -static int repair_extent_data_item(struct btrfs_trans_handle *trans, - struct btrfs_root *root, +static int repair_extent_data_item(struct btrfs_root *root, struct btrfs_path *pathp, struct node_refs *nrefs, int err) { + struct btrfs_trans_handle *trans = NULL; struct btrfs_file_extent_item *fi; struct btrfs_key fi_key; struct btrfs_key key; @@ -2761,6 +2761,7 @@ static int repair_extent_data_item(struct btrfs_trans_handle *trans, u64 file_offset; int generation; int slot; + int need_insert = 0; int ret = 0; eb = pathp->nodes[0]; @@ -2799,9 +2800,20 @@ static int repair_extent_data_item(struct btrfs_trans_handle *trans, ret = -EIO; goto out; } + need_insert = ret; + ret = avoid_extents_overwrite(root->fs_info); + if (ret) + goto out; + trans = btrfs_start_transaction(root, 1); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + trans = NULL; + error("fail to start transaction %s", strerror(-ret)); + goto out; + } /* insert an extent item */ - if (ret > 0) { + if (need_insert) { key.objectid = disk_bytenr; key.type = BTRFS_EXTENT_ITEM_KEY; key.offset = num_bytes; @@ -2841,6 +2853,8 @@ static int repair_extent_data_item(struct btrfs_trans_handle *trans, err &= ~BACKREF_MISSING; out: + 
if (trans) + btrfs_commit_transaction(trans, root); btrfs_release_path(); if (ret) error("can't repair root %llu extent data item[%llu %llu]", @@ -4117,8 +4131,7 @@ again: case BTRFS_EXTENT_DATA_KEY: ret = check_extent_data_item(root, path, nrefs, account_bytes); if (repair && ret) - ret = repair_extent_data_item(trans, root, path, nrefs, - ret); + ret = repair_extent_data_item(root, path, nrefs, ret); err |= ret; break; case BTRFS_BLOCK_GROUP_ITEM_KEY: -- 2.16.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
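The pattern this series moves to, where each repair helper starts and commits its own transaction, and the reason a half-converted tree reports "Commit_root already set when starting transaction", can be illustrated with a miniature model. This is a sketch under the assumption that the error comes from starting a transaction while the caller's transaction is still open on the same root. The toy_* names are hypothetical and only imitate the shape of btrfs_start_transaction() and btrfs_commit_transaction().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical miniature of a tree root; not the btrfs-progs structure. */
struct toy_root {
	int in_transaction;   /* non-zero while a transaction is open */
	int commits;          /* number of completed commits */
};

struct toy_trans {
	struct toy_root *root;
};

/* Refuses to nest: returns -EINVAL if a transaction is already open. */
static int toy_start_transaction(struct toy_root *root, struct toy_trans *trans)
{
	assert(root != NULL && trans != NULL);
	if (root->in_transaction)
		return -EINVAL;
	root->in_transaction = 1;
	trans->root = root;
	return 0;
}

static void toy_commit_transaction(struct toy_trans *trans)
{
	trans->root->in_transaction = 0;
	trans->root->commits++;
	trans->root = NULL;
}

/* A repair helper in the new style: owns its transaction start to commit. */
static int toy_repair_item(struct toy_root *root, int *item)
{
	struct toy_trans trans;
	int ret;

	ret = toy_start_transaction(root, &trans);
	if (ret)
		return ret;	/* nothing was modified */
	*item = 1;		/* stands in for the actual repair */
	toy_commit_transaction(&trans);
	return 0;
}
```

While an outer transaction is open, toy_repair_item() fails with -EINVAL and leaves the item untouched. Once every caller stops holding its own handle, the helper succeeds each time, which matches the note that the error goes away when the removal of @trans is complete.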
[PATCH v4 10/18] btrfs-progs: lowmem check: remove parameter @trans of delete_extent_item()
This patch removes the parameter @trans of delete_extent_item(). It calls try_avoid_extents_overwrite() and starts a transaction by itself. Note: This patch and next patches cause error in lowmem repair like: "Error: Commit_root already set when starting transaction". This error will disappear after removing @trans finished. Signed-off-by: Su Yue--- check/mode-lowmem.c | 24 +--- 1 file changed, 17 insertions(+), 7 deletions(-) diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c index 446ea4a21bfa..272e658296e7 100644 --- a/check/mode-lowmem.c +++ b/check/mode-lowmem.c @@ -4063,13 +4063,22 @@ out: return err; } -static int delete_extent_tree_item(struct btrfs_trans_handle *trans, - struct btrfs_root *root, +static int delete_extent_tree_item(struct btrfs_root *root, struct btrfs_path *path) { struct btrfs_key key; + struct btrfs_trans_handle *trans; int ret = 0; + ret = avoid_extents_overwrite(root->fs_info); + if (ret) + return ret; + trans = btrfs_start_transaction(root, 1); + if (IS_ERR(trans)) { + ret = PTR_ERR(trans); + error("fail to start transaction %s", strerror(-ret)); + goto out; + } btrfs_item_key_to_cpu(path->nodes[0], , path->slots[0]); btrfs_release_path(path); ret = btrfs_search_slot(trans, root, , path, -1, 1); @@ -4087,6 +4096,7 @@ static int delete_extent_tree_item(struct btrfs_trans_handle *trans, else path->slots[0]--; out: + btrfs_commit_transaction(trans, root); if (ret) error("failed to delete root %llu item[%llu, %u, %llu]", root->objectid, key.objectid, key.type, key.offset); @@ -4138,7 +4148,7 @@ again: ret = check_block_group_item(fs_info, eb, slot); if (repair && ret & REFERENCER_MISSING) - ret = delete_extent_tree_item(trans, root, path); + ret = delete_extent_tree_item(root, path); err |= ret; break; case BTRFS_DEV_ITEM_KEY: @@ -4169,7 +4179,7 @@ again: key.objectid, -1); if (repair && ret & (REFERENCER_MISMATCH | REFERENCER_MISSING)) - ret = delete_extent_tree_item(trans, root, path); + ret = delete_extent_tree_item(root, path); 
err |= ret; break; case BTRFS_EXTENT_DATA_REF_KEY: @@ -4182,7 +4192,7 @@ again: btrfs_extent_data_ref_count(eb, dref)); if (repair && ret & (REFERENCER_MISMATCH | REFERENCER_MISSING)) - ret = delete_extent_tree_item(trans, root, path); + ret = delete_extent_tree_item(root, path); err |= ret; break; case BTRFS_SHARED_BLOCK_REF_KEY: @@ -4190,7 +4200,7 @@ again: key.objectid, -1); if (repair && ret & (REFERENCER_MISMATCH | REFERENCER_MISSING)) - ret = delete_extent_tree_item(trans, root, path); + ret = delete_extent_tree_item(root, path); err |= ret; break; case BTRFS_SHARED_DATA_REF_KEY: @@ -4198,7 +4208,7 @@ again: key.objectid); if (repair && ret & (REFERENCER_MISMATCH | REFERENCER_MISSING)) - ret = delete_extent_tree_item(trans, root, path); + ret = delete_extent_tree_item(root, path); err |= ret; break; default: -- 2.16.1 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 04/18] btrfs-progs: lowmem check: exclude extents of metadata blocks
Commit d17d6663c99c ("btrfs-progs: lowmem check: Fix regression which
screws up extent allocator") removes pin_metadata_blocks() from lowmem
repair. So we have to find another way to exclude extents which should
be occupied by tree blocks.

Introduce exclude_metadata_blocks() to mark extents of all tree blocks
dirty in fs_info->excluded_extents. Export it since it will be used in
lowmem too.

Signed-off-by: Su Yue
---
 check/mode-common.c | 73 +
 check/mode-common.h |  2 ++
 2 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/check/mode-common.c b/check/mode-common.c
index e6d8ebe8b9b7..acceb24b9597 100644
--- a/check/mode-common.c
+++ b/check/mode-common.c
@@ -377,40 +377,54 @@ int zero_log_tree(struct btrfs_root *root)
 	return ret;
 }

-static int pin_down_tree_blocks(struct btrfs_fs_info *fs_info,
-				struct extent_buffer *eb, int tree_root)
+static int traverse_tree_blocks(struct btrfs_fs_info *fs_info,
+				struct extent_buffer *eb, int tree_root,
+				int pin)
 {
 	struct extent_buffer *tmp;
 	struct btrfs_root_item *ri;
 	struct btrfs_key key;
+	struct extent_io_tree *tree;
 	u64 bytenr;
 	int level = btrfs_header_level(eb);
 	int nritems;
 	int ret;
 	int i;
+	u64 end = eb->start + eb->len;

+	if (pin)
+		tree = &fs_info->pinned_extents;
+	else
+		tree = fs_info->excluded_extents;
 	/*
-	 * If we have pinned this block before, don't pin it again.
+	 * If we have pinned/excluded this block before, don't do it again.
 	 * This can not only avoid forever loop with broken filesystem
 	 * but also give us some speedups.
 	 */
-	if (test_range_bit(&fs_info->pinned_extents, eb->start,
-			   eb->start + eb->len - 1, EXTENT_DIRTY, 0))
+	if (test_range_bit(tree, eb->start, end - 1, EXTENT_DIRTY, 0))
 		return 0;

-	btrfs_pin_extent(fs_info, eb->start, eb->len);
+	if (pin)
+		btrfs_pin_extent(fs_info, eb->start, eb->len);
+	else
+		set_extent_dirty(tree, eb->start, end - 1);

 	nritems = btrfs_header_nritems(eb);
 	for (i = 0; i < nritems; i++) {
 		if (level == 0) {
+			bool is_extent_root;
 			btrfs_item_key_to_cpu(eb, &key, i);
 			if (key.type != BTRFS_ROOT_ITEM_KEY)
 				continue;
 			/* Skip the extent root and reloc roots */
-			if (key.objectid == BTRFS_EXTENT_TREE_OBJECTID ||
-			    key.objectid == BTRFS_TREE_RELOC_OBJECTID ||
+			if (key.objectid == BTRFS_TREE_RELOC_OBJECTID ||
 			    key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID)
 				continue;
+			is_extent_root =
+				key.objectid == BTRFS_EXTENT_TREE_OBJECTID;
+			/* If pin, skip the extent root */
+			if (pin && is_extent_root)
+				continue;
 			ri = btrfs_item_ptr(eb, i, struct btrfs_root_item);
 			bytenr = btrfs_disk_root_bytenr(eb, ri);

@@ -425,7 +439,7 @@ static int pin_down_tree_blocks(struct btrfs_fs_info *fs_info,
 				fprintf(stderr, "Error reading root block\n");
 				return -EIO;
 			}
-			ret = pin_down_tree_blocks(fs_info, tmp, 0);
+			ret = traverse_tree_blocks(fs_info, tmp, 0, pin);
 			free_extent_buffer(tmp);
 			if (ret)
 				return ret;
@@ -444,7 +458,8 @@ static int pin_down_tree_blocks(struct btrfs_fs_info *fs_info,
 				fprintf(stderr, "Error reading tree block\n");
 				return -EIO;
 			}
-			ret = pin_down_tree_blocks(fs_info, tmp, tree_root);
+			ret = traverse_tree_blocks(fs_info, tmp, tree_root,
+						   pin);
 			free_extent_buffer(tmp);
 			if (ret)
 				return ret;
@@ -454,6 +469,12 @@ static int pin_down_tree_blocks(struct btrfs_fs_info *fs_info,
 	return 0;
 }

+static int pin_down_tree_blocks(struct btrfs_fs_info *fs_info,
+				struct extent_buffer *eb, int tree_root)
+{
+	return traverse_tree_blocks(fs_info, eb, tree_root, 1);
+}
+
 static int pin_metadata_blocks(struct btrfs_fs_info *fs_info)
 {
 	int ret;
@@ -465,6 +486,38 @@ static int pin_metadata_blocks(struct btrfs_fs_info
[PATCH v4 03/18] btrfs-progs: lowmem check: assign @parent early in repair_extent_data_item()
The variable @eb is assigned to a leaf in fs_tree before the backref is
inserted, which causes the newly inserted backref to get the wrong
parent. Setting @parent at the beginning solves the problem.

Reviewed-by: Qu Wenruo
Signed-off-by: Su Yue
---
 check/mode-lowmem.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 18ec6db098e7..1fc84f1e8c44 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -2475,6 +2475,11 @@ static int repair_extent_data_item(struct btrfs_trans_handle *trans,
 	extent_offset = btrfs_file_extent_offset(eb, fi);
 	offset = file_offset - extent_offset;

+	if (nrefs->full_backref[0])
+		parent = btrfs_header_bytenr(eb);
+	else
+		parent = 0;
+
 	/* now repair only adds backref */
 	if ((err & BACKREF_MISSING) == 0)
 		return err;
@@ -2516,11 +2521,6 @@ static int repair_extent_data_item(struct btrfs_trans_handle *trans,
 		btrfs_release_path(&path);
 	}

-	if (nrefs->full_backref[0])
-		parent = btrfs_header_bytenr(eb);
-	else
-		parent = 0;
-
 	ret = btrfs_inc_extent_ref(trans, root, disk_bytenr, num_bytes, parent,
 				   root->objectid,
 		   parent ? BTRFS_FIRST_FREE_OBJECTID : fi_key.objectid,
--
2.16.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 00/18] btrfs-progs: lowmem check: avoid extents overwrite
This patchset can be fetched from my github:
https://github.com/Damenly/btrfs-progs/tree/lowmem

It is based on an unmerged patchset whose cover letter is:
  [PATCH 0/3] btrfs-progs: Split original mode check to its own
  Author: Qu Wenruo

I'm sorry if sending patches based on an unmerged patchset makes you
uncomfortable. I think the three patches from Qu are good enough, so I
am sending this before my vacation.

Patch[1-3] fix minor problems of lowmem repair.

Patch[4-8] introduce two ways to avoid extents overwrite:
1) Traverse trees and exclude all metadata blocks.
   It's time-inefficient for large filesystems.
2) Mark all existing chunks full, allocate a new chunk for CoW and
   record the chunk start.
   If the last allocated chunk is almost full, allocate a new one.
Method 2) is more efficient than 1). However, it can't handle
situations like no space (fsck/004).
Lowmem repair will try method 2 first and then method 1.

Patch[9-17] remove parameters @trans in functions for lowmem repair.
They try to avoid extents overwrite if necessary and start transactions
by themselves.

Patch[18] adds a test image.

Those patches are mainly for lowmem repair. Original mode is not
influenced.

---
Changelog:
v4->v3:
  - Remove global enum extents_operation to simplify
    avoid_extents_overwrite() and its cleanup.
  - Rebase after work of check split.
v3->v2:
  - check_btrfs_root() returns FATAL_ERROR if check_fs_first_inode()
    failed. Thanks Nikolay Borisov.
  - Add function try_to_force_cow_in_new_chunk() and global u64
    variable to record start of the last allocated chunk.
  - Remove unused EXTENTS_PIN in enum lowmem_extents_operation.
v2->v1:
  - Let @err in check_btrfs_root() record err bits but exclude
    negative values.
  - Do not delete a line of code to release path after extent item's
    insertion in repair_extent_data_item().
  - Add patch[3].
  - Force CoW in new allocated chunk to avoid extents overwrite.
  - Remove parameters @trans in check_chunks_and_extents_v2() and
    related callees.
  - Repair functions for lowmem mode call try_avoid_extents_overwrite()
    and start transactions.

Su Yue (18):
  btrfs-progs: lowmem check: release path in repair_extent_data_item()
  btrfs-progs: lowmem check: record returned errors after
    walk_down_tree_v2()
  btrfs-progs: lowmem check: assign @parent early in
    repair_extent_data_item()
  btrfs-progs: lowmem check: exclude extents of metadata blocks
  btrfs-progs: lowmem check: introduce mark/clear_block_groups_full()
  btrfs-progs: lowmem check: introduce try_force_cow_in_new_chunk()
  btrfs-progs: lowmem check: introduce avoid_extents_overwrite()
  btrfs-progs: lowmem check: exclude extents if init-extent-tree in
    lowmem
  btrfs-progs: lowmem check: start to remove parameters @trans in
    lowmem
  btrfs-progs: lowmem check: remove parameter @trans of
    delete_extent_item()
  btrfs-progs: lowmem check: remove parameter @trans of
    repair_chunk_item()
  btrfs-progs: lowmem check: remove parameter @trans of
    repair_extent_item()
  btrfs-progs: lowmem check: remove parameter @trans of
    check_leaf_items()
  btrfs-progs: lowmem check: remove parameter @trans of
    repair_tree_back_ref()
  btrfs-progs: lowmem check: remove parameter @trans of
    check_btrfs_root()
  btrfs-progs: lowmem check: introduce repair_block_accounting()
  btrfs-progs: lowmem check: end of removing parameters @trans in
    lowmem
  btrfs-progs: fsck-tests: add image no extent with normal device size

 check/mode-common.c                                | 100 +++-
 check/mode-common.h                                |   4 +-
 check/mode-lowmem.c                                | 560 +
 check/mode-lowmem.h                                |   1 +
 cmds-check.c                                       |   3 +-
 .../014-no-extent-info/.lowmem_repairable          |   0
 .../fsck-tests/014-no-extent-info/no_extent.raw.xz | Bin 0 -> 28084 bytes
 .../{default_case.img => no_extent_bad_dev.img}    | Bin
 8 files changed, 561 insertions(+), 107 deletions(-)
 create mode 100644 tests/fsck-tests/014-no-extent-info/.lowmem_repairable
 create mode 100644 tests/fsck-tests/014-no-extent-info/no_extent.raw.xz
 rename tests/fsck-tests/014-no-extent-info/{default_case.img => no_extent_bad_dev.img} (100%)

--
2.16.1
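The ordering described in the cover letter (try the cheaper method 2 first, fall back to method 1 when it fails, for example on no space) can be sketched roughly as below. This is only an illustration under stated assumptions: `strategy_fn`, `avoid_overwrite_sketch()` and both stubs are hypothetical names that do not appear in the patchset, and the real decision is presumably made inside avoid_extents_overwrite().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical strategy type: returns 0 on success, negative errno on failure. */
typedef int (*strategy_fn)(void);

/* Stub standing in for method 2 when no new chunk can be allocated. */
static int stub_force_cow_no_space(void)
{
	return -ENOSPC;
}

/* Stub standing in for method 1 (exclude all metadata blocks). */
static int stub_exclude_metadata(void)
{
	return 0;
}

/*
 * Try the cheap strategy first; only if it fails, fall back to the
 * slower one. Returns 0 if either succeeded, otherwise the error
 * reported by the fallback.
 */
static int avoid_overwrite_sketch(strategy_fn cheap, strategy_fn fallback)
{
	assert(cheap != NULL && fallback != NULL);
	if (cheap() == 0)
		return 0;
	return fallback();
}
```

With the stubs above, `avoid_overwrite_sketch(stub_force_cow_no_space, stub_exclude_metadata)` ends up succeeding through the fallback.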
[PATCH v4 01/18] btrfs-progs: lowmem check: release path in repair_extent_data_item()
In repair_extent_data_item(), the path is not released if an error
occurs, which leaks extent buffers. So release the path at the end of
the function.

Reviewed-by: Qu Wenruo
Signed-off-by: Su Yue
---
 check/mode-lowmem.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 62bcf3d2e126..d168a3ddd5e5 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -2537,6 +2537,7 @@ static int repair_extent_data_item(struct btrfs_trans_handle *trans,
 	err &= ~BACKREF_MISSING;
 out:
+	btrfs_release_path(&path);
 	if (ret)
 		error("can't repair root %llu extent data item[%llu %llu]",
 		      root->objectid, disk_bytenr, num_bytes);
--
2.16.1
[PATCH v4 02/18] btrfs-progs: lowmem check: record returned errors after walk_down_tree_v2()
In lowmem mode with '--repair', check_chunks_and_extents_v2() will fix
accounting in block groups and clear the error bit BG_ACCOUNTING_ERROR.
However, the return value of check_btrfs_root() doesn't contain error
bits. If the extent tree has errors, lowmem repair always prints an
error and returns a nonzero value even if the filesystem is fine after
repair.

Introduce FATAL_ERROR for lowmem mode to represent negative return
values, since negative and positive values cannot be mixed in bit
operations. Then let check_btrfs_root() return error bits.

Signed-off-by: Su Yue
---
 check/mode-lowmem.c | 10 +-
 check/mode-lowmem.h |  1 +
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index d168a3ddd5e5..18ec6db098e7 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -4215,7 +4215,7 @@ out:
  *		otherwise means check fs tree(s) items relationship and
  *		@root MUST be a fs tree root.
  * Returns 0 represents OK.
- * Returns not 0 represents error.
+ * Returns >0 represents error bits.
  */
 static int check_btrfs_root(struct btrfs_trans_handle *trans,
 			    struct btrfs_root *root, unsigned int ext_ref,
@@ -4238,7 +4238,7 @@ static int check_btrfs_root(struct btrfs_trans_handle *trans,
 		 */
 		ret = check_fs_first_inode(root, ext_ref);
 		if (ret < 0)
-			return ret;
+			return FATAL_ERROR;
 	}


@@ -4266,11 +4266,11 @@ static int check_btrfs_root(struct btrfs_trans_handle *trans,
 		ret = walk_down_tree(trans, root, &path, &level, &nrefs,
 				     ext_ref, check_all);

-		err |= !!ret;
-
+		if (ret > 0)
+			err |= ret;
 		/* if ret is negative, walk shall stop */
 		if (ret < 0) {
-			ret = err;
+			ret = err | FATAL_ERROR;
 			break;
 		}

diff --git a/check/mode-lowmem.h b/check/mode-lowmem.h
index 73d5799951b7..e7ba62e2413e 100644
--- a/check/mode-lowmem.h
+++ b/check/mode-lowmem.h
@@ -43,6 +43,7 @@
 #define DIR_INDEX_MISMATCH	(1<<19) /* INODE_INDEX found but not match */
 #define DIR_COUNT_AGAIN		(1<<20) /* DIR isize should be recalculated */
 #define BG_ACCOUNTING_ERROR	(1<<21) /* Block group accounting error */
+#define FATAL_ERROR		(1<<22) /* Fatal bit for errno */

 /*
  * Error bit for low memory mode check.
--
2.16.1
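To see why the patch maps negative return values onto a dedicated bit, here is a small standalone C sketch. The two bit values are copied from the patch; the helpers `fold_result()` and `clear_repaired()` are hypothetical names introduced only for this illustration and are not part of btrfs-progs. On two's-complement machines, OR-ing a negative errno into a bitmask leaves the value negative and swallows the individual error bits, which is the mixing problem the commit message refers to.

```c
#include <assert.h>
#include <errno.h>

/* Bit values copied from the patch; everything else here is illustrative. */
#define BG_ACCOUNTING_ERROR	(1 << 21)	/* Block group accounting error */
#define FATAL_ERROR		(1 << 22)	/* Fatal bit for errno */

/*
 * Fold one step's return value into the accumulated error bits.
 * Positive values are error bits and are OR-ed in; a negative errno
 * is replaced by the single FATAL_ERROR bit so the result stays a
 * non-negative bitmask.
 */
static int fold_result(int err, int ret)
{
	if (ret > 0)
		err |= ret;
	else if (ret < 0)
		err |= FATAL_ERROR;
	return err;
}

/* Clear a bit once the corresponding problem has been repaired. */
static int clear_repaired(int err, int bit)
{
	assert(bit > 0);
	return err & ~bit;
}
```

With this shape, a caller can clear BG_ACCOUNTING_ERROR after a successful repair and end up with 0, while a fatal failure remains visible as its own bit.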
Re: Crash when unraring large archives on btrfs-filesystem
Another way to test for this problem comes from one of the responses in
that lkml thread, by Btrfs list regular Duncan, about tweaking the
knobs that handle dirty write caching. So you could try those suggested
tweaks first, rather than changing kernels.

https://lkml.org/lkml/2016/12/13/753

Chris Murphy
Re: IO Error (.snapshots is not a btrfs subvolume)
On Wed, Feb 7, 2018 at 6:26 PM, Nick Gilmour wrote:
> Hi all,
>
> I have successfully restored a snapshot of root but now when I try to
> make a new snapshot I get this error:
> IO Error (.snapshots is not a btrfs subvolume).
> My snapshots were within @ which I renamed to @_old.
> What can I do now? How can I move the snapshots from @_old/ into @ and
> be able to make snapshots again?
>
> This is an excerpt of my subvolumes list:
>
> # btrfs subvolume list /
> ID 257 gen 175397 top level 5 path @_old
> ID 258 gen 175392 top level 5 path @pkg
> ID 260 gen 175447 top level 5 path @tmp
> ID 262 gen 19 top level 257 path @_old/var/lib/machines
> ID 268 gen 175441 top level 5 path @test
> ID 291 gen 175394 top level 257 path @_old/.snapshots
> ID 292 gen 1705 top level 291 path @_old/.snapshots/1/snapshot
> ...
>
> ID 3538 gen 175398 top level 291 path @_old/.snapshots/1594/snapshot
> ID 3540 gen 175447 top level 5 path @
>

This is snapper behavior. It creates .snapshots as a subvolume and
then puts snapshots into that subvolume. If you snapshot a subvolume
that contains another subvolume, the nested subvolume is not included
in the snapshot; a plain directory placeholder is created in its place.
So your restored snapshot contains a .snapshots directory rather than a
.snapshots subvolume.

Possibly if you delete the directory and create a new subvolume named
.snapshots, the problem will be fixed. I can't tell you how this will
confuse snapper though, or how to unconfuse it. It pretty much expects
to be in control of all snapshots: creation, deletion, and rollbacks.
So if you do any of that manually for whatever reason, I think it can
confuse snapper.

--
Chris Murphy
Re: Crash when unraring large archives on btrfs-filesystem
On Wed, Feb 7, 2018 at 12:57 PM, Stefan Malte Schumacher
<s.schumac...@netcologne.de> wrote:

> Feb 5 21:25:12 mars kernel: [250116.605471] Node 0 active_anon:176kB
> inactive_anon:276kB active_file:14228752kB inactive_file:1631728kB
> unevictable:0kB isolated(anon):0kB isolated(file):4096kB mapped:9316kB
> dirty:1636856kB writeback:248kB shmem:84kB shmem_thp: 0kB
> shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB
> pages_scanned:13631918 all_unreclaimable? no

How much RAM is on the machine and how much swap is available? This
looks like a lot of dirty data has accumulated, and there is also
swapping happening, both swap out and swap in.

> 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04) x86_64 GNU/Linux

I don't know if this bears any relation to the upstream longterm
4.9.65, but there are definitely many memory and btrfs changes between
4.9.66 and 4.9.80, including a fix for a deadlock when writing out the
free space cache.

I don't know that this is related to your particular problem; there
might be more than one thing going on. But the easiest thing to do,
until someone who actually knows for sure (a developer with time to
respond) weighs in, is to just upgrade the kernel and see if the
problem goes away.

I did also find a similar report related to the first problem (unclear
if it's the instigator): page allocation stalls for 12104ms, order:0,
mode:0x24200ca(GFP_HIGHUSER_MOVABLE), happening along with Btrfs. That
thread:

https://lkml.org/lkml/2016/12/13/529

---
Chris Murphy
Re: [PATCH RFC] Btrfs: expose bad chunks in sysfs
On 2018-02-08 06:57, Liu Bo wrote:
> On Tue, Feb 06, 2018 at 09:28:14AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2018-02-06 07:15, Liu Bo wrote:
>>> Btrfs tries its best to tolerate write errors, but kind of silently
>>> (except some messages in kernel log).
>>>
>>> For raid1 and raid10, this is usually not a problem because there is a
>>> copy as backup, while for parity based raid setup, i.e. raid5 and
>>> raid6, the problem is that, if a write error occurs due to some bad
>>> sectors, one horizonal stripe becomes degraded and the number of write
>>> errors it can tolerate gets reduced by one, now if two disk fails,
>>> data may be lost forever.
>>>
>>> One way to mitigate the data loss pain is to expose 'bad chunks',
>>> i.e. degraded chunks, to users, so that they can use 'btrfs balance'
>>> to relocate the whole chunk and get the full raid6 protection again
>>> (if the relocation works).
>>>
>>> This introduces 'bad_chunks' in btrfs's per-fs sysfs directory. Once
>>> a chunk of raid5 or raid6 becomes degraded, it will appear in
>>> 'bad_chunks'.
>>
>> Sysfs looks good.
>>
>> Although other systems uses their own interface to handle their status.
>> Mdadm uses /proc/mdstat to show such status, LVM uses lvdisplay/lvs.
>>
>
> It's more like badblocks in md, instead of /proc/mdstat.

I see the point now.

>
>> So here comes to a new sys-fs interface.
>>
>>>
>>> Signed-off-by: Liu Bo
>>> ---
>>> - In this patch, 'bad chunks' is not persistent on disk, but it can be
>>>   added if it's thought to be a good idea.
>>
>> IHMO such bad chunks list can be built using existing dev status at
>> mount time.
>>
>
> What dev status offers is counters, but here chunk info. is needed if
> we want balance to do relocation. I'll think harder about how to use
> it.

In my opinion, if we get a write error, relocation may help for a short
time, but as long as we keep using the same device it may happen again,
and the real fix is to replace the device.

>
>> Although using dev status may cause extra problems like false alerts.
>>
>>> - This is lightly tested, comments are very welcome.
>>
>> Just checked the code, there are 2 concerns:
>>
>> 1) The way to remove bad chunk
>>    Currently it can only be removed when the chunk is removed.
>>    If any transient write error happened, the bad chunk will just be
>>    there forever (if not removed)
>>
>
> The fundamental assumption about write error is that filesystem should
> not get any transient write error, as the underlying layers in IO
> stack should do their best to get rid of transient write error.
> (probably I should add this to the patch log.)
>
> So once we get a bad chunk, there is a real IO error, for now what I
> can think of is to use balance to create a new chunk to hold
> everything in the bad chunk and the new chunk has the full raid
> protection.

Then the problem is about the granularity.
If a write error happens, should we ignore just those bad blocks, or
the whole device?

And in that case I prefer the latter.

>
>>    It seems to cause false alert.
>>
>>    And extra logic to determine if it's a real bad chunk in kernel seems
>>    a little complex and less flex.
>>    (Maybe an interface to info userspace where problem happens is more
>>     flex?)
>>
>
> It depends on what users care about, when raid6 is in use, I think
> users would care how many disk failures btrfs could tolerate at any
> point, about bad chunks whether it's true or false, probably they
> don't care, they might think it'd help a lot if some operations could
> be done to get the system back to the protect level they want.
>
>> 2) Bad chunk is only added when writing
>>    Read routine should also be able to detect bad chunks, with better
>>    accuracy.
>>
>
> Do you mean a read error should also report bad chunk?
> Or am I misunderstanding your point?
>
> Typically read failure would trigger reconstruction and a write for
> correction will be issued, then we could get bad chunks if correction
> write fails.

Right, I just forgot the fix procedure.

>
>>>
>>> fs/btrfs/ctree.h | 8 +++
>>> fs/btrfs/disk-io.c | 2 ++
>>> fs/btrfs/extent-tree.c | 13 +++
>>> fs/btrfs/raid56.c | 59
>>> --
>>> fs/btrfs/sysfs.c | 26 ++
>>> fs/btrfs/volumes.c | 15 +++--
>>> fs/btrfs/volumes.h | 2 ++
>>> 7 files changed, 121 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>> index 13c260b..08aad65 100644
>>> --- a/fs/btrfs/ctree.h
>>> +++ b/fs/btrfs/ctree.h
>>> @@ -1101,6 +1101,9 @@ struct btrfs_fs_info {
>>> spinlock_t ref_verify_lock;
>>> struct rb_root block_tree;
>>> #endif
>>> +
>>> + struct list_head bad_chunks;
>>
>> Rbtree may be better here.
>>
>> Since iterating a list to remove bad chunk can sometimes be slow.
>>
>
> At the point I wrote the patch, I thought bad chunk should be rare
> case so
Re: [PATCH RFC] Btrfs: expose bad chunks in sysfs
On Tue, Feb 06, 2018 at 09:28:14AM +0800, Qu Wenruo wrote:
>
>
> On 2018-02-06 07:15, Liu Bo wrote:
> > Btrfs tries its best to tolerate write errors, but kind of silently
> > (except some messages in kernel log).
> >
> > For raid1 and raid10, this is usually not a problem because there is a
> > copy as backup, while for parity based raid setup, i.e. raid5 and
> > raid6, the problem is that, if a write error occurs due to some bad
> > sectors, one horizonal stripe becomes degraded and the number of write
> > errors it can tolerate gets reduced by one, now if two disk fails,
> > data may be lost forever.
> >
> > One way to mitigate the data loss pain is to expose 'bad chunks',
> > i.e. degraded chunks, to users, so that they can use 'btrfs balance'
> > to relocate the whole chunk and get the full raid6 protection again
> > (if the relocation works).
> >
> > This introduces 'bad_chunks' in btrfs's per-fs sysfs directory. Once
> > a chunk of raid5 or raid6 becomes degraded, it will appear in
> > 'bad_chunks'.
>
> Sysfs looks good.
>
> Although other systems uses their own interface to handle their status.
> Mdadm uses /proc/mdstat to show such status, LVM uses lvdisplay/lvs.
>

It's more like badblocks in md, instead of /proc/mdstat.

> So here comes to a new sys-fs interface.
>
> >
> > Signed-off-by: Liu Bo
> > ---
> > - In this patch, 'bad chunks' is not persistent on disk, but it can be
> >   added if it's thought to be a good idea.
>
> IHMO such bad chunks list can be built using existing dev status at
> mount time.
>

What dev status offers is counters, but here chunk info is needed if we
want balance to do relocation. I'll think harder about how to use it.

> Although using dev status may cause extra problems like false alerts.
>
> > - This is lightly tested, comments are very welcome.
>
> Just checked the code, there are 2 concerns:
>
> 1) The way to remove bad chunk
>    Currently it can only be removed when the chunk is removed.
>    If any transient write error happened, the bad chunk will just be
>    there forever (if not removed)
>

The fundamental assumption about write errors is that the filesystem
should not get any transient write error, as the underlying layers in
the IO stack should do their best to get rid of transient write errors.
(Probably I should add this to the patch log.)

So once we get a bad chunk, there is a real IO error. For now, what I
can think of is to use balance to create a new chunk that holds
everything in the bad chunk, so that the new chunk has the full raid
protection.

>    It seems to cause false alert.
>
>    And extra logic to determine if it's a real bad chunk in kernel seems
>    a little complex and less flex.
>    (Maybe an interface to info userspace where problem happens is more
>     flex?)
>

It depends on what users care about. When raid6 is in use, I think
users would care about how many disk failures btrfs can tolerate at any
point; whether a bad chunk is a true or false alert, they probably
don't care. They might think it'd help a lot if some operation could be
done to get the system back to the protection level they want.

> 2) Bad chunk is only added when writing
>    Read routine should also be able to detect bad chunks, with better
>    accuracy.
>

Do you mean a read error should also report a bad chunk?
Or am I misunderstanding your point?

Typically a read failure would trigger reconstruction and a write for
correction will be issued; then we could get bad chunks if the
correction write fails.

> >
> > fs/btrfs/ctree.h | 8 +++
> > fs/btrfs/disk-io.c | 2 ++
> > fs/btrfs/extent-tree.c | 13 +++
> > fs/btrfs/raid56.c | 59
> > --
> > fs/btrfs/sysfs.c | 26 ++
> > fs/btrfs/volumes.c | 15 +++--
> > fs/btrfs/volumes.h | 2 ++
> > 7 files changed, 121 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index 13c260b..08aad65 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -1101,6 +1101,9 @@ struct btrfs_fs_info {
> > spinlock_t ref_verify_lock;
> > struct rb_root block_tree;
> > #endif
> > +
> > + struct list_head bad_chunks;
>
> Rbtree may be better here.
>
> Since iterating a list to remove bad chunk can sometimes be slow.
>

At the point I wrote the patch, I thought a bad chunk should be a rare
case so list search is fine, but now I'm not sure.

> > + seqlock_t bc_lock;
> > };
> >
> > static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb)
> > @@ -2568,6 +2571,11 @@ static inline gfp_t btrfs_alloc_write_mask(struct
> > address_space *mapping)
> >
> > /* extent-tree.c */
> >
> > +struct btrfs_bad_chunk {
> > + u64 chunk_offset;
>
> It would be better to have chunk_size to info user.
> Just chunk start won't tell user how serious the problem is.
>

Hmm, I don't understand what extra value chunk_size can offer.

> And
IO Error (.snapshots is not a btrfs subvolume)
Hi all,

I have successfully restored a snapshot of root but now when I try to
make a new snapshot I get this error:
IO Error (.snapshots is not a btrfs subvolume).
My snapshots were within @ which I renamed to @_old.
What can I do now? How can I move the snapshots from @_old/ into @ and
be able to make snapshots again?

This is an excerpt of my subvolumes list:

# btrfs subvolume list /
ID 257 gen 175397 top level 5 path @_old
ID 258 gen 175392 top level 5 path @pkg
ID 260 gen 175447 top level 5 path @tmp
ID 262 gen 19 top level 257 path @_old/var/lib/machines
ID 268 gen 175441 top level 5 path @test
ID 291 gen 175394 top level 257 path @_old/.snapshots
ID 292 gen 1705 top level 291 path @_old/.snapshots/1/snapshot
...

ID 3538 gen 175398 top level 291 path @_old/.snapshots/1594/snapshot
ID 3540 gen 175447 top level 5 path @

Regards,
Nick
[PATCH v2] btrfs-progs: ctree: Add extra level check for read_node_slot()
Strangely, we have a level check in btrfs_print_tree() while we don't
have the same check in read_node_slot().

That's to say, for the following corruption, btrfs_search_slot() or
btrfs_next_leaf() can return an invalid leaf:

Parent eb:
  node XX level 1
              ^^^
              Child should be leaf (level 0)
  ...
  key (XXX XXX XXX) block YY

Child eb:
  leaf YY level 1
              ^^^
              Something went wrong now

And with a corrupted leaf returned, later callers can easily be screwed
up.

Although the root cause (powerloss, but still something wrong breaking
metadata CoW of btrfs) is still unknown, at least enhance btrfs-progs
to avoid SEGV.

Reported-by: Ralph Gauges
Signed-off-by: Qu Wenruo
---
changelog:
v2:
  Check if the extent buffer is up-to-date before checking its level to
  avoid possible NULL pointer access.
---
 ctree.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/ctree.c b/ctree.c
index 4fc33b14000a..430805e3043f 100644
--- a/ctree.c
+++ b/ctree.c
@@ -22,6 +22,7 @@
 #include "repair.h"
 #include "internal.h"
 #include "sizes.h"
+#include "messages.h"

 static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
 		      *root, struct btrfs_path *path, int level);
@@ -640,7 +641,9 @@ static int bin_search(struct extent_buffer *eb, struct btrfs_key *key,
 struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info,
 				   struct extent_buffer *parent, int slot)
 {
+	struct extent_buffer *ret;
 	int level = btrfs_header_level(parent);
+
 	if (slot < 0)
 		return NULL;
 	if (slot >= btrfs_header_nritems(parent))
@@ -649,8 +652,19 @@ struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info,
 	if (level == 0)
 		return NULL;

-	return read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
+	ret = read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
 		       btrfs_node_ptr_generation(parent, slot));
+	if (!extent_buffer_uptodate(ret))
+		return ERR_PTR(-EIO);
+
+	if (btrfs_header_level(ret) != level - 1) {
+		error("child eb corrupted: parent bytenr=%llu item=%d parent level=%d child level=%d",
+		      btrfs_header_bytenr(parent), slot,
+		      btrfs_header_level(parent), btrfs_header_level(ret));
+		free_extent_buffer(ret);
+		return ERR_PTR(-EIO);
+	}
+	return ret;
 }

 static int balance_level(struct btrfs_trans_handle *trans,
--
2.16.1
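The invariant this patch enforces (a child block must sit exactly one level below its parent) can be shown in isolation with a toy model. The sketch below is an assumption-laden illustration, not btrfs-progs code: `struct toy_eb` and `check_child_level()` are hypothetical, and where the real patch returns ERR_PTR(-EIO) and frees the buffer, the toy simply returns NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct extent_buffer: only the header level. */
struct toy_eb {
	int level;
};

/*
 * Return the child if it sits exactly one level below the parent,
 * otherwise NULL. A leaf parent (level 0) has no children, and a
 * NULL child models a failed read.
 */
static const struct toy_eb *check_child_level(const struct toy_eb *parent,
					      const struct toy_eb *child)
{
	assert(parent != NULL);
	if (parent->level == 0 || child == NULL)
		return NULL;
	if (child->level != parent->level - 1)
		return NULL;	/* corrupted: level mismatch */
	return child;
}
```

In the corruption shown in the commit message, the parent is at level 1 and the child also claims level 1, so the check rejects the child instead of handing a bogus leaf to the caller.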
[josef-btrfs:current-work 3/3] block/bio.c:1801:2: error: implicit declaration of function 'rq_qos_done_bio'; did you mean 'rq_qos_id'?
tree: https://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git current-work head: 71fe7e0ab249e42c17f387951aa09de7cb362d35 commit: 71fe7e0ab249e42c17f387951aa09de7cb362d35 [3/3] current-work config: x86_64-randconfig-x008-201805 (attached as .config) compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 reproduce: git checkout 71fe7e0ab249e42c17f387951aa09de7cb362d35 # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): In file included from block/bio.c:20:0: include/linux/bio.h:521:55: warning: 'struct blkcg_gq' declared inside parameter list will not be visible outside of this definition or declaration static int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg) { return 0; } ^~~~ block/bio.c: In function 'bio_endio': >> block/bio.c:1801:2: error: implicit declaration of function >> 'rq_qos_done_bio'; did you mean 'rq_qos_id'? >> [-Werror=implicit-function-declaration] rq_qos_done_bio(bio->bi_disk->queue, bio); ^~~ rq_qos_id In file included from block/bio.c:20:0: At top level: include/linux/bio.h:521:12: warning: 'bio_associate_blkg' defined but not used [-Wunused-function] static int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg) { return 0; } ^~ cc1: some warnings being treated as errors -- In file included from include/linux/blkdev.h:21:0, from include/linux/backing-dev.h:15, from block/blk-core.c:16: include/linux/bio.h:521:55: warning: 'struct blkcg_gq' declared inside parameter list will not be visible outside of this definition or declaration static int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg) { return 0; } ^~~~ block/blk-core.c: In function 'blk_requeue_request': >> block/blk-core.c:1545:2: error: implicit declaration of function >> 'rq_qos_requeue'; did you mean 'wbt_requeue'? 
>> [-Werror=implicit-function-declaration] rq_qos_requeue(q, >issue_stat); ^~ wbt_requeue block/blk-core.c: In function '__blk_put_request': >> block/blk-core.c:1651:2: error: implicit declaration of function >> 'rq_qos_done'; did you mean 'rq_qos_add'? >> [-Werror=implicit-function-declaration] rq_qos_done(q, >issue_stat); ^~~ rq_qos_add block/blk-core.c: In function 'blk_queue_bio': >> block/blk-core.c:1943:12: error: implicit declaration of function >> 'rq_qos_throttle' [-Werror=implicit-function-declaration] wb_acct = rq_qos_throttle(q, bio, q->queue_lock); ^~~ >> block/blk-core.c:1953:3: error: implicit declaration of function >> 'rq_qos_cleanup'; did you mean 'rq_qos_add'? >> [-Werror=implicit-function-declaration] rq_qos_cleanup(q, wb_acct); ^~ rq_qos_add block/blk-core.c: In function 'blk_start_request': >> block/blk-core.c:2841:3: error: implicit declaration of function >> 'rq_qos_issue'; did you mean 'rq_qos_id'? >> [-Werror=implicit-function-declaration] rq_qos_issue(req->q, >issue_stat); ^~~~ rq_qos_id In file included from include/linux/blkdev.h:21:0, from include/linux/backing-dev.h:15, from block/blk-core.c:16: At top level: include/linux/bio.h:521:12: warning: 'bio_associate_blkg' defined but not used [-Wunused-function] static int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg) { return 0; } ^~ cc1: some warnings being treated as errors -- In file included from block/blk-sysfs.c:8:0: include/linux/bio.h:521:55: warning: 'struct blkcg_gq' declared inside parameter list will not be visible outside of this definition or declaration static int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg) { return 0; } ^~~~ block/blk-sysfs.c: In function 'queue_wb_lat_show': >> block/blk-sysfs.c:432:41: error: implicit declaration of function >> 'wbt_get_min_lat'; did you mean 'bdi_set_min_ratio'? 
>> [-Werror=implicit-function-declaration] return sprintf(page, "%llu\n", div_u64(wbt_get_min_lat(q), 1000)); ^~~ bdi_set_min_ratio block/blk-sysfs.c: In function 'queue_wb_lat_store': >> block/blk-sysfs.c:460:2: error: implicit declaration of function >> 'wbt_set_min_lat'; did you mean 'bdi_set_min_ratio'? >> [-Werror=implicit-function-declaration] wbt_set_min_lat(q, val); ^~~ bdi_set_min_ratio >> block/blk-sysfs.c:462:20: error: passing argument 1 of 'wbt_update_limits' >> from incompatible pointer type [-Werror=incompatible-pointer-types] wbt_update_limits(q);
[josef-btrfs:current-work 3/3] block/blk-wbt.c:1005:33: error: 'struct blkcg' has no member named 'css'
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git current-work
head:   71fe7e0ab249e42c17f387951aa09de7cb362d35
commit: 71fe7e0ab249e42c17f387951aa09de7cb362d35 [3/3] current-work
config: i386-randconfig-x010-201805 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        git checkout 71fe7e0ab249e42c17f387951aa09de7cb362d35
        # save the attached .config to linux build tree
        make ARCH=i386

All errors (new ones prefixed by >>):

   In file included from include/linux/blkdev.h:21:0,
                    from include/linux/backing-dev.h:15,
                    from block/blk-wbt.c:24:
   include/linux/bio.h:521:55: warning: 'struct blkcg_gq' declared inside parameter list will not be visible outside of this definition or declaration
    static int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg) { return 0; }
    ^~~~
   block/blk-wbt.c: In function 'blkcg_qos_throttle':
>> block/blk-wbt.c:1005:33: error: 'struct blkcg' has no member named 'css'
     bio_associate_blkcg(bio, >css);
     ^~
   block/blk-wbt.c:1010:10: error: implicit declaration of function 'blkg_lookup_create'; did you mean 'blk_lookup_devt'? [-Werror=implicit-function-declaration]
     blkg = blkg_lookup_create(blkcg, q);
     ^~
     blk_lookup_devt
   block/blk-wbt.c:1010:8: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
     blkg = blkg_lookup_create(blkcg, q);
     ^
>> block/blk-wbt.c:1019:26: error: passing argument 2 of 'bio_associate_blkg' from incompatible pointer type [-Werror=incompatible-pointer-types]
     bio_associate_blkg(bio, blkg);
     ^~~~
   In file included from include/linux/blkdev.h:21:0,
                    from include/linux/backing-dev.h:15,
                    from block/blk-wbt.c:24:
   include/linux/bio.h:521:12: note: expected 'struct blkcg_gq *' but argument is of type 'struct blkcg_gq *'
    static int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg) { return 0; }
    ^~
>> block/blk-wbt.c:1030:26: error: 'struct bio' has no member named 'bi_issue_stat'
     blk_stat_set_issue(>bi_issue_stat, bio_sectors(bio));
     ^~
   block/blk-wbt.c: In function 'blkcg_qos_done_bio':
>> block/blk-wbt.c:1105:14: error: 'struct bio' has no member named 'bi_blkg'; did you mean 'bi_flags'?
     blkg = bio->bi_blkg;
     ^~~
     bi_flags
   block/blk-wbt.c:1112:26: error: 'struct bio' has no member named 'bi_issue_stat'
     qos_record_time(qg, >bi_issue_stat, now);
     ^~
   block/blk-wbt.c: In function 'qos_set_min_lat_nsec':
   block/blk-wbt.c:1167:13: error: 'struct blkcg_gq' has no member named 'parent'
     while (blkg->parent) {
     ^~
   block/blk-wbt.c:1168:44: error: 'struct blkcg_gq' has no member named 'parent'
     struct qos_grp *this_qg = blkg_to_qg(blkg->parent);
     ^~
   block/blk-wbt.c:1170:14: error: 'struct blkcg_gq' has no member named 'parent'
     blkg = blkg->parent;
     ^~
   block/blk-wbt.c: In function 'qos_set_limit':
   block/blk-wbt.c:1182:24: error: implicit declaration of function 'css_to_blkcg'; did you mean 'qg_to_blkg'? [-Werror=implicit-function-declaration]
     struct blkcg *blkcg = css_to_blkcg(of_css(of));
     ^~~~
     qg_to_blkg
   block/blk-wbt.c:1182:24: warning: initialization makes pointer from integer without a cast [-Wint-conversion]
   block/blk-wbt.c:1186:23: error: storage size of 'ctx' isn't known
     struct blkg_conf_ctx ctx;
     ^~~
   block/blk-wbt.c:1193:8: error: implicit declaration of function 'blkg_conf_prep'; did you mean 'blkg_to_pd'? [-Werror=implicit-function-declaration]
     ret = blkg_conf_prep(blkcg, _policy_qos, buf, );
     ^~
     blkg_to_pd
   block/blk-wbt.c:1217:2: error: implicit declaration of function 'blkg_for_each_descendant_pre'; did you mean 'css_for_each_descendant_pre'? [-Werror=implicit-function-declaration]
     blkg_for_each_descendant_pre(blkg, pos_css, ctx.blkg)
     ^~~~
     css_for_each_descendant_pre
>> block/blk-wbt.c:1218:3: error: expected ';' before 'qos_set_min_lat_nsec'
     qos_set_min_lat_nsec(blkg, 1);
     ^~~~
>> block/blk-wbt.c:1221:2: error: implicit declaration of function 'blkg_conf_finish'; did you mean 'blkcg_qos_init'? [-Werror=implicit-function-declaration]
     blkg_conf_finish();
     ^~~~
     blkcg_qos_init
   block/blk-wbt.c:1186:23: warning: unused variable 'ctx' [-Wunused-variable]
     struct blkg_conf_ctx ctx;
     ^~~
   block/blk-wbt.c: In function
Crash when unraring large archives on btrfs-filesystem
Hello,

I have encountered what I think is a problem with btrfs, which causes my file server to become unresponsive. But let's start with the basic information:

uname -a = Linux mars 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04) x86_64 GNU/Linux

btrfs --version = btrfs-progs v4.7.3

Label: none uuid: 1609e4e1-4037-4d31-bf12-f84a691db5d8
	Total devices 5 FS bytes used 7.15TiB
	devid 1 size 3.64TiB used 2.90TiB path /dev/sda
	devid 2 size 3.64TiB used 2.90TiB path /dev/sdb
	devid 3 size 3.64TiB used 2.90TiB path /dev/sdc
	devid 4 size 3.64TiB used 2.90TiB path /dev/sdd
	devid 5 size 3.64TiB used 2.90TiB path /dev/sde

Data, RAID1: total=7.25TiB, used=7.14TiB
System, RAID1: total=40.00MiB, used=1.02MiB
Metadata, RAID1: total=9.00GiB, used=7.75GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

The following entry in kern.log seems to be the point where it all started, and it is what makes me believe that the problem is related to btrfs. At that time the server was unraring a large archive stored on the btrfs filesystem.

Feb 5 21:22:42 mars kernel: [249979.829318] BTRFS info (device sda): The free space cache file (4701944807424) is invalid. skip it
Feb 5 21:22:42 mars kernel: [249979.829318]
Feb 5 21:25:12 mars kernel: [250090.149452] unrar: page allocation stalls for 12104ms, order:0, mode:0x24200ca(GFP_HIGHUSER_MOVABLE)
Feb 5 21:25:12 mars kernel: [250116.605420] [] ? alloc_pages_vma+0xae/0x260
Feb 5 21:25:12 mars kernel: [250116.605422] [] ? __read_swap_cache_async+0x118/0x1c0
Feb 5 21:25:12 mars kernel: [250116.605423] [] ? read_swap_cache_async+0x24/0x60
Feb 5 21:25:12 mars kernel: [250116.605425] [] ? swapin_readahead+0x1a9/0x210
Feb 5 21:25:12 mars kernel: [250116.605427] [] ? radix_tree_lookup_slot+0x1e/0x50
Feb 5 21:25:12 mars kernel: [250116.605429] [] ? find_get_entry+0x1b/0x100
Feb 5 21:25:12 mars kernel: [250116.605431] [] ? pagecache_get_page+0x30/0x2b0
Feb 5 21:25:12 mars kernel: [250116.605434] [] ? do_swap_page+0x2a3/0x750
Feb 5 21:25:12 mars kernel: [250116.605436] [] ? handle_mm_fault+0x892/0x12d0
Feb 5 21:25:12 mars kernel: [250116.605438] [] ? __do_page_fault+0x25c/0x500
Feb 5 21:25:12 mars kernel: [250116.605440] [] ? page_fault+0x28/0x30
Feb 5 21:25:12 mars kernel: [250116.605442] [] ? __get_user_8+0x1b/0x25
Feb 5 21:25:12 mars kernel: [250116.605445] [] ? exit_robust_list+0x30/0x110
Feb 5 21:25:12 mars kernel: [250116.605447] [] ? mm_release+0xf8/0x130
Feb 5 21:25:12 mars kernel: [250116.605449] [] ? do_exit+0x150/0xae0
Feb 5 21:25:12 mars kernel: [250116.605450] [] ? do_group_exit+0x3a/0xa0
Feb 5 21:25:12 mars kernel: [250116.605452] [] ? get_signal+0x297/0x640
Feb 5 21:25:12 mars kernel: [250116.605454] [] ? do_signal+0x36/0x6a0
Feb 5 21:25:12 mars kernel: [250116.605457] [] ? exit_to_usermode_loop+0x71/0xb0
Feb 5 21:25:12 mars kernel: [250116.605459] [] ? syscall_return_slowpath+0x54/0x60
Feb 5 21:25:12 mars kernel: [250116.605461] [] ? system_call_fast_compare_end+0xb5/0xb7
Feb 5 21:25:12 mars kernel: [250116.605462] Mem-Info:
Feb 5 21:25:12 mars kernel: [250116.605466] active_anon:44 inactive_anon:69 isolated_anon:0
Feb 5 21:25:12 mars kernel: [250116.605466] active_file:3557188 inactive_file:407932 isolated_file:1024
Feb 5 21:25:12 mars kernel: [250116.605466] unevictable:0 dirty:409214 writeback:62 unstable:0
Feb 5 21:25:12 mars kernel: [250116.605466] slab_reclaimable:37022 slab_unreclaimable:10475
Feb 5 21:25:12 mars kernel: [250116.605466] mapped:2329 shmem:21 pagetables:3522 bounce:0
Feb 5 21:25:12 mars kernel: [250116.605466] free:34036 free_pcp:291 free_cma:0
Feb 5 21:25:12 mars kernel: [250116.605471] Node 0 active_anon:176kB inactive_anon:276kB active_file:14228752kB inactive_file:1631728kB unevictable:0kB isolated(anon):0kB isolated(file):4096kB mapped:9316kB dirty:1636856kB writeback:248kB shmem:84kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB pages_scanned:13631918 all_unreclaimable? no

Searching for "btrfs" in kern.log shows a lot of entries in kern.log and kern.log.1, but none before that point in time. I think that there is a relation between upgrading to kernel 4.9.0-5 and the start of these problems. What follows is the output of "zless kern.log | grep btrfs".

Feb 5 21:25:21 mars kernel: [250128.490899] Workqueue: writeback wb_workfn (flush-btrfs-1)
Feb 5 21:25:21 mars kernel: [250128.490940] [] ? io_ctl_prepare_pages+0x4c/0x180 [btrfs]
Feb 5 21:25:21 mars kernel: [250128.490953] [] ? __load_free_space_cache+0x1eb/0x6d0 [btrfs]
Feb 5 21:25:21 mars kernel: [250128.490966] [] ? load_free_space_cache+0xe9/0x190 [btrfs]
Feb 5 21:25:21 mars kernel: [250128.490975] [] ? cache_block_group+0x1c2/0x3c0 [btrfs]
Feb 5 21:25:21 mars kernel: [250128.490989] [] ? find_free_extent+0x66d/0x10d0 [btrfs]
Feb 5 21:25:21 mars kernel: [250128.490999] [] ? btrfs_reserve_extent+0xa1/0x210 [btrfs]
Feb 5 21:25:21 mars kernel: [250128.491011] [] ?
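For anyone cross-checking the Mem-Info block in this report: the first group of counters is in pages, and the "Node 0" line repeats some of them in kB. Below is a minimal C sketch of that conversion. It assumes 4 KiB pages (the usual x86_64 default; the log itself does not state the page size), and the helper names pages_to_kib and percent_of are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, the common default on x86_64. */
#define PAGE_SIZE_KIB 4u

/* Convert a page count from the Mem-Info block into KiB. */
static uint64_t pages_to_kib(uint64_t pages)
{
	return pages * PAGE_SIZE_KIB;
}

/* Share of `part` in `total`, in whole percent, rounded down. */
static unsigned int percent_of(uint64_t part, uint64_t total)
{
	assert(total != 0);
	return (unsigned int)(part * 100 / total);
}
```

With the figures from the log, pages_to_kib(3557188) gives 14228752, which matches active_file:14228752kB, and pages_to_kib(409214) gives 1636856, which matches dirty:1636856kB. By the same arithmetic free:34036 pages is 136144 kB, roughly 133 MiB, and percent_of(409214, 3557188 + 407932) puts dirty pages at about 10% of the file cache.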
Re: [PATCH v4 2/3] btrfs-progs: introduce TEST_TOP for resources except binaries
On Tue, Feb 06, 2018 at 01:37:24PM +0800, Gu Jinxiang wrote:
> Use TEST_TOP for tests/common, Documentation, images, and internal
> binaries.

Well, the point of TEST_TOP was also to remove the /tests/ subdirectory from the paths if it's inside git and to set it to the top directory where the exported testsuite resides.

I'm not sure if we should continue this back-and-forth. The project idea was stated tersely, so the implementation was left "as an exercise". The v4 is close to what I'd like to merge, so let's give it a v5, and if there are only small things to fix I'll update the patches at commit time.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 03/14] btrfs: Remove fs_info argument from btrfs_trans_release_metadata
All current callers of this function just get a reference to the trans->fs_info member and pass it as the second argument. Collapse this into the function itself. No functional changes.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/transaction.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 5ca4302c136c..3c3ed6e3d484 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -818,9 +818,11 @@ int btrfs_should_end_transaction(struct btrfs_trans_handle *trans)
 	return should_end_transaction(trans);
 }

-static void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
-					 struct btrfs_fs_info *fs_info)
+static void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans)
+
 {
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+
 	if (!trans->block_rsv) {
 		ASSERT(!trans->bytes_reserved);
 		return;
@@ -854,7 +856,7 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 		return 0;
 	}

-	btrfs_trans_release_metadata(trans, info);
+	btrfs_trans_release_metadata(trans);
 	trans->block_rsv = NULL;

 	if (!list_empty(&trans->new_bgs))
@@ -875,7 +877,7 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 		must_run_delayed_refs = 2;
 	}

-	btrfs_trans_release_metadata(trans, info);
+	btrfs_trans_release_metadata(trans);
 	trans->block_rsv = NULL;

 	if (!list_empty(&trans->new_bgs))
@@ -1968,7 +1970,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 		return ret;
 	}

-	btrfs_trans_release_metadata(trans, fs_info);
+	btrfs_trans_release_metadata(trans);
 	trans->block_rsv = NULL;

 	cur_trans = trans->transaction;
@@ -2322,7 +2324,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 scrub_continue:
 	btrfs_scrub_continue(fs_info);
 cleanup_transaction:
-	btrfs_trans_release_metadata(trans, fs_info);
+	btrfs_trans_release_metadata(trans);
 	btrfs_trans_release_chunk_metadata(trans);
 	trans->block_rsv = NULL;
 	btrfs_warn(fs_info, "Skipping commit of aborted transaction.");
--
2.7.4
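The pattern this series applies can be sketched outside the kernel. The following is a minimal, self-contained illustration and not the btrfs code: struct fs_info, struct trans_handle and both release_metadata_* functions are hypothetical stand-ins, reduced to one counter each, and assert() stands in for the kernel's ASSERT().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct fs_info {
	unsigned long long reserved;	/* bytes currently held in the reservation */
};

struct trans_handle {
	struct fs_info *fs_info;	/* back-pointer, like trans->fs_info */
	unsigned long long bytes_reserved;
};

/* Old shape: every caller passes a pointer it just read from trans. */
static void release_metadata_old(struct trans_handle *trans,
				 struct fs_info *fs_info)
{
	/* The second argument carries no information beyond trans->fs_info. */
	assert(fs_info == trans->fs_info);
	fs_info->reserved -= trans->bytes_reserved;
	trans->bytes_reserved = 0;
}

/* New shape: the callee derives fs_info from the handle itself. */
static void release_metadata_new(struct trans_handle *trans)
{
	struct fs_info *fs_info = trans->fs_info;

	fs_info->reserved -= trans->bytes_reserved;
	trans->bytes_reserved = 0;
}
```

Both variants leave the same state behind for the same handle. The second one simply cannot be called with an fs_info that disagrees with the handle, which is why dropping the argument plausibly is a no-op as long as every caller really passed trans->fs_info.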
[PATCH 09/14] btrfs: Don't pass fs_info to commit_cowonly_roots
We already pass a transaction handle which references the fs_info so we can grab it from there. No functional changes.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/transaction.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index d58c4cf461f3..354143e6d440 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1160,9 +1160,9 @@ static int update_cowonly_root(struct btrfs_trans_handle *trans,
  * failures will cause the file system to go offline. We still need
  * to clean up the delayed refs.
  */
-static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans,
-					 struct btrfs_fs_info *fs_info)
+static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans)
 {
+	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct list_head *dirty_bgs = &trans->transaction->dirty_bgs;
 	struct list_head *io_bgs = &trans->transaction->io_bgs;
 	struct list_head *next;
@@ -1402,7 +1402,7 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
 	 * like chunk and root tree, as they won't affect qgroup.
 	 * And we don't write super to avoid half committed status.
 	 */
-	ret = commit_cowonly_roots(trans, fs_info);
+	ret = commit_cowonly_roots(trans);
 	if (ret)
 		goto out;
 	switch_commit_roots(trans->transaction, fs_info);
@@ -2202,7 +2202,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 		goto scrub_continue;
 	}

-	ret = commit_cowonly_roots(trans, fs_info);
+	ret = commit_cowonly_roots(trans);
 	if (ret) {
 		mutex_unlock(&fs_info->tree_log_mutex);
 		mutex_unlock(&fs_info->reloc_mutex);
--
2.7.4
[PATCH 12/14] btrfs: Remove fs_info argument from create_pending_snapshots/create_pending_snapshot
We already pass the trans handle which has a reference to fs_info to create_pending_snapshot so we can refer to it directly. Doing this obviates the need to pass the fs_info to create_pending_snapshots as well. No functional changes.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/transaction.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 81143ac1d88d..abee26b269a1 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1435,9 +1435,10 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
  * the creation of the pending snapshots, just return 0.
  */
 static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
-				   struct btrfs_fs_info *fs_info,
 				   struct btrfs_pending_snapshot *pending)
 {
+
+	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_key key;
 	struct btrfs_root_item *new_root_item;
 	struct btrfs_root *tree_root = fs_info->tree_root;
@@ -1704,8 +1705,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 /*
  * create all the snapshots we've scheduled for creation
  */
-static noinline int create_pending_snapshots(struct btrfs_trans_handle *trans,
-					     struct btrfs_fs_info *fs_info)
+static noinline int create_pending_snapshots(struct btrfs_trans_handle *trans)
 {
 	struct btrfs_pending_snapshot *pending, *next;
 	struct list_head *head = &trans->transaction->pending_snapshots;
@@ -1713,7 +1713,7 @@ static noinline int create_pending_snapshots(struct btrfs_trans_handle *trans,

 	list_for_each_entry_safe(pending, next, head, list) {
 		list_del(&pending->list);
-		ret = create_pending_snapshot(trans, fs_info, pending);
+		ret = create_pending_snapshot(trans, pending);
 		if (ret)
 			break;
 	}
@@ -2110,7 +2110,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	 * deal with them in create_pending_snapshot(), which is the
 	 * core function of the snapshot creation.
 	 */
-	ret = create_pending_snapshots(trans, fs_info);
+	ret = create_pending_snapshots(trans);
 	if (ret) {
 		mutex_unlock(&fs_info->reloc_mutex);
 		goto scrub_continue;
--
2.7.4
[PATCH 06/14] btrfs: Don't pass fs_info to __btrfs_run_delayed_items
We already pass the transaction handle, which contains a reference to the fs_info, so grab it from there. No functional changes.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/delayed-inode.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 1c0bab4080a0..1305872bbff8 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1114,9 +1114,9 @@ __btrfs_commit_inode_delayed_items(struct btrfs_trans_handle *trans,
  * Returns < 0 on error and returns with an aborted transaction with any
  * outstanding delayed items cleaned up.
  */
-static int __btrfs_run_delayed_items(struct btrfs_trans_handle *trans,
-				     struct btrfs_fs_info *fs_info, int nr)
+static int __btrfs_run_delayed_items(struct btrfs_trans_handle *trans, int nr)
 {
+	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_delayed_root *delayed_root;
 	struct btrfs_delayed_node *curr_node, *prev_node;
 	struct btrfs_path *path;
@@ -1164,13 +1164,13 @@ static int __btrfs_run_delayed_items(struct btrfs_trans_handle *trans,
 int btrfs_run_delayed_items(struct btrfs_trans_handle *trans,
 			    struct btrfs_fs_info *fs_info)
 {
-	return __btrfs_run_delayed_items(trans, fs_info, -1);
+	return __btrfs_run_delayed_items(trans, -1);
 }

 int btrfs_run_delayed_items_nr(struct btrfs_trans_handle *trans,
 			       struct btrfs_fs_info *fs_info, int nr)
 {
-	return __btrfs_run_delayed_items(trans, fs_info, nr);
+	return __btrfs_run_delayed_items(trans, nr);
 }

 int btrfs_commit_inode_delayed_items(struct btrfs_trans_handle *trans,
--
2.7.4
[PATCH 04/14] btrfs: Remove fs_info argument from btrfs_create_pending_block_groups
It can be referenced from the passed transaction, so there is no point in passing it as a function argument. No functional changes.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/ctree.h       |  3 +--
 fs/btrfs/extent-tree.c | 10 +++++-----
 fs/btrfs/transaction.c |  6 +++---
 3 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1cc77c4bf3c3..9963b6caadeb 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2712,8 +2712,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info);
 void btrfs_get_block_group_trimming(struct btrfs_block_group_cache *cache);
 void btrfs_put_block_group_trimming(struct btrfs_block_group_cache *cache);
-void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans,
-				       struct btrfs_fs_info *fs_info);
+void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans);
 u64 btrfs_data_alloc_profile(struct btrfs_fs_info *fs_info);
 u64 btrfs_metadata_alloc_profile(struct btrfs_fs_info *fs_info);
 u64 btrfs_system_alloc_profile(struct btrfs_fs_info *fs_info);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index b079ebc1f842..99bfc628ab89 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3086,7 +3086,7 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,

 	if (run_all) {
 		if (!list_empty(&trans->new_bgs))
-			btrfs_create_pending_block_groups(trans, fs_info);
+			btrfs_create_pending_block_groups(trans);

 		spin_lock(&delayed_refs->lock);
 		node = rb_first(&delayed_refs->href_root);
@@ -3686,7 +3686,7 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
 	 * make sure all the block groups on our dirty list actually
 	 * exist
 	 */
-	btrfs_create_pending_block_groups(trans, fs_info);
+	btrfs_create_pending_block_groups(trans);

 	if (!path) {
 		path = btrfs_alloc_path();
@@ -4706,7 +4706,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 	 */
 	if (trans->can_flush_pending_bgs &&
 	    trans->chunk_bytes_reserved >= (u64)SZ_2M) {
-		btrfs_create_pending_block_groups(trans, fs_info);
+		btrfs_create_pending_block_groups(trans);
 		btrfs_trans_release_chunk_metadata(trans);
 	}
 	return ret;
@@ -10130,9 +10130,9 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
 	return ret;
 }

-void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans,
-				       struct btrfs_fs_info *fs_info)
+void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
 {
+	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_block_group_cache *block_group, *tmp;
 	struct btrfs_root *extent_root = fs_info->extent_root;
 	struct btrfs_block_group_item item;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 3c3ed6e3d484..82b7e5855119 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -860,7 +860,7 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 	trans->block_rsv = NULL;

 	if (!list_empty(&trans->new_bgs))
-		btrfs_create_pending_block_groups(trans, info);
+		btrfs_create_pending_block_groups(trans);

 	trans->delayed_ref_updates = 0;
 	if (!trans->sync) {
@@ -881,7 +881,7 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 	trans->block_rsv = NULL;

 	if (!list_empty(&trans->new_bgs))
-		btrfs_create_pending_block_groups(trans, info);
+		btrfs_create_pending_block_groups(trans);

 	btrfs_trans_release_chunk_metadata(trans);

@@ -1983,7 +1983,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	smp_wmb();

 	if (!list_empty(&trans->new_bgs))
-		btrfs_create_pending_block_groups(trans, fs_info);
+		btrfs_create_pending_block_groups(trans);

 	ret = btrfs_run_delayed_refs(trans, fs_info, 0);
 	if (ret) {
--
2.7.4
[PATCH 10/14] btrfs: Remove root argument of cleanup_transaction
The passed root is used for only two things: 1) to get a reference to the fs_info and 2) to call trace_btrfs_transaction_commit. We can achieve 1) by simply referring to the fs_info from the passed trans object. As far as 2) is concerned, cleanup_transaction is called from only one place and the 'root' argument passed is the one from the trans handle. No functional changes.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/transaction.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 354143e6d440..b8fd9fe8a9c1 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1866,10 +1866,9 @@ int btrfs_commit_transaction_async(struct btrfs_trans_handle *trans,
 }


-static void cleanup_transaction(struct btrfs_trans_handle *trans,
-				struct btrfs_root *root, int err)
+static void cleanup_transaction(struct btrfs_trans_handle *trans, int err)
 {
-	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_transaction *cur_trans = trans->transaction;
 	DEFINE_WAIT(wait);

@@ -1909,7 +1908,7 @@ static void cleanup_transaction(struct btrfs_trans_handle *trans,
 	btrfs_put_transaction(cur_trans);
 	btrfs_put_transaction(cur_trans);

-	trace_btrfs_transaction_commit(root);
+	trace_btrfs_transaction_commit(trans->root);

 	if (current->journal_info == trans)
 		current->journal_info = NULL;
@@ -2330,7 +2329,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	btrfs_warn(fs_info, "Skipping commit of aborted transaction.");
 	if (current->journal_info == trans)
 		current->journal_info = NULL;
-	cleanup_transaction(trans, trans->root, ret);
+	cleanup_transaction(trans, ret);

 	return ret;
 }
--
2.7.4
[PATCH 08/14] btrfs: Don't pass fs_info to commit_fs_roots
We already pass the transaction handle which has a reference to the fs_info. No functional changes.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/transaction.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index f24f05fb508e..d58c4cf461f3 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1256,9 +1256,9 @@ void btrfs_add_dead_root(struct btrfs_root *root)
 /*
  * update all the cowonly tree roots on disk
  */
-static noinline int commit_fs_roots(struct btrfs_trans_handle *trans,
-				    struct btrfs_fs_info *fs_info)
+static noinline int commit_fs_roots(struct btrfs_trans_handle *trans)
 {
+	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_root *gang[8];
 	int i;
 	int ret;
@@ -1376,7 +1376,7 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
 	 */
 	mutex_lock(&fs_info->tree_log_mutex);

-	ret = commit_fs_roots(trans, fs_info);
+	ret = commit_fs_roots(trans);
 	if (ret)
 		goto out;
 	ret = btrfs_qgroup_account_extents(trans, fs_info);
@@ -2162,7 +2162,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	 */
 	mutex_lock(&fs_info->tree_log_mutex);

-	ret = commit_fs_roots(trans, fs_info);
+	ret = commit_fs_roots(trans);
 	if (ret) {
 		mutex_unlock(&fs_info->tree_log_mutex);
 		mutex_unlock(&fs_info->reloc_mutex);
--
2.7.4
[PATCH 01/14] btrfs: Make btrfs_trans_release_metadata private to transaction.c
This function is only ever used in __btrfs_end_transaction and btrfs_commit_transaction so there is no need to export it via header. Let's move it closer to where it's used, make it static and remove it from the header. No functional changes.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/ctree.h       |  2 --
 fs/btrfs/extent-tree.c | 18 ------------------
 fs/btrfs/transaction.c | 19 +++++++++++++++++++
 3 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index aee4365e82ba..1cc77c4bf3c3 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2748,8 +2748,6 @@ void btrfs_delalloc_release_space(struct inode *inode,
 			struct extent_changeset *reserved, u64 start, u64 len);
 void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
 					    u64 len);
-void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
-				  struct btrfs_fs_info *fs_info);
 void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans);
 int btrfs_orphan_reserve_metadata(struct btrfs_trans_handle *trans,
 				  struct btrfs_inode *inode);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index cc08e6af3542..b079ebc1f842 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5893,24 +5893,6 @@ static void release_global_block_rsv(struct btrfs_fs_info *fs_info)
 	WARN_ON(fs_info->delayed_block_rsv.reserved > 0);
 }

-void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
-				  struct btrfs_fs_info *fs_info)
-{
-	if (!trans->block_rsv) {
-		ASSERT(!trans->bytes_reserved);
-		return;
-	}
-
-	if (!trans->bytes_reserved)
-		return;
-
-	ASSERT(trans->block_rsv == &fs_info->trans_block_rsv);
-	trace_btrfs_space_reservation(fs_info, "transaction",
-				      trans->transid, trans->bytes_reserved, 0);
-	btrfs_block_rsv_release(fs_info, trans->block_rsv,
-				trans->bytes_reserved);
-	trans->bytes_reserved = 0;
-}

 /*
  * To be called after all the new block groups attached to the transaction
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 2141587195d4..beca25635787 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -818,6 +818,25 @@ int btrfs_should_end_transaction(struct btrfs_trans_handle *trans)
 	return should_end_transaction(trans);
 }

+static void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
+					 struct btrfs_fs_info *fs_info)
+{
+	if (!trans->block_rsv) {
+		ASSERT(!trans->bytes_reserved);
+		return;
+	}
+
+	if (!trans->bytes_reserved)
+		return;
+
+	ASSERT(trans->block_rsv == &fs_info->trans_block_rsv);
+	trace_btrfs_space_reservation(fs_info, "transaction",
+				      trans->transid, trans->bytes_reserved, 0);
+	btrfs_block_rsv_release(fs_info, trans->block_rsv,
+				trans->bytes_reserved);
+	trans->bytes_reserved = 0;
+}
+
 static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 				   int throttle)
 {
--
2.7.4
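As an illustration of what "make it static" buys, here is a minimal sketch with hypothetical names (struct trans, release_metadata, end_transaction); it is not the kernel code. A static function has internal linkage, so only code in the same .c file can call it and no prototype has to live in a shared header:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced handle; only the field the helper touches. */
struct trans {
	unsigned long bytes_reserved;
};

/*
 * Internal linkage: the symbol is invisible to other translation units,
 * so it needs no declaration in a header and cannot gain new external
 * callers by accident.
 */
static void release_metadata(struct trans *t)
{
	assert(t != NULL);
	t->bytes_reserved = 0;
}

/* The entry point that would stay declared in the header. */
int end_transaction(struct trans *t)
{
	release_metadata(t);
	return 0;
}
```

Calling end_transaction() on a handle with a nonzero bytes_reserved returns 0 and leaves the field at 0. Another file that tried to call release_metadata() directly would fail to link, which is the compile-time guarantee that the helper stays private.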
[PATCH 07/14] btrfs: Don't pass fs_info to btrfs_run_delayed_items/_nr
We already pass the transaction which has a reference to the fs_info, so use that. No functional changes.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/delayed-inode.c |  6 ++----
 fs/btrfs/delayed-inode.h |  6 ++----
 fs/btrfs/extent-tree.c   |  2 +-
 fs/btrfs/transaction.c   |  8 ++++----
 fs/btrfs/tree-log.c      | 12 ++++--------
 5 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 1305872bbff8..86cc0f5b0435 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1161,14 +1161,12 @@ static int __btrfs_run_delayed_items(struct btrfs_trans_handle *trans, int nr)
 	return ret;
 }

-int btrfs_run_delayed_items(struct btrfs_trans_handle *trans,
-			    struct btrfs_fs_info *fs_info)
+int btrfs_run_delayed_items(struct btrfs_trans_handle *trans)
 {
 	return __btrfs_run_delayed_items(trans, -1);
 }

-int btrfs_run_delayed_items_nr(struct btrfs_trans_handle *trans,
-			       struct btrfs_fs_info *fs_info, int nr)
+int btrfs_run_delayed_items_nr(struct btrfs_trans_handle *trans, int nr)
 {
 	return __btrfs_run_delayed_items(trans, nr);
 }
diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h
index c4189d495934..ae893d85224f 100644
--- a/fs/btrfs/delayed-inode.h
+++ b/fs/btrfs/delayed-inode.h
@@ -111,10 +111,8 @@ int btrfs_delete_delayed_dir_index(struct btrfs_trans_handle *trans,

 int btrfs_inode_delayed_dir_index_count(struct btrfs_inode *inode);

-int btrfs_run_delayed_items(struct btrfs_trans_handle *trans,
-			    struct btrfs_fs_info *fs_info);
-int btrfs_run_delayed_items_nr(struct btrfs_trans_handle *trans,
-			       struct btrfs_fs_info *fs_info, int nr);
+int btrfs_run_delayed_items(struct btrfs_trans_handle *trans);
+int btrfs_run_delayed_items_nr(struct btrfs_trans_handle *trans, int nr);

 void btrfs_balance_delayed_items(struct btrfs_fs_info *fs_info);

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 47c27fc403b9..52cb4eb12318 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4994,7 +4994,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
 			ret = PTR_ERR(trans);
 			break;
 		}
-		ret = btrfs_run_delayed_items_nr(trans, fs_info, nr);
+		ret = btrfs_run_delayed_items_nr(trans, nr);
 		btrfs_end_transaction(trans);
 		break;
 	case FLUSH_DELALLOC:
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index aef311531ab2..f24f05fb508e 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1529,7 +1529,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 	 * otherwise we corrupt the FS during
 	 * snapshot
 	 */
-	ret = btrfs_run_delayed_items(trans, fs_info);
+	ret = btrfs_run_delayed_items(trans);
 	if (ret) {	/* Transaction aborted */
 		btrfs_abort_transaction(trans, ret);
 		goto fail;
@@ -2066,7 +2066,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	if (ret)
 		goto cleanup_transaction;

-	ret = btrfs_run_delayed_items(trans, fs_info);
+	ret = btrfs_run_delayed_items(trans);
 	if (ret)
 		goto cleanup_transaction;

@@ -2074,7 +2074,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 		   extwriter_counter_read(cur_trans) == 0);

 	/* some pending stuffs might be added after the previous flush. */
-	ret = btrfs_run_delayed_items(trans, fs_info);
+	ret = btrfs_run_delayed_items(trans);
 	if (ret)
 		goto cleanup_transaction;

@@ -2127,7 +2127,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	 * because all the tree which are snapshoted will be forced to COW
 	 * the nodes and leaves.
 	 */
-	ret = btrfs_run_delayed_items(trans, fs_info);
+	ret = btrfs_run_delayed_items(trans);
 	if (ret) {
 		mutex_unlock(&fs_info->reloc_mutex);
 		goto scrub_continue;
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index df8e76d01dbe..2fbe49a04933 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -852,7 +852,6 @@ static noinline int drop_one_dir_item(struct btrfs_trans_handle *trans,
 				      struct btrfs_inode *dir,
 				      struct btrfs_dir_item *di)
 {
-	struct btrfs_fs_info *fs_info = root->fs_info;
 	struct inode *inode;
 	char *name;
 	int name_len;
@@ -886,7 +885,7 @@ static noinline int drop_one_dir_item(struct btrfs_trans_handle *trans,
 	if (ret)
 		goto out;
 	else
-		ret = btrfs_run_delayed_items(trans, fs_info);
+		ret =
[PATCH 05/14] btrfs: Don't pass fs_info arg to btrfs_start_dirty_block_groups
It can be referenced from the passed transaction, so there is no point in passing it as a function argument. No functional changes.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/ctree.h       | 3 +--
 fs/btrfs/extent-tree.c | 4 ++--
 fs/btrfs/transaction.c | 2 +-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 9963b6caadeb..f929685b80e2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2690,8 +2690,7 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 			 u64 bytenr, u64 num_bytes, u64 parent,
 			 u64 root_objectid, u64 owner, u64 offset);

-int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
-				   struct btrfs_fs_info *fs_info);
+int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans);
 int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans,
 				   struct btrfs_fs_info *fs_info);
 int btrfs_setup_space_cache(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 99bfc628ab89..47c27fc403b9 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3660,9 +3660,9 @@ int btrfs_setup_space_cache(struct btrfs_trans_handle *trans,
  * the commit latency by getting rid of the easy block groups while
  * we're still allowing others to join the commit.
  */
-int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
-				   struct btrfs_fs_info *fs_info)
+int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans)
 {
+	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_block_group_cache *cache;
 	struct btrfs_transaction *cur_trans = trans->transaction;
 	int ret = 0;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 82b7e5855119..aef311531ab2 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2014,7 +2014,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 			mutex_unlock(&fs_info->ro_block_group_mutex);

 			if (run_it)
-				ret = btrfs_start_dirty_block_groups(trans, fs_info);
+				ret = btrfs_start_dirty_block_groups(trans);
 		}
 		if (ret) {
 			btrfs_end_transaction(trans);
--
2.7.4
[PATCH 11/14] btrfs: Remove fs_info argument from switch_commit_roots
We already have the fs_info from the passed transaction, so use it
directly. No functional changes.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/transaction.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index b8fd9fe8a9c1..81143ac1d88d 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -126,9 +126,9 @@ static void clear_btree_io_tree(struct extent_io_tree *tree)
 	spin_unlock(&tree->lock);
 }
 
-static noinline void switch_commit_roots(struct btrfs_transaction *trans,
-					 struct btrfs_fs_info *fs_info)
+static noinline void switch_commit_roots(struct btrfs_transaction *trans)
 {
+	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_root *root, *tmp;
 
 	down_write(&fs_info->commit_root_sem);
@@ -1405,7 +1405,7 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
 	ret = commit_cowonly_roots(trans);
 	if (ret)
 		goto out;
-	switch_commit_roots(trans->transaction, fs_info);
+	switch_commit_roots(trans->transaction);
 	ret = btrfs_write_and_wait_transaction(trans, fs_info);
 	if (ret)
 		btrfs_handle_fs_error(fs_info, ret,
@@ -2233,7 +2233,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	list_add_tail(&fs_info->chunk_root->dirty_list,
 		      &cur_trans->switch_commits);
 
-	switch_commit_roots(cur_trans, fs_info);
+	switch_commit_roots(cur_trans);
 
 	ASSERT(list_empty(&cur_trans->dirty_bgs));
 	ASSERT(list_empty(&cur_trans->io_bgs));
-- 
2.7.4
[PATCH 14/14] btrfs: Remove fs_info argument of btrfs_write_and_wait_transaction
We already pass btrfs_trans_handle, which contains a reference to the
fs_info, so use that. No functional changes.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/transaction.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index b8dbe4e88631..a57065f022ff 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1091,10 +1091,10 @@ int btrfs_wait_tree_log_extents(struct btrfs_root *log_root, int mark)
  *
  * @trans: transaction whose dirty pages we'd like to write
  */
-static int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans,
-					    struct btrfs_fs_info *fs_info)
+static int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans)
 {
 	struct extent_io_tree *dirty_pages = &trans->transaction->dirty_pages;
+	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct blk_plug plug;
 	int ret, ret2;
 
@@ -1406,7 +1406,7 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
 	if (ret)
 		goto out;
 	switch_commit_roots(trans->transaction);
-	ret = btrfs_write_and_wait_transaction(trans, fs_info);
+	ret = btrfs_write_and_wait_transaction(trans);
 	if (ret)
 		btrfs_handle_fs_error(fs_info, ret,
 			"Error while writing out transaction for qgroup");
@@ -2260,7 +2260,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 
 	wake_up(&fs_info->transaction_wait);
 
-	ret = btrfs_write_and_wait_transaction(trans, fs_info);
+	ret = btrfs_write_and_wait_transaction(trans);
 	if (ret) {
 		btrfs_handle_fs_error(fs_info, ret,
 				      "Error while writing out transaction");
-- 
2.7.4
[PATCH 13/14] btrfs: Remove fs_info argument from btrfs_update_commit_device_bytes_used
We already pass the btrfs_transaction, which references fs_info, so there
is no need to pass the latter as an argument. Also use the opportunity to
shorten the parameter name from transaction to trans. No functional
changes.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/transaction.c |  2 +-
 fs/btrfs/volumes.c     | 10 +++++-----
 fs/btrfs/volumes.h     |  3 +--
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index abee26b269a1..b8dbe4e88631 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2245,7 +2245,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 	       sizeof(*fs_info->super_copy));
 
 	btrfs_update_commit_device_size(fs_info);
-	btrfs_update_commit_device_bytes_used(fs_info, cur_trans);
+	btrfs_update_commit_device_bytes_used(cur_trans);
 
 	clear_bit(BTRFS_FS_LOG1_ERR, &fs_info->flags);
 	clear_bit(BTRFS_FS_LOG2_ERR, &fs_info->flags);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 71f9abd44f21..c61fef86538d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5229,7 +5229,7 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 		/*
 		 * There could be two corrupted data stripes, we need
 		 * to loop retry in order to rebuild the correct data.
-		 * 
+		 *
 		 * Fail a stripe at a time on every retry except the
 		 * stripe under reconstruction.
 		 */
@@ -7387,20 +7387,20 @@ void btrfs_update_commit_device_size(struct btrfs_fs_info *fs_info)
 }
 
 /* Must be invoked during the transaction commit */
-void btrfs_update_commit_device_bytes_used(struct btrfs_fs_info *fs_info,
-					struct btrfs_transaction *transaction)
+void btrfs_update_commit_device_bytes_used(struct btrfs_transaction *trans)
 {
+	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct extent_map *em;
 	struct map_lookup *map;
 	struct btrfs_device *dev;
 	int i;
 
-	if (list_empty(&transaction->pending_chunks))
+	if (list_empty(&trans->pending_chunks))
 		return;
 
 	/* In order to kick the device replace finish process */
 	mutex_lock(&fs_info->chunk_mutex);
-	list_for_each_entry(em, &transaction->pending_chunks, list) {
+	list_for_each_entry(em, &trans->pending_chunks, list) {
 		map = em->map_lookup;
 
 		for (i = 0; i < map->num_stripes; i++) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index ca6640445a88..8692b40036d6 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -569,8 +569,7 @@ static inline enum btrfs_raid_types btrfs_bg_flags_to_raid_index(u64 flags)
 }
 
 void btrfs_update_commit_device_size(struct btrfs_fs_info *fs_info);
-void btrfs_update_commit_device_bytes_used(struct btrfs_fs_info *fs_info,
-					struct btrfs_transaction *transaction);
+void btrfs_update_commit_device_bytes_used(struct btrfs_transaction *trans);
 struct list_head *btrfs_get_fs_uuids(void);
 void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
-- 
2.7.4
[PATCH 00/14] Misc transaction cleanups
Here are a bunch of transaction-related cleanups, none of which introduce
functional changes. The first 2 patches could be more interesting: the
first one moves trans_release_metadata to transaction.c and makes it
static, and the second one open codes btrfs_write_and_wait_marked_extents
in its sole caller to make the call chain shorter. The rest of the patches
just kill the extraneous fs_info argument, since the functions also take
either a btrfs_trans_handle or a btrfs_transaction pointer, which already
contains fs_info. The modified functions are all called from
btrfs_commit_transaction.

With this series applied, the only functions that still take both fs_info
and some type of transaction reference are:

  btrfs_finish_extent_commit
  btrfs_qgroup_account_extents
  btrfs_run_delayed_refs

The reason I haven't touched them is that David expressed some reservation
about mass cleaning of functions which are more or less public interface,
and the above 3 are such functions. David, if you don't object to
converting those 3, I will keep them in mind when doing further cleanups
in the transaction area.
Nikolay Borisov (14):
  btrfs: Make btrfs_trans_release_metadata private to transaction.c
  btrfs: Open code btrfs_write_and_wait_marked_extents
  btrfs: Remove fs_info argument from btrfs_trans_release_metadata
  btrfs: Remove fs_info argument from btrfs_create_pending_block_groups
  btrfs: Don't pass fs_info arg to btrfs_start_dirty_block_groups
  btrfs: Don't pass fs_info to __btrfs_run_delayed_items
  btrfs: Don't pass fs_info to btrfs_run_delayed_items/_nr
  btrfs: Don't pass fs_info to commit_fs_roots
  btrfs: Don't pass fs_info to commit_cowonly_roots
  btrfs: Remove root argument of cleanup_transaction
  btrfs: Remove fs_info argument from switch_commit_roots
  btrfs: Remove fs_info argument from create_pending_snapshots/create_pending_snapshot
  btrfs: Remove fs_info argument from btrfs_update_commit_device_bytes_used
  btrfs: Remove fs_info argument of btrfs_write_and_wait_transaction

 fs/btrfs/ctree.h         |   8 +--
 fs/btrfs/delayed-inode.c |  14 +++--
 fs/btrfs/delayed-inode.h |   6 +--
 fs/btrfs/extent-tree.c   |  34 +++-
 fs/btrfs/transaction.c   | 134 ++-
 fs/btrfs/tree-log.c      |  12 ++---
 fs/btrfs/volumes.c       |  10 ++--
 fs/btrfs/volumes.h       |   3 +-
 8 files changed, 101 insertions(+), 120 deletions(-)

-- 
2.7.4
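
For readers skimming the series, the recurring pattern can be sketched in
plain userspace C. This is only an illustration under stated assumptions:
the struct and function names below (fs_info_sketch, trans_handle_sketch,
sectorsize_old, sectorsize_new) are hypothetical stand-ins and not the
kernel's types. The only thing taken from the patches is the idea that the
handle already carries a pointer to fs_info.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct fs_info_sketch {
	unsigned int sectorsize;
};

struct trans_handle_sketch {
	/* Back-pointer, playing the role of trans->fs_info in the patches. */
	struct fs_info_sketch *fs_info;
};

/* Before: the caller passes the handle and the fs_info it already references. */
static unsigned int sectorsize_old(struct trans_handle_sketch *trans,
				   struct fs_info_sketch *fs_info)
{
	(void)trans;
	return fs_info->sectorsize;
}

/* After: fs_info is derived from the handle, so the second argument goes away. */
static unsigned int sectorsize_new(struct trans_handle_sketch *trans)
{
	struct fs_info_sketch *fs_info;

	assert(trans != NULL && trans->fs_info != NULL);
	fs_info = trans->fs_info;
	return fs_info->sectorsize;
}
```

Both variants return the same value for the same handle, which is why the
series can describe itself as having no functional changes.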
[PATCH 02/14] btrfs: Open code btrfs_write_and_wait_marked_extents
btrfs_write_and_wait_transaction is essentially a wrapper of
btrfs_write_and_wait_marked_extents with the addition of calling
clear_btree_io_tree. Having the code split doesn't really bring any
benefit. Open code the latter into the former and add a proper
documentation header.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/transaction.c | 40 ++++++++++++++++------------------------
 1 file changed, 16 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index beca25635787..5ca4302c136c 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1082,41 +1082,33 @@ int btrfs_wait_tree_log_extents(struct btrfs_root *log_root, int mark)
 	return err;
 }
 
-/*
- * when btree blocks are allocated, they have some corresponding bits set for
- * them in one of two extent_io trees.  This is used to make sure all of
- * those extents are on disk for transaction or log commit
+
+/* btrfs_write_and_wait_transaction - When btree blocks are allocated the
+ * corresponding extents are marked dirty. This function ensures such extents
+ * are persisted on disk for * transaction or log commit.
+ *
+ * @trans: transaction whose dirty pages we'd like to write
  */
-static int btrfs_write_and_wait_marked_extents(struct btrfs_fs_info *fs_info,
-		struct extent_io_tree *dirty_pages, int mark)
+static int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans,
+					    struct btrfs_fs_info *fs_info)
 {
-	int ret;
-	int ret2;
+	struct extent_io_tree *dirty_pages = &trans->transaction->dirty_pages;
 	struct blk_plug plug;
+	int ret, ret2;
 
 	blk_start_plug(&plug);
-	ret = btrfs_write_marked_extents(fs_info, dirty_pages, mark);
+	ret = btrfs_write_marked_extents(fs_info, dirty_pages, EXTENT_DIRTY);
 	blk_finish_plug(&plug);
 	ret2 = btrfs_wait_extents(fs_info, dirty_pages);
 
+	clear_btree_io_tree(&trans->transaction->dirty_pages);
+
 	if (ret)
 		return ret;
-	if (ret2)
+	else if (ret2)
 		return ret2;
-	return 0;
-}
-
-static int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans,
-					    struct btrfs_fs_info *fs_info)
-{
-	int ret;
-
-	ret = btrfs_write_and_wait_marked_extents(fs_info,
-					   &trans->transaction->dirty_pages,
-					   EXTENT_DIRTY);
-	clear_btree_io_tree(&trans->transaction->dirty_pages);
-
-	return ret;
+	else
+		return 0;
 }
 
 /*
-- 
2.7.4
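
The control flow of the open-coded function can be modelled outside the
kernel to show which error is reported when both steps fail. The sketch
below is an assumption-laden illustration: write_and_wait_sketch,
clear_tree_sketch and cleanup_calls are hypothetical names, and the two
integer parameters simply stand in for the results of the write and wait
steps.

```c
/* Hypothetical counter so the sketch can show that cleanup always runs. */
static int cleanup_calls;

static void clear_tree_sketch(void)
{
	cleanup_calls++;
}

/*
 * Mirrors the shape of the open-coded function: write_ret and wait_ret
 * stand in for the results of the write and wait steps.
 */
static int write_and_wait_sketch(int write_ret, int wait_ret)
{
	int ret = write_ret;	/* result of the "write marked extents" step */
	int ret2 = wait_ret;	/* result of the "wait for extents" step */

	/* Cleanup happens regardless of either result. */
	clear_tree_sketch();

	if (ret)
		return ret;	/* a write error takes precedence */
	else if (ret2)
		return ret2;	/* otherwise report the wait error */
	else
		return 0;
}
```

In this model the wait step still runs after a failed write, and its error
is only reported when the write itself succeeded.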
Re: [PATCH] btrfs-progs: ctree: Add extra level check for read_node_slot()
On 2018-02-07 22:35, David Sterba wrote:
> On Wed, Feb 07, 2018 at 05:18:25PM +0800, Qu Wenruo wrote:
>> Strangely, we have level check in btrfs_print_tree() while we don't have
>> the same check in read_node_slot().
>>
>> That's to say, for the following corruption, btrfs_search_slot() or
>> btrfs_next_leaf() can return invalid leaf:
>>
>> Parent eb:
>>   node XX level 1
>>   ^^^
>>   Child should be leaf (level 0)
>>   ...
>>   key (XXX XXX XXX) block YY
>>
>> Child eb:
>>   leaf YY level 1
>>   ^^^
>>   Something went wrong now
>>
>> And for the corrupted leaf returned, later caller can be screwed up
>> easily.
>>
>> Although the root cause (powerloss, but still something wrong breaking
>> metadata CoW of btrfs) is still unknown, at least enhance btrfs-progs to
>> avoid SEGV.
>>
>> Reported-by: Ralph Gauges
>> Signed-off-by: Qu Wenruo
>> ---
>>  ctree.c | 13 ++++++++++++-
>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/ctree.c b/ctree.c
>> index 4fc33b14000a..ddb1e9cc6d37 100644
>> --- a/ctree.c
>> +++ b/ctree.c
>> @@ -22,6 +22,7 @@
>>  #include "repair.h"
>>  #include "internal.h"
>>  #include "sizes.h"
>> +#include "messages.h"
>>
>>  static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
>>  		      *root, struct btrfs_path *path, int level);
>> @@ -640,7 +641,9 @@ static int bin_search(struct extent_buffer *eb, struct btrfs_key *key,
>>  struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info,
>>  				   struct extent_buffer *parent, int slot)
>>  {
>> +	struct extent_buffer *ret;
>>  	int level = btrfs_header_level(parent);
>> +
>>  	if (slot < 0)
>>  		return NULL;
>>  	if (slot >= btrfs_header_nritems(parent))
>> @@ -649,8 +652,16 @@ struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info,
>>  	if (level == 0)
>>  		return NULL;
>>
>> -	return read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
>> +	ret = read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
>
> The result of read_tree_block should be checked before use, by
> extent_buffer_uptodate, null check or IS_ERR at least (depending on the
> context of use).

Right, just forgot that.

Will fix it in next version.

Thanks,
Qu

>
>>  		       btrfs_node_ptr_generation(parent, slot));
>> +	if (btrfs_header_level(ret) != level - 1) {
>> +		error("child eb corrupted: parent bytenr=%llu item=%d parent level=%d child level=%d",
>> +		      btrfs_header_bytenr(parent), slot,
>> +		      btrfs_header_level(parent), btrfs_header_level(ret));
>> +		free_extent_buffer(ret);
>> +		return ERR_PTR(-EIO);
>> +	}
>> +	return ret;
>>  }
>>
>>  static int balance_level(struct btrfs_trans_handle *trans,
>> --
>> 2.16.1
Re: [PATCH] btrfs-progs: ctree: Add extra level check for read_node_slot()
On Wed, Feb 07, 2018 at 05:18:25PM +0800, Qu Wenruo wrote:
> Strangely, we have level check in btrfs_print_tree() while we don't have
> the same check in read_node_slot().
>
> That's to say, for the following corruption, btrfs_search_slot() or
> btrfs_next_leaf() can return invalid leaf:
>
> Parent eb:
>   node XX level 1
>   ^^^
>   Child should be leaf (level 0)
>   ...
>   key (XXX XXX XXX) block YY
>
> Child eb:
>   leaf YY level 1
>   ^^^
>   Something went wrong now
>
> And for the corrupted leaf returned, later caller can be screwed up
> easily.
>
> Although the root cause (powerloss, but still something wrong breaking
> metadata CoW of btrfs) is still unknown, at least enhance btrfs-progs to
> avoid SEGV.
>
> Reported-by: Ralph Gauges
> Signed-off-by: Qu Wenruo
> ---
>  ctree.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/ctree.c b/ctree.c
> index 4fc33b14000a..ddb1e9cc6d37 100644
> --- a/ctree.c
> +++ b/ctree.c
> @@ -22,6 +22,7 @@
>  #include "repair.h"
>  #include "internal.h"
>  #include "sizes.h"
> +#include "messages.h"
>
>  static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
>  		      *root, struct btrfs_path *path, int level);
> @@ -640,7 +641,9 @@ static int bin_search(struct extent_buffer *eb, struct btrfs_key *key,
>  struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info,
>  				   struct extent_buffer *parent, int slot)
>  {
> +	struct extent_buffer *ret;
>  	int level = btrfs_header_level(parent);
> +
>  	if (slot < 0)
>  		return NULL;
>  	if (slot >= btrfs_header_nritems(parent))
> @@ -649,8 +652,16 @@ struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info,
>  	if (level == 0)
>  		return NULL;
>
> -	return read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
> +	ret = read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),

The result of read_tree_block should be checked before use, by
extent_buffer_uptodate, null check or IS_ERR at least (depending on the
context of use).

>  		       btrfs_node_ptr_generation(parent, slot));
> +	if (btrfs_header_level(ret) != level - 1) {
> +		error("child eb corrupted: parent bytenr=%llu item=%d parent level=%d child level=%d",
> +		      btrfs_header_bytenr(parent), slot,
> +		      btrfs_header_level(parent), btrfs_header_level(ret));
> +		free_extent_buffer(ret);
> +		return ERR_PTR(-EIO);
> +	}
> +	return ret;
> }
>
>  static int balance_level(struct btrfs_trans_handle *trans,
> --
> 2.16.1
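
The review point (validate the returned buffer before reading its header)
can be illustrated with a small userspace model. Everything here is a
hypothetical stand-in: eb_sketch is not the real extent_buffer, and
err_ptr_sketch/is_err_sketch only imitate the kernel-style error-pointer
helpers. Which check btrfs-progs actually needs depends on what
read_tree_block returns in that tree, which this sketch does not assume.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer type; the real extent_buffer is far richer. */
struct eb_sketch {
	int level;
};

/* Userspace stand-ins for kernel-style error-pointer helpers. */
#define MAX_ERRNO_SKETCH 4095

static void *err_ptr_sketch(long err)
{
	return (void *)(intptr_t)err;
}

static int is_err_sketch(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/*
 * Validate a child buffer handed back by a (hypothetical) read routine.
 * The pointer is checked for NULL or an encoded error first; only then
 * is its level read and compared against the parent's level.
 */
static struct eb_sketch *check_child_sketch(struct eb_sketch *child,
					    int parent_level)
{
	if (child == NULL || is_err_sketch(child))
		return err_ptr_sketch(-EIO);	/* never touch an invalid buffer */
	if (child->level != parent_level - 1)
		return err_ptr_sketch(-EIO);	/* level mismatch: treat as corruption */
	return child;
}
```

Unlike the patch, this model does not own the buffer, so it has nothing to
free on the mismatch path.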
Re: [PATCH] btrfs-progs: tests common: remove meaningless colon in extract_image()
On Wed, Feb 07, 2018 at 05:57:43PM +0800, Su Yue wrote:
> The colon is meaningless so remove it.
>
> Signed-off-by: Su Yue

Applied, thanks.
[PATCH] Btrfs: send, do not issue unnecessary truncate operations
From: Filipe Manana

When send finishes processing an inode representing a regular file, it
always issues a truncate operation for that file, even if its size did
not change or the last write sets the file size correctly. In the most
common cases, the issued write operations set the file to correct size
(either full or incremental sends) or the file size did not change (for
incremental sends), so the only case where a truncate operation is needed
is when a file size becomes smaller in the send snapshot when compared to
the parent snapshot.

By not issuing unnecessary truncate operations we reduce the stream size
and save time in the receiver. Currently truncating a file to the same
size triggers writeback of its last page (if it's dirty) and waits for it
to complete (only if the file size is not aligned with the filesystem's
sector size). This is being fixed by another patch and is independent of
this change (that patch's title is "Btrfs: skip writeback of last page
when truncating file to same size").

The following script was used to measure time spent by a receiver without
this change applied, with this change applied, and without this change
and with the truncate fix applied (the fix to not make it start and wait
for writeback to complete).

  $ cat test_send.sh
  #!/bin/bash

  SRC_DEV=/dev/sdc
  DST_DEV=/dev/sdd
  SRC_MNT=/mnt/sdc
  DST_MNT=/mnt/sdd

  mkfs.btrfs -f $SRC_DEV >/dev/null
  mkfs.btrfs -f $DST_DEV >/dev/null
  mount $SRC_DEV $SRC_MNT
  mount $DST_DEV $DST_MNT

  echo "Creating source filesystem"
  for ((t = 0; t < 10; t++)); do
      (
          for ((i = 1; i <= 2; i++)); do
              xfs_io -f -c "pwrite -S 0xab 0 5000" \
                  $SRC_MNT/file_$i > /dev/null
          done
      ) &
      worker_pids[$t]=$!
  done
  wait ${worker_pids[@]}

  echo "Creating and sending snapshot"
  btrfs subvolume snapshot -r $SRC_MNT $SRC_MNT/snap1 >/dev/null
  /usr/bin/time -f "send took %e seconds" \
      btrfs send -f $SRC_MNT/send_file $SRC_MNT/snap1
  /usr/bin/time -f "receive took %e seconds" \
      btrfs receive -f $SRC_MNT/send_file $DST_MNT

  umount $SRC_MNT
  umount $DST_MNT

The results, which are averages for 5 runs for each case, were the
following:

* Without this change

  average receive time was 26.49 seconds
  standard deviation of 2.53 seconds

* Without this change and with the truncate fix

  average receive time was 12.51 seconds
  standard deviation of 0.32 seconds

* With this change and without the truncate fix

  average receive time was 10.02 seconds
  standard deviation of 1.11 seconds

Signed-off-by: Filipe Manana
---
 fs/btrfs/send.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 484e2af793de..5df50d67d319 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -112,6 +112,7 @@ struct send_ctx {
 	u64 cur_inode_mode;
 	u64 cur_inode_rdev;
 	u64 cur_inode_last_extent;
+	u64 cur_inode_next_write_offset;
 
 	u64 send_progress;
 
@@ -5029,6 +5030,7 @@ static int send_hole(struct send_ctx *sctx, u64 end)
 			break;
 		offset += len;
 	}
+	sctx->cur_inode_next_write_offset = offset;
 tlv_put_failure:
 	fs_path_free(p);
 	return ret;
@@ -5264,6 +5266,7 @@ static int send_write_or_clone(struct send_ctx *sctx,
 	} else {
 		ret = send_extent_data(sctx, offset, len);
 	}
+	sctx->cur_inode_next_write_offset = offset + len;
 out:
 	return ret;
 }
@@ -5788,6 +5791,7 @@ static int finish_inode_if_needed(struct send_ctx *sctx, int at_end)
 	u64 right_gid;
 	int need_chmod = 0;
 	int need_chown = 0;
+	int need_truncate = 1;
 	int pending_move = 0;
 	int refs_processed = 0;
 
@@ -5825,9 +5829,13 @@ static int finish_inode_if_needed(struct send_ctx *sctx, int at_end)
 			need_chown = 1;
 		if (!S_ISLNK(sctx->cur_inode_mode))
 			need_chmod = 1;
+		if (sctx->cur_inode_next_write_offset == sctx->cur_inode_size)
+			need_truncate = 0;
 	} else {
+		u64 old_size;
+
 		ret = get_inode_info(sctx->parent_root, sctx->cur_ino,
-				NULL, NULL, &right_mode, &right_uid,
+				&old_size, NULL, &right_mode, &right_uid,
 				&right_gid, NULL);
 		if (ret < 0)
 			goto out;
@@ -5836,6 +5844,10 @@ static int finish_inode_if_needed(struct send_ctx *sctx, int at_end)
 			need_chown = 1;
 		if (!S_ISLNK(sctx->cur_inode_mode) && left_mode != right_mode)
 			need_chmod = 1;
+		if ((old_size == sctx->cur_inode_size) ||
+		    (sctx->cur_inode_size > old_size &&
+		     sctx->cur_inode_next_write_offset == sctx->cur_inode_size))
+
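
The archived copy of this patch is cut off mid-hunk, so the following is
only a sketch of the decision the commit message describes, not the
patch's code. It assumes the truncated condition clears need_truncate in
the same way the first branch does. need_truncate_sketch and its
parameters are hypothetical names; has_parent is false for the case the
patch handles in its first branch (no parent root, or a new inode).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a truncate still has to be sent for a regular file.
 * old_size is the size in the parent snapshot, cur_size the size in the
 * send snapshot, and next_write_offset the end of the last issued write
 * or hole.
 */
static bool need_truncate_sketch(bool has_parent, uint64_t old_size,
				 uint64_t cur_size, uint64_t next_write_offset)
{
	if (!has_parent)
		/* The writes already reached the final size: nothing to do. */
		return next_write_offset != cur_size;

	if (old_size == cur_size)
		return false;	/* size did not change */
	if (cur_size > old_size && next_write_offset == cur_size)
		return false;	/* file grew and the last write set the size */
	return true;		/* e.g. the file became smaller */
}
```

Under these assumptions a truncate is still sent when the file shrank, or
when it grew but the last write ended short of the new size.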
[PATCH] Btrfs: skip writeback of last page when truncating file to same size
From: Filipe Manana

When we truncate a file to the same size and that size is not aligned
with the sector size, we end up triggering writeback of the last page
(and waiting for it to complete). This is unnecessary, as we cannot have
delayed allocation beyond the inode's i_size, and the goal of truncating
a file to its own size is to discard prealloc extents (allocated via the
fallocate(2) system call). Besides the unnecessary IO start and wait, it
also breaks the opportunity for larger contiguous extents on disk, as
before the last dirty page there might be other dirty pages.

This scenario is probably not very common in general, however it is
common for btrfs receive implementations because currently the send
stream always issues a truncate operation for each processed inode as the
last operation for that inode (this truncate operation is not always
needed and the send implementation will be addressed to avoid them).

So improve this by not starting and waiting for writeback of the inode's
last page when we are truncating to exactly the same size.

The following script was used to quickly measure the time a receive
operation takes:

  $ cat test_send.sh
  #!/bin/bash

  SRC_DEV=/dev/sdc
  DST_DEV=/dev/sdd
  SRC_MNT=/mnt/sdc
  DST_MNT=/mnt/sdd

  mkfs.btrfs -f $SRC_DEV >/dev/null
  mkfs.btrfs -f $DST_DEV >/dev/null
  mount $SRC_DEV $SRC_MNT
  mount $DST_DEV $DST_MNT

  echo "Creating source filesystem"
  for ((t = 0; t < 10; t++)); do
      (
          for ((i = 1; i <= 2; i++)); do
              xfs_io -f -c "pwrite -S 0xab 0 5000" \
                  $SRC_MNT/file_$i > /dev/null
          done
      ) &
      worker_pids[$t]=$!
  done
  wait ${worker_pids[@]}

  echo "Creating and sending snapshot"
  btrfs subvolume snapshot -r $SRC_MNT $SRC_MNT/snap1 >/dev/null
  /usr/bin/time -f "send took %e seconds" \
      btrfs send -f $SRC_MNT/send_file $SRC_MNT/snap1
  /usr/bin/time -f "receive took %e seconds" \
      btrfs receive -f $SRC_MNT/send_file $DST_MNT

  umount $SRC_MNT
  umount $DST_MNT

The results for 5 runs were the following:

* Without this change

  average receive time was 26.49 seconds
  standard deviation of 2.53 seconds

* With this change

  average receive time was 12.51 seconds
  standard deviation of 0.32 seconds

Reported-by: Robbie Ko
Signed-off-by: Filipe Manana
---
 fs/btrfs/inode.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2a19413a7868..dae631ab5cb2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -101,7 +101,7 @@ static const unsigned char btrfs_type_by_mode[S_IFMT >> S_SHIFT] = {
 };
 
 static int btrfs_setsize(struct inode *inode, struct iattr *attr);
-static int btrfs_truncate(struct inode *inode);
+static int btrfs_truncate(struct inode *inode, bool skip_writeback);
 static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent);
 static noinline int cow_file_range(struct inode *inode,
 				   struct page *locked_page,
@@ -3625,7 +3625,7 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 				goto out;
 			}
 
-			ret = btrfs_truncate(inode);
+			ret = btrfs_truncate(inode, false);
 			if (ret)
 				btrfs_orphan_del(NULL, BTRFS_I(inode));
 		} else {
@@ -5109,7 +5109,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
 		inode_dio_wait(inode);
 		btrfs_inode_resume_unlocked_dio(BTRFS_I(inode));
 
-		ret = btrfs_truncate(inode);
+		ret = btrfs_truncate(inode, newsize == oldsize);
 		if (ret && inode->i_nlink) {
 			int err;
 
@@ -9087,7 +9087,7 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
 	return ret;
 }
 
-static int btrfs_truncate(struct inode *inode)
+static int btrfs_truncate(struct inode *inode, bool skip_writeback)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_root *root = BTRFS_I(inode)->root;
@@ -9098,10 +9098,12 @@ static int btrfs_truncate(struct inode *inode)
 	u64 mask = fs_info->sectorsize - 1;
 	u64 min_size = btrfs_calc_trunc_metadata_size(fs_info, 1);
 
-	ret = btrfs_wait_ordered_range(inode, inode->i_size & (~mask),
-				       (u64)-1);
-	if (ret)
-		return ret;
+	if (!skip_writeback) {
+		ret = btrfs_wait_ordered_range(inode, inode->i_size & (~mask),
+					       (u64)-1);
+		if (ret)
+			return ret;
+	}
 
 	/*
 	 * Yes ladies and gentlemen, this is indeed ugly.  The fact is we have
-- 
2.11.0
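
Two small pieces of arithmetic carry this patch: rounding i_size down to
its sector start for the wait range, and deciding that writeback can be
skipped when the size is unchanged. The sketch below models both in
userspace C. The function names are hypothetical, and it assumes the
sector size is a power of two, as the mask computation in the patch
implies.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Start of the range the truncate path would wait on: i_size rounded
 * down to the beginning of its sector. Assumes sectorsize is a power
 * of two.
 */
static uint64_t wait_range_start_sketch(uint64_t i_size, uint32_t sectorsize)
{
	uint64_t mask = (uint64_t)sectorsize - 1;

	return i_size & ~mask;
}

/* The patch passes "newsize == oldsize" as the skip_writeback argument. */
static bool skip_writeback_sketch(uint64_t oldsize, uint64_t newsize)
{
	return newsize == oldsize;
}
```

With the 5000-byte files from the test script and a 4096-byte sector, the
wait range would start at 4096, which is the unaligned case the commit
message talks about.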
[PATCH] Btrfs: send, fix issuing write op when processing hole in no data mode
From: Filipe Manana

When doing an incremental send of a filesystem with the no-holes feature
enabled, we end up issuing a write operation when using the no data mode
send flag, instead of issuing an update extent operation. Fix this by
issuing the update extent operation instead.

Trivial reproducer:

  $ mkfs.btrfs -f -O no-holes /dev/sdc
  $ mkfs.btrfs -f /dev/sdd
  $ mount /dev/sdc /mnt/sdc
  $ mount /dev/sdd /mnt/sdd

  $ xfs_io -f -c "pwrite -S 0xab 0 32K" /mnt/sdc/foobar
  $ btrfs subvolume snapshot -r /mnt/sdc /mnt/sdc/snap1

  $ xfs_io -c "fpunch 8K 8K" /mnt/sdc/foobar
  $ btrfs subvolume snapshot -r /mnt/sdc /mnt/sdc/snap2

  $ btrfs send /mnt/sdc/snap1 | btrfs receive /mnt/sdd
  $ btrfs send --no-data -p /mnt/sdc/snap1 /mnt/sdc/snap2 \
       | btrfs receive -vv /mnt/sdd

Before this change the output of the second receive command is:

  receiving snapshot snap2 uuid=f6922049-8c22-e544-9ff9-fc6755918447...
  utimes
  write foobar, offset 8192, len 8192
  utimes foobar
  BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=f6922049-8c22-e544-9ff9-...

After this change it is:

  receiving snapshot snap2 uuid=564d36a3-ebc8-7343-aec9-bf6fda278e64...
  utimes
  update_extent foobar: offset=8192, len=8192
  utimes foobar
  BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=564d36a3-ebc8-7343-aec9-bf6fda278e64...

Signed-off-by: Filipe Manana
---
 fs/btrfs/send.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index f306c608dc28..484e2af793de 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -5005,6 +5005,9 @@ static int send_hole(struct send_ctx *sctx, u64 end)
 	u64 len;
 	int ret = 0;
 
+	if (sctx->flags & BTRFS_SEND_FLAG_NO_FILE_DATA)
+		return send_update_extent(sctx, offset, end - offset);
+
 	p = fs_path_alloc();
 	if (!p)
 		return -ENOMEM;
-- 
2.11.0
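
The early return added to send_hole amounts to picking the operation from
the send flags before any data is written. A minimal userspace model of
that choice follows. The enum, the flag constant and the function name are
all hypothetical, and the flag's numeric value here is an assumption made
purely for the sketch.

```c
#include <stdint.h>

/* Hypothetical operation kinds a hole could be turned into. */
enum hole_op_sketch {
	HOLE_OP_WRITE,		/* send zeroes as file data */
	HOLE_OP_UPDATE_EXTENT	/* only describe the range, no data */
};

/* Assumed value, for illustration only. */
#define SEND_FLAG_NO_FILE_DATA_SKETCH 0x1u

struct hole_cmd_sketch {
	enum hole_op_sketch op;
	uint64_t offset;
	uint64_t len;
};

/* Pick the operation for the hole [offset, end) based on the send flags. */
static struct hole_cmd_sketch hole_cmd_for(unsigned int flags,
					   uint64_t offset, uint64_t end)
{
	struct hole_cmd_sketch cmd;

	cmd.op = (flags & SEND_FLAG_NO_FILE_DATA_SKETCH) ?
		 HOLE_OP_UPDATE_EXTENT : HOLE_OP_WRITE;
	cmd.offset = offset;
	cmd.len = end - offset;	/* same length computation as the patch */
	return cmd;
}
```

For the reproducer's punched range this yields offset 8192 and length
8192, matching the update_extent line in the "after" output.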
[PATCH] btrfs-progs: tests common: remove meaningless colon in extract_image()
The colon is meaningless so remove it.

Signed-off-by: Su Yue
---
 tests/common | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/common b/tests/common
index 8e5d0cde1b7e..7f641a004661 100644
--- a/tests/common
+++ b/tests/common
@@ -331,7 +331,7 @@ extract_image()
 	case "$image" in
 	*.img)
 		rm -f "$image.restored"
-		: ;;
+		;;
 	*.img.xz)
 		xz --decompress --keep "$image" || \
 			_fail "failed to decompress image $image" >&2
-- 
2.16.1
Re: [PATCH] btrfs: Don't hardcode the csum size in btrfs_ordered_sum_size
On 02/07/2018 05:19 PM, Nikolay Borisov wrote:
> Currently the function uses a hardcoded value for the checksum size of
> a sector. This is fine, given that we currently support only a single
> algorithm, whose checksum is 4 bytes == sizeof(u32). Despite not
> having other algorithms, btrfs' design supports using a different
> algorithm with different space requirements. To future-proof the code
> query the size of the currently used algorithm from the in-memory copy
> of the super block. No functional changes.
>
> Signed-off-by: Nikolay Borisov

Reviewed-by: Su Yue

> ---
>  fs/btrfs/ordered-data.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
> index 56c4c0ee6381..c53e2cfb72d9 100644
> --- a/fs/btrfs/ordered-data.h
> +++ b/fs/btrfs/ordered-data.h
> @@ -151,7 +151,9 @@ static inline int btrfs_ordered_sum_size(struct btrfs_fs_info *fs_info,
>  					 unsigned long bytes)
>  {
>  	int num_sectors = (int)DIV_ROUND_UP(bytes, fs_info->sectorsize);
> -	return sizeof(struct btrfs_ordered_sum) + num_sectors * sizeof(u32);
> +	int csum_size = btrfs_super_csum_size(fs_info->super_copy);
> +
> +	return sizeof(struct btrfs_ordered_sum) + num_sectors * csum_size;
>  }
>
>  static inline void
Re: [PATCH] btrfs: Don't hardcode the csum size in btrfs_ordered_sum_size
On 2018-02-07 17:19, Nikolay Borisov wrote:
> Currently the function uses a hardcoded value for the checksum size of
> a sector. This is fine, given that we currently support only a single
> algorithm, whose checksum is 4 bytes == sizeof(u32). Despite not
> having other algorithms, btrfs' design supports using a different
> algorithm with different space requirements. To future-proof the code
> query the size of the currently used algorithm from the in-memory copy
> of the super block. No functional changes.
>
> Signed-off-by: Nikolay Borisov

Reviewed-by: Qu Wenruo

Thanks,
Qu

> ---
>  fs/btrfs/ordered-data.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
> index 56c4c0ee6381..c53e2cfb72d9 100644
> --- a/fs/btrfs/ordered-data.h
> +++ b/fs/btrfs/ordered-data.h
> @@ -151,7 +151,9 @@ static inline int btrfs_ordered_sum_size(struct btrfs_fs_info *fs_info,
>  					 unsigned long bytes)
>  {
>  	int num_sectors = (int)DIV_ROUND_UP(bytes, fs_info->sectorsize);
> -	return sizeof(struct btrfs_ordered_sum) + num_sectors * sizeof(u32);
> +	int csum_size = btrfs_super_csum_size(fs_info->super_copy);
> +
> +	return sizeof(struct btrfs_ordered_sum) + num_sectors * csum_size;
>  }
>
>  static inline void
>
Re: [PATCH] btrfs-progs: fsck-tests: Cleanup the restored image for 028
Please ignore this one.

I just forgot to remove an unrelated patch.

Thanks,
Qu

On 2018-02-07 17:17, Qu Wenruo wrote:
> Signed-off-by: Qu Wenruo
> ---
>  tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh b/tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh
> index 3928f548c3f9..4bbcfbae662e 100755
> --- a/tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh
> +++ b/tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh
> @@ -21,3 +21,5 @@ run_check "$TOP/btrfs" check "$TEST_DEV"
>  # mount test
>  run_check_mount_test_dev
>  run_check_umount_test_dev "$TEST_MNT"
> +# don't forget to clean it up
> +rm "$TEST_DEV"
[PATCH] btrfs: Don't hardcode the csum size in btrfs_ordered_sum_size
Currently the function uses a hardcoded value for the checksum size of
a sector. This is fine, given that we currently support only a single
algorithm, whose checksum is 4 bytes == sizeof(u32). Despite not
having other algorithms, btrfs' design supports using a different
algorithm with different space requirements. To future-proof the code,
query the size of the currently used algorithm from the in-memory copy
of the super block. No functional changes.

Signed-off-by: Nikolay Borisov
---
 fs/btrfs/ordered-data.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index 56c4c0ee6381..c53e2cfb72d9 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -151,7 +151,9 @@ static inline int btrfs_ordered_sum_size(struct btrfs_fs_info *fs_info,
 					 unsigned long bytes)
 {
 	int num_sectors = (int)DIV_ROUND_UP(bytes, fs_info->sectorsize);
-	return sizeof(struct btrfs_ordered_sum) + num_sectors * sizeof(u32);
+	int csum_size = btrfs_super_csum_size(fs_info->super_copy);
+
+	return sizeof(struct btrfs_ordered_sum) + num_sectors * csum_size;
 }

 static inline void
--
2.7.4
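For readers following the thread, here is a minimal user-space sketch of the size calculation the patch changes. It is not the kernel code: `struct ordered_sum_hdr`, `ordered_sum_size()` and the explicit `sectorsize` and `csum_size` parameters are hypothetical stand-ins for `struct btrfs_ordered_sum`, `fs_info->sectorsize` and `btrfs_super_csum_size(fs_info->super_copy)`, and the 32-byte checksum mentioned afterwards is only an illustrative value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fixed header of struct btrfs_ordered_sum. */
struct ordered_sum_hdr {
	uint64_t bytenr;
	int len;
};

/* Same rounding as the kernel's DIV_ROUND_UP(), restated for user space. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Bytes needed for the header plus one checksum per sector.
 * csum_size plays the role of btrfs_super_csum_size(fs_info->super_copy);
 * passing 4 reproduces the old hardcoded sizeof(u32) behaviour.
 */
static size_t ordered_sum_size(unsigned long bytes, unsigned int sectorsize,
			       unsigned int csum_size)
{
	assert(sectorsize > 0);
	size_t num_sectors = DIV_ROUND_UP(bytes, sectorsize);

	return sizeof(struct ordered_sum_hdr) + num_sectors * csum_size;
}
```

With a 4-byte checksum, 8192 bytes at a 4096-byte sector size need the header plus 8 bytes. The same range with a hypothetical 32-byte checksum would need the header plus 64 bytes, which is the case the hardcoded `sizeof(u32)` would have undercounted.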
[PATCH] btrfs-progs: ctree: Add extra level check for read_node_slot()
Strangely, we have a level check in btrfs_print_tree() while we don't have
the same check in read_node_slot().

That's to say, for the following corruption, btrfs_search_slot() or
btrfs_next_leaf() can return an invalid leaf:

Parent eb:
  node XX level 1
              ^^^
              Child should be leaf (level 0)
  ...
  key (XXX XXX XXX) block YY

Child eb:
  leaf YY level 1
              ^^^
              Something went wrong now

And for the corrupted leaf returned, a later caller can be screwed up
easily.

Although the root cause (powerloss, but still something wrong breaking
metadata CoW of btrfs) is still unknown, at least enhance btrfs-progs to
avoid SEGV.

Reported-by: Ralph Gauges
Signed-off-by: Qu Wenruo
---
 ctree.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/ctree.c b/ctree.c
index 4fc33b14000a..ddb1e9cc6d37 100644
--- a/ctree.c
+++ b/ctree.c
@@ -22,6 +22,7 @@
 #include "repair.h"
 #include "internal.h"
 #include "sizes.h"
+#include "messages.h"

 static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
 		      *root, struct btrfs_path *path, int level);
@@ -640,7 +641,9 @@ static int bin_search(struct extent_buffer *eb, struct btrfs_key *key,
 struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info,
 				   struct extent_buffer *parent, int slot)
 {
+	struct extent_buffer *ret;
 	int level = btrfs_header_level(parent);
+
 	if (slot < 0)
 		return NULL;
 	if (slot >= btrfs_header_nritems(parent))
@@ -649,8 +652,16 @@ struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info,
 	if (level == 0)
 		return NULL;

-	return read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
+	ret = read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
 		       btrfs_node_ptr_generation(parent, slot));
+	if (btrfs_header_level(ret) != level - 1) {
+		error("child eb corrupted: parent bytenr=%llu item=%d parent level=%d child level=%d",
+			btrfs_header_bytenr(parent), slot,
+			btrfs_header_level(parent), btrfs_header_level(ret));
+		free_extent_buffer(ret);
+		return ERR_PTR(-EIO);
+	}
+	return ret;
 }

 static int balance_level(struct btrfs_trans_handle *trans,
--
2.16.1
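The invariant this patch enforces (a child block must sit exactly one level below its parent) can be sketched in isolation as follows. This is a user-space illustration only: `struct eb` and `check_child_level()` are hypothetical names, not btrfs-progs API. The patch as posted reads the child's level without first testing whether the read itself failed; treating a missing child as an error here is an assumption of the sketch, not something taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct extent_buffer: only the level matters. */
struct eb {
	int level;
};

/*
 * Return 0 if child may legitimately hang below parent, -EIO otherwise.
 * A NULL child models a failed block read (an assumption of this sketch).
 */
static int check_child_level(const struct eb *parent, const struct eb *child)
{
	assert(parent != NULL);

	if (child == NULL)
		return -EIO;
	/* The child must sit exactly one level below its parent. */
	if (child->level != parent->level - 1)
		return -EIO;
	return 0;
}
```

For the corruption described above (parent at level 1, child claiming level 1), the check returns -EIO instead of handing a bogus leaf back to the caller.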
[PATCH] btrfs-progs: fsck-tests: Cleanup the restored image for 028
Signed-off-by: Qu Wenruo
---
 tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh b/tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh
index 3928f548c3f9..4bbcfbae662e 100755
--- a/tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh
+++ b/tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh
@@ -21,3 +21,5 @@ run_check "$TOP/btrfs" check "$TEST_DEV"
 # mount test
 run_check_mount_test_dev
 run_check_umount_test_dev "$TEST_MNT"
+# don't forget to clean it up
+rm "$TEST_DEV"
--
2.16.1