Re: Crash when unraring large archives on btrfs-filesystem

2018-02-07 Thread Nikolay Borisov


On  7.02.2018 21:57, Stefan Malte Schumacher wrote:
> Hello,
> 
> 
> I have encountered what I think is a problem with btrfs, which causes
> my file server to become unresponsive. But let's start with the basic
> information:
> 
> uname -a = Linux mars 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2
> (2018-01-04) x86_64 GNU/Linux
> 
> btrfs --version = btrfs-progs v4.7.3
> 
> 
> Label: none uuid: 1609e4e1-4037-4d31-bf12-f84a691db5d8
> 
> Total devices 5 FS bytes used 7.15TiB
> 
> devid 1 size 3.64TiB used 2.90TiB path /dev/sda
> 
> devid 2 size 3.64TiB used 2.90TiB path /dev/sdb
> 
> devid 3 size 3.64TiB used 2.90TiB path /dev/sdc
> 
> devid 4 size 3.64TiB used 2.90TiB path /dev/sdd
> 
> devid 5 size 3.64TiB used 2.90TiB path /dev/sde
> 
> 
> Data, RAID1: total=7.25TiB, used=7.14TiB
> 
> System, RAID1: total=40.00MiB, used=1.02MiB
> 
> Metadata, RAID1: total=9.00GiB, used=7.75GiB
> 
> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> 
> The following entry in kern.log seems to be the point where it all
> started and which causes me to believe that the problem is related to
> btrfs. At that time the server was unraring a large archive stored on
> the btrfs filesystem.
> 
> 
> Feb 5 21:22:42 mars kernel: [249979.829318] BTRFS info (device sda):
> The free space cache file (4701944807424) is invalid. skip it

This tells you that your free space cache is likely corrupted. This is
not critical, but it is highly recommended that you rebuild it. You can
do that by mounting your file system with the 'clear_cache' mount option.
For more information, see
https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)
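
As a rough sketch of that step, assuming the filesystem can be taken
offline briefly: /dev/sda comes from the report above, while the mount
point /mnt/data and the helper names are placeholders, not something the
report states. With DRY_RUN left at its default the script only prints
the commands. The option only needs to be passed for a single mount.

```shell
#!/bin/bash
# Sketch: rebuild the btrfs free space cache via a one-time clear_cache mount.
# /mnt/data is an assumed mount point; adjust it to the real one.

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebuild_free_space_cache() {
    local dev=$1 mnt=$2
    run umount "$mnt"
    run mount -o clear_cache "$dev" "$mnt"
}

rebuild_free_space_cache /dev/sda /mnt/data
```

Running it with DRY_RUN=0 as root would actually unmount and remount the
filesystem, so check the printed commands first.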

> 
> Feb 5 21:22:42 mars kernel: [249979.829318]
> 
> Feb 5 21:25:12 mars kernel: [250090.149452] unrar: page allocation
> stalls for 12104ms, order:0, mode:0x24200ca(GFP_HIGHUSER_MOVABLE)
> 
> Feb 5 21:25:12 mars kernel: [250116.605420] [] ?
> alloc_pages_vma+0xae/0x260
> 
> Feb 5 21:25:12 mars kernel: [250116.605422] [] ?
> __read_swap_cache_async+0x118/0x1c0
> 
> Feb 5 21:25:12 mars kernel: [250116.605423] [] ?
> read_swap_cache_async+0x24/0x60
> 
> Feb 5 21:25:12 mars kernel: [250116.605425] [] ?
> swapin_readahead+0x1a9/0x210
> 
> Feb 5 21:25:12 mars kernel: [250116.605427] [] ?
> radix_tree_lookup_slot+0x1e/0x50
> 
> Feb 5 21:25:12 mars kernel: [250116.605429] [] ?
> find_get_entry+0x1b/0x100
> 
> Feb 5 21:25:12 mars kernel: [250116.605431] [] ?
> pagecache_get_page+0x30/0x2b0
> 
> Feb 5 21:25:12 mars kernel: [250116.605434] [] ?
> do_swap_page+0x2a3/0x750
> 
> Feb 5 21:25:12 mars kernel: [250116.605436] [] ?
> handle_mm_fault+0x892/0x12d0
> 
> Feb 5 21:25:12 mars kernel: [250116.605438] [] ?
> __do_page_fault+0x25c/0x500
> 
> Feb 5 21:25:12 mars kernel: [250116.605440] [] ?
> page_fault+0x28/0x30
> 
> Feb 5 21:25:12 mars kernel: [250116.605442] [] ?
> __get_user_8+0x1b/0x25
> 
> Feb 5 21:25:12 mars kernel: [250116.605445] [] ?
> exit_robust_list+0x30/0x110
> 
> Feb 5 21:25:12 mars kernel: [250116.605447] [] ?
> mm_release+0xf8/0x130
> 
> Feb 5 21:25:12 mars kernel: [250116.605449] [] ?
> do_exit+0x150/0xae0
> 
> Feb 5 21:25:12 mars kernel: [250116.605450] [] ?
> do_group_exit+0x3a/0xa0
> 
> Feb 5 21:25:12 mars kernel: [250116.605452] [] ?
> get_signal+0x297/0x640
> 
> Feb 5 21:25:12 mars kernel: [250116.605454] [] ?
> do_signal+0x36/0x6a0
> 
> Feb 5 21:25:12 mars kernel: [250116.605457] [] ?
> exit_to_usermode_loop+0x71/0xb0
> 
> Feb 5 21:25:12 mars kernel: [250116.605459] [] ?
> syscall_return_slowpath+0x54/0x60
> 
> Feb 5 21:25:12 mars kernel: [250116.605461] [] ?
> system_call_fast_compare_end+0xb5/0xb7

This call trace essentially tells you that your server more or less ran
out of memory and began to swap in, i.e. read from the disk, which took
a rather long time (12s). btrfs is not involved here at all.
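
To see how often and how badly allocations stalled, the duration can be
pulled out of kern.log. A minimal sketch, assuming the log line is
unwrapped as it would appear in the file; the function name stall_ms is
made up for this example:

```shell
#!/bin/bash
# Print the stall duration in milliseconds for every
# "page allocation stalls for <N>ms" line read from stdin.
stall_ms() {
    sed -n 's/.*page allocation stalls for \([0-9][0-9]*\)ms.*/\1/p'
}

# The line from the report above, joined back into one line.
line='Feb 5 21:25:12 mars kernel: [250090.149452] unrar: page allocation stalls for 12104ms, order:0, mode:0x24200ca(GFP_HIGHUSER_MOVABLE)'
printf '%s\n' "$line" | stall_ms
```

On the server this would be fed the real log, e.g.
`stall_ms < /var/log/kern.log` (the path is the usual Debian location and
an assumption here).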

> 
> Feb 5 21:25:12 mars kernel: [250116.605462] Mem-Info:
> 
> Feb 5 21:25:12 mars kernel: [250116.605466] active_anon:44
> inactive_anon:69 isolated_anon:0
> 
> Feb 5 21:25:12 mars kernel: [250116.605466] active_file:3557188
> inactive_file:407932 isolated_file:1024
> 
> Feb 5 21:25:12 mars kernel: [250116.605466] unevictable:0 dirty:409214
> writeback:62 unstable:0
> 
> Feb 5 21:25:12 mars kernel: [250116.605466] slab_reclaimable:37022
> slab_unreclaimable:10475
> 
> Feb 5 21:25:12 mars kernel: [250116.605466] mapped:2329 shmem:21
> pagetables:3522 bounce:0
> 
> Feb 5 21:25:12 mars kernel: [250116.605466] free:34036 free_pcp:291 free_cma:0
> 
> Feb 5 21:25:12 mars kernel: [250116.605471] Node 0 active_anon:176kB
> inactive_anon:276kB active_file:14228752kB inactive_file:1631728kB
> unevictable:0kB isolated(anon):0kB isolated(file):4096kB mapped:9316kB
> dirty:1636856kB writeback:248kB shmem:84kB shmem_thp: 0kB
> shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB
> pages_scanned:13631918 all_unreclaimable? no
> 
> 
> Searching for "btrfs" in kern.log shows a lot of entries for kern.log
> and kern.log.1 but none before that point in time. I think that
> there is a relation 

Re: [PATCH v2] btrfs-progs: ctree: Add extra level check for read_node_slot()

2018-02-07 Thread Nikolay Borisov


On  8.02.2018 02:59, Qu Wenruo wrote:
> Strangely, we have level check in btrfs_print_tree() while we don't have
> the same check in read_node_slot().
> 
> That's to say, for the following corruption, btrfs_search_slot() or
> btrfs_next_leaf() can return invalid leaf:
> 
> Parent eb:
>   node XX level 1
>   ^^^
>   Child should be leaf (level 0)
>   ...
>   key (XXX XXX XXX) block YY
> 
> Child eb:
>   leaf YY level 1
>   ^^^
>   Something went wrong now
> 
> And the corrupted leaf that is returned can easily trip up later
> callers.
> 
> Although the root cause (power loss, but still something wrong breaking
> metadata CoW of btrfs) is still unknown, at least enhance btrfs-progs to
> avoid SEGV.
> 
> Reported-by: Ralph Gauges 
> Signed-off-by: Qu Wenruo 

Reviewed-by: Nikolay Borisov 

> ---
> changelog:
> v2:
>   Check if the extent buffer is up-to-date before checking its level to
>   avoid possible NULL pointer access.
> ---
>  ctree.c | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/ctree.c b/ctree.c
> index 4fc33b14000a..430805e3043f 100644
> --- a/ctree.c
> +++ b/ctree.c
> @@ -22,6 +22,7 @@
>  #include "repair.h"
>  #include "internal.h"
>  #include "sizes.h"
> +#include "messages.h"
>  
>  static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
> *root, struct btrfs_path *path, int level);
> @@ -640,7 +641,9 @@ static int bin_search(struct extent_buffer *eb, struct 
> btrfs_key *key,
>  struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info,
>  struct extent_buffer *parent, int slot)
>  {
> + struct extent_buffer *ret;
>   int level = btrfs_header_level(parent);
> +
>   if (slot < 0)
>   return NULL;
>   if (slot >= btrfs_header_nritems(parent))
> @@ -649,8 +652,19 @@ struct extent_buffer *read_node_slot(struct 
> btrfs_fs_info *fs_info,
>   if (level == 0)
>   return NULL;
>  
> - return read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
> + ret = read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
>  btrfs_node_ptr_generation(parent, slot));
> + if (!extent_buffer_uptodate(ret))
> + return ERR_PTR(-EIO);
> +
> + if (btrfs_header_level(ret) != level - 1) {
> + error("child eb corrupted: parent bytenr=%llu item=%d parent level=%d child level=%d",
> +   btrfs_header_bytenr(parent), slot,
> +   btrfs_header_level(parent), btrfs_header_level(ret));
> + free_extent_buffer(ret);
> + return ERR_PTR(-EIO);
> + }
> + return ret;
>  }
>  
>  static int balance_level(struct btrfs_trans_handle *trans,
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: delete function btrfs_close_extra_devices()

2018-02-07 Thread Anand Jain


 ping ?

Thanks, Anand


[PATCH v5 3/3] btrfs-progs: Add readme for export testsuits

2018-02-07 Thread Gu Jinxiang
Document the command for exporting the testsuite,
and describe how to run the exported testsuite.

Signed-off-by: Gu Jinxiang 
---
 tests/README.md | 13 +
 1 file changed, 13 insertions(+)

diff --git a/tests/README.md b/tests/README.md
index 04d2ce2a..23f35cfc 100644
--- a/tests/README.md
+++ b/tests/README.md
@@ -48,6 +48,19 @@ $ TEST=001\* ./fsck-tests.sh
 will run the first test in fsck-tests subdirectory.
 
 
+## Package testsuite
+
+The tests can be exported as btrfs-progs-tests.tar.gz in the current path. Use:
+
+```shell
+$ make testsuite
+```
+
+
+After decompressing btrfs-progs-tests.tar.gz, tests can be run selectively
+from the `tests/` directory as described above.
+
+
 ## Test structure
 
 *tests/fsck-tests/:*
-- 
2.14.3





[PATCH v5 1/3] btrfs-progs: Add make testsuite command for export tests

2018-02-07 Thread Gu Jinxiang
Export the testsuite files to a separate tarball.
The fsck tests depend on btrfs-corrupt-block, and the misc
tests depend on both btrfs-corrupt-block and fssum,
so make both prerequisites of the packaging command.

fssum can be built from sources that all live in the
tests directory and does not rely on btrfs's internal structures,
but btrfs-corrupt-block relies on them heavily.
For consistency, for now build both
when creating the test tarball.

Signed-off-by: Gu Jinxiang 
---
 .gitignore|  1 +
 Makefile  |  4 
 tests/export-tests.sh | 37 +
 testsuites-list   | 22 ++
 4 files changed, 64 insertions(+)
 create mode 100755 tests/export-tests.sh
 create mode 100644 testsuites-list

diff --git a/.gitignore b/.gitignore
index 8e607f6e..a41ad8ce 100644
--- a/.gitignore
+++ b/.gitignore
@@ -43,6 +43,7 @@ libbtrfs.so.0.1
 library-test
 library-test-static
 /fssum
+testsuites-id
 
 /tests/*-tests-results.txt
 /tests/test-console.txt
diff --git a/Makefile b/Makefile
index 6369e8f4..7eab0f4f 100644
--- a/Makefile
+++ b/Makefile
@@ -333,6 +333,10 @@ test-inst: all
 
 test: test-fsck test-mkfs test-convert test-misc test-fuzz test-cli
 
+testsuite: btrfs-corrupt-block fssum
+   @echo "Export tests as a package"
+   $(Q)bash tests/export-tests.sh
+
 #
 # NOTE: For static compiles, you need to have all the required libs
 #  static equivalent available
diff --git a/tests/export-tests.sh b/tests/export-tests.sh
new file mode 100755
index ..0ed7dd99
--- /dev/null
+++ b/tests/export-tests.sh
@@ -0,0 +1,37 @@
+#!/bin/bash
+# export the testsuite files to a separate tar
+
+TESTSUITES_LIST_FILE=$PWD/testsuites-list
+if ! [ -f "$TESTSUITES_LIST_FILE" ];then
+   echo "testsuites list file does not exist."
+   exit 1
+fi
+
+TESTSUITES_LIST=$(cat "$TESTSUITES_LIST_FILE")
+if [ -z "$TESTSUITES_LIST" ]; then
+   echo "no files are listed in testsuites-list"
+   exit 1
+fi
+
+DEST="btrfs-progs-tests.tar.gz"
+if [ -f "$DEST" ];then
+   echo "remove existing package: " "$DEST"
+   rm "$DEST"
+fi
+
+TEST_ID=$PWD/testsuites-id
+if [ -f $TEST_ID ];then
+   rm $TEST_ID
+fi
+VERSION=`./version.sh`
+TIMESTAMP=`date -u "+%Y-%m-%d %T %Z"`
+
+echo "git version: " $VERSION > $TEST_ID
+echo "this tar was created at: " $TIMESTAMP >> $TEST_ID
+
+echo "creating tar:  " $DEST
+tar --exclude-vcs-ignores -zScf $DEST -C ../ $TESTSUITES_LIST
+if [ $? -eq 0 ]; then
+   echo "tar created successfully."
+fi
+rm $TEST_ID
diff --git a/testsuites-list b/testsuites-list
new file mode 100644
index ..a24591f5
--- /dev/null
+++ b/testsuites-list
@@ -0,0 +1,22 @@
+btrfs-progs/testsuites-id
+btrfs-progs/fssum
+btrfs-progs/btrfs-corrupt-block
+btrfs-progs/Documentation/
+btrfs-progs/tests/cli-tests
+btrfs-progs/tests/cli-tests.sh
+btrfs-progs/tests/common
+btrfs-progs/tests/common.convert
+btrfs-progs/tests/common.local
+btrfs-progs/tests/convert-tests
+btrfs-progs/tests/convert-tests.sh
+btrfs-progs/tests/fsck-tests
+btrfs-progs/tests/fsck-tests.sh
+btrfs-progs/tests/fuzz-tests/
+btrfs-progs/tests/fuzz-tests.sh
+btrfs-progs/tests/misc-tests/
+btrfs-progs/tests/misc-tests.sh
+btrfs-progs/tests/mkfs-tests/
+btrfs-progs/tests/mkfs-tests.sh
+btrfs-progs/tests/README.md
+btrfs-progs/tests/scan-results.sh
+btrfs-progs/tests/test-console.sh
-- 
2.14.3





[PATCH v5 0/3] Add support for export testsuits

2018-02-07 Thread Gu Jinxiang
Achieved:
1. Export the testsuite with:
 $ make testsuite
The files listed in testsuites-list are added to the tarball
btrfs-progs-tests.tar.gz.

2. After decompressing btrfs-progs-tests.tar.gz, run tests with:
 $ TEST=`MASK` ./tests/mkfs-tests.sh
Running without MASK also works.
Additional note:
 $ tar -xzvf ./btrfs-progs-tests.tar.gz
 $ ls
   btrfs-progs
The tests directory and the other files are inside btrfs-progs.
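
The list-driven export can be tried out in isolation. The sketch below
is not the script from this series: it works in a throwaway directory,
all file names are made up, and it hands the list to tar with -T
(--files-from) instead of expanding it on the command line as
tests/export-tests.sh does.

```shell
#!/bin/bash
# Sketch: pack only the paths named in a list file, relative to the
# parent directory, then show what ended up in the tarball.
# Every path below is hypothetical and lives in a temporary directory.
export_demo() {
    local work
    work=$(mktemp -d) || return 1
    mkdir -p "$work/proj/tests"
    echo 'echo ok' > "$work/proj/tests/demo.sh"
    echo 'unlisted' > "$work/proj/notes.txt"
    # The list names paths relative to the directory passed to -C.
    printf '%s\n' 'proj/tests' > "$work/proj/list"
    tar -czf "$work/out.tar.gz" -C "$work" -T "$work/proj/list" || return 1
    tar -tzf "$work/out.tar.gz"
    rm -rf "$work"
}

export_demo
```

Only proj/tests and its contents are listed; proj/notes.txt is left out
because it does not appear in the list file.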

Changelog:
v5->v4: modify patch2.
Make TEST_TOP represent the tests directory,
and introduce INTERNAL_BIN for internal binaries.
v4->v3: modify patch2.
1. Keep TOP for binaries, and introduce TEST_TOP for other resources.
v3->v2:
patch1:
1. Change the command from `make package` to `make testsuite`.
2. Create btrfs-progs-tests.tar.gz in the current directory,
   so remove the EXPORT variable.
3. Add a list file that lists the files to be added to the tarball,
   and add Documentation to the list. Revert patch3 from v2.
4. Add some identification info to the tarball.
5. Add the temporary file testsuites-id to .gitignore.

patch3: update the readme to match the changes in patch1.


v2->v1:
Big change in the approach:
from passing an EXEC parameter to run the testsuite, to exporting the
testsuite files to a separate tarball and running them from a script.


Gu Jinxiang (3):
  btrfs-progs: Add make testsuite command for export tests
  btrfs-progs: introduce TEST_TOP and INTERNAL_BIN for tests directory
and internal binaries
  btrfs-progs: Add readme for export testsuits

 .gitignore |  1 +
 Makefile   |  4 +++
 tests/README.md| 13 
 tests/cli-tests.sh | 15 ++---
 tests/cli-tests/001-btrfs/test.sh  |  2 +-
 .../cli-tests/002-balance-full-no-filters/test.sh  |  2 +-
 tests/cli-tests/003-fi-resize-args/test.sh |  2 +-
 .../cli-tests/004-send-parent-multi-subvol/test.sh |  2 +-
 tests/cli-tests/005-qgroup-show/test.sh|  2 +-
 tests/cli-tests/006-qgroup-show-sync/test.sh   |  2 +-
 tests/cli-tests/007-check-force/test.sh|  2 +-
 .../008-subvolume-get-set-default/test.sh  |  2 +-
 tests/common   | 16 ++
 tests/convert-tests.sh | 15 ++---
 tests/convert-tests/001-ext2-basic/test.sh |  4 +--
 tests/convert-tests/002-ext3-basic/test.sh |  4 +--
 tests/convert-tests/003-ext4-basic/test.sh |  4 +--
 .../004-ext2-backup-superblock-ranges/test.sh  |  2 +-
 .../convert-tests/005-delete-all-rollback/test.sh  |  4 +--
 tests/convert-tests/006-large-hole-extent/test.sh  |  4 +--
 .../007-unsupported-block-sizes/test.sh|  4 +--
 tests/convert-tests/008-readonly-image/test.sh |  4 +--
 tests/convert-tests/009-common-inode-flags/test.sh |  4 +--
 tests/convert-tests/010-reiserfs-basic/test.sh |  4 +--
 .../011-reiserfs-delete-all-rollback/test.sh   |  4 +--
 .../012-reiserfs-large-hole-extent/test.sh |  4 +--
 .../013-reiserfs-common-inode-flags/test.sh|  4 +--
 .../014-reiserfs-tail-handling/test.sh |  4 +--
 .../015-no-rollback-after-balance/test.sh  |  4 +--
 tests/export-tests.sh  | 37 ++
 tests/fsck-tests.sh| 17 +++---
 tests/fsck-tests/006-bad-root-items/test.sh|  2 +-
 tests/fsck-tests/012-leaf-corruption/test.sh   |  2 +-
 tests/fsck-tests/013-extent-tree-rebuild/test.sh   |  4 +--
 tests/fsck-tests/018-leaf-crossing-stripes/test.sh |  2 +-
 .../fsck-tests/019-non-skinny-false-alert/test.sh  |  2 +-
 tests/fsck-tests/020-extent-ref-cases/test.sh  |  2 +-
 .../021-partially-dropped-snapshot-case/test.sh|  2 +-
 tests/fsck-tests/022-qgroup-rescan-halfway/test.sh |  2 +-
 tests/fsck-tests/023-qgroup-stack-overflow/test.sh |  2 +-
 tests/fsck-tests/024-clear-space-cache/test.sh |  2 +-
 tests/fsck-tests/025-file-extents/test.sh  |  2 +-
 tests/fsck-tests/026-bad-dir-item-name/test.sh |  2 +-
 tests/fsck-tests/027-tree-reloc-tree/test.sh   |  2 +-
 .../028-unaligned-super-dev-sizes/test.sh  |  2 +-
 tests/fuzz-tests.sh| 15 ++---
 .../fuzz-tests/001-simple-check-unmounted/test.sh  |  4 +--
 tests/fuzz-tests/002-simple-image/test.sh  |  4 +--
 tests/fuzz-tests/003-multi-check-unmounted/test.sh |  4 +--
 tests/fuzz-tests/004-simple-dump-tree/test.sh  |  4 +--
 tests/fuzz-tests/005-simple-dump-super/test.sh |  4 +--
 tests/fuzz-tests/006-simple-tree-stats/test.sh |  4 +--
 tests/fuzz-tests/007-simple-super-recover/test.sh  |  4 +--
 tests/fuzz-tests/008-simple-chunk-recover/test.sh  |  4 +--
 tests/fuzz-tests/009-simple-zero-log/test.sh   |  4 +--
 tests/misc-tests.sh| 17 +++---
 

[PATCH v5 2/3] btrfs-progs: introduce TEST_TOP and INTERNAL_BIN for tests directory and internal binaries

2018-02-07 Thread Gu Jinxiang
Use TEST_TOP for tests directory.
And INTERNAL_BIN for internal binaries.

Signed-off-by: Gu Jinxiang 
---
 tests/cli-tests.sh  | 15 ++-
 tests/cli-tests/001-btrfs/test.sh   |  2 +-
 tests/cli-tests/002-balance-full-no-filters/test.sh |  2 +-
 tests/cli-tests/003-fi-resize-args/test.sh  |  2 +-
 tests/cli-tests/004-send-parent-multi-subvol/test.sh|  2 +-
 tests/cli-tests/005-qgroup-show/test.sh |  2 +-
 tests/cli-tests/006-qgroup-show-sync/test.sh|  2 +-
 tests/cli-tests/007-check-force/test.sh |  2 +-
 tests/cli-tests/008-subvolume-get-set-default/test.sh   |  2 +-
 tests/common| 16 ++--
 tests/convert-tests.sh  | 15 ++-
 tests/convert-tests/001-ext2-basic/test.sh  |  4 ++--
 tests/convert-tests/002-ext3-basic/test.sh  |  4 ++--
 tests/convert-tests/003-ext4-basic/test.sh  |  4 ++--
 .../004-ext2-backup-superblock-ranges/test.sh   |  2 +-
 tests/convert-tests/005-delete-all-rollback/test.sh |  4 ++--
 tests/convert-tests/006-large-hole-extent/test.sh   |  4 ++--
 tests/convert-tests/007-unsupported-block-sizes/test.sh |  4 ++--
 tests/convert-tests/008-readonly-image/test.sh  |  4 ++--
 tests/convert-tests/009-common-inode-flags/test.sh  |  4 ++--
 tests/convert-tests/010-reiserfs-basic/test.sh  |  4 ++--
 .../011-reiserfs-delete-all-rollback/test.sh|  4 ++--
 .../012-reiserfs-large-hole-extent/test.sh  |  4 ++--
 .../013-reiserfs-common-inode-flags/test.sh |  4 ++--
 tests/convert-tests/014-reiserfs-tail-handling/test.sh  |  4 ++--
 .../convert-tests/015-no-rollback-after-balance/test.sh |  4 ++--
 tests/fsck-tests.sh | 17 -
 tests/fsck-tests/006-bad-root-items/test.sh |  2 +-
 tests/fsck-tests/012-leaf-corruption/test.sh|  2 +-
 tests/fsck-tests/013-extent-tree-rebuild/test.sh|  4 ++--
 tests/fsck-tests/018-leaf-crossing-stripes/test.sh  |  2 +-
 tests/fsck-tests/019-non-skinny-false-alert/test.sh |  2 +-
 tests/fsck-tests/020-extent-ref-cases/test.sh   |  2 +-
 .../021-partially-dropped-snapshot-case/test.sh |  2 +-
 tests/fsck-tests/022-qgroup-rescan-halfway/test.sh  |  2 +-
 tests/fsck-tests/023-qgroup-stack-overflow/test.sh  |  2 +-
 tests/fsck-tests/024-clear-space-cache/test.sh  |  2 +-
 tests/fsck-tests/025-file-extents/test.sh   |  2 +-
 tests/fsck-tests/026-bad-dir-item-name/test.sh  |  2 +-
 tests/fsck-tests/027-tree-reloc-tree/test.sh|  2 +-
 tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh  |  2 +-
 tests/fuzz-tests.sh | 15 ++-
 tests/fuzz-tests/001-simple-check-unmounted/test.sh |  4 ++--
 tests/fuzz-tests/002-simple-image/test.sh   |  4 ++--
 tests/fuzz-tests/003-multi-check-unmounted/test.sh  |  4 ++--
 tests/fuzz-tests/004-simple-dump-tree/test.sh   |  4 ++--
 tests/fuzz-tests/005-simple-dump-super/test.sh  |  4 ++--
 tests/fuzz-tests/006-simple-tree-stats/test.sh  |  4 ++--
 tests/fuzz-tests/007-simple-super-recover/test.sh   |  4 ++--
 tests/fuzz-tests/008-simple-chunk-recover/test.sh   |  4 ++--
 tests/fuzz-tests/009-simple-zero-log/test.sh|  4 ++--
 tests/misc-tests.sh | 17 -
 tests/misc-tests/001-btrfstune-features/test.sh |  2 +-
 tests/misc-tests/002-uuid-rewrite/test.sh   |  6 +++---
 tests/misc-tests/003-zero-log/test.sh   |  4 ++--
 tests/misc-tests/004-shrink-fs/test.sh  |  2 +-
 .../005-convert-progress-thread-crash/test.sh   |  2 +-
 tests/misc-tests/006-image-on-missing-device/test.sh|  2 +-
 tests/misc-tests/007-subvolume-sync/test.sh |  2 +-
 tests/misc-tests/008-leaf-crossing-stripes/test.sh  |  2 +-
 tests/misc-tests/009-subvolume-sync-must-wait/test.sh   |  2 +-
 tests/misc-tests/010-convert-delete-ext2-subvol/test.sh |  2 +-
 tests/misc-tests/011-delete-missing-device/test.sh  |  2 +-
 tests/misc-tests/012-find-root-no-result/test.sh|  2 +-
 tests/misc-tests/013-subvolume-sync-crash/test.sh   |  2 +-
 tests/misc-tests/014-filesystem-label/test.sh   |  2 +-
 tests/misc-tests/015-dump-super-garbage/test.sh |  2 +-
 tests/misc-tests/016-send-clone-src/test.sh |  2 +-
 tests/misc-tests/017-recv-stream-malformatted/test.sh   |  2 +-
 tests/misc-tests/018-recv-end-of-stream/test.sh |  2 +-
 .../019-receive-clones-on-mounted-subvol/test.sh|  4 ++--
 tests/misc-tests/020-fix-superblock-corruption/test.sh  |  2 +-
 tests/misc-tests/021-image-multi-devices/test.sh   

[PATCH 1/3] btrfs-progs: add prerequisite mkfs.btrfs for test-cli

2018-02-07 Thread Gu Jinxiang
tests/cli-tests/002-balance-full-no-filters/test.sh needs
mkfs.btrfs as a prerequisite,
so add the dependency to the Makefile.

Signed-off-by: Gu Jinxiang 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 00e2137..034c943 100644
--- a/Makefile
+++ b/Makefile
@@ -315,7 +315,7 @@ test-fuzz: btrfs
@echo "[TEST]   fuzz-tests.sh"
$(Q)bash tests/fuzz-tests.sh
 
-test-cli: btrfs
+test-cli: btrfs mkfs.btrfs
@echo "[TEST]   cli-tests.sh"
$(Q)bash tests/cli-tests.sh
 
-- 
1.9.1





[PATCH 3/3] btrfs-progs: add prerequisite btrfs-convert for test-misc

2018-02-07 Thread Gu Jinxiang
tests/misc-tests/005-convert-progress-thread-crash/test.sh needs
btrfs-convert as a prerequisite,
so add the dependency to the Makefile.

Signed-off-by: Gu Jinxiang 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 9299411..7ccba62 100644
--- a/Makefile
+++ b/Makefile
@@ -303,7 +303,7 @@ test-fsck: btrfs btrfs-image btrfs-corrupt-block mkfs.btrfs 
btrfstune
$(Q)bash tests/fsck-tests.sh
 
 test-misc: btrfs btrfs-image btrfs-corrupt-block mkfs.btrfs btrfstune fssum \
-   btrfs-zero-log btrfs-find-root btrfs-select-super
+   btrfs-zero-log btrfs-find-root btrfs-select-super btrfs-convert
@echo "[TEST]   misc-tests.sh"
$(Q)bash tests/misc-tests.sh
 
-- 
1.9.1





[PATCH 2/3] btrfs-progs: add prerequisite btrfs-image for test-fuzz

2018-02-07 Thread Gu Jinxiang
tests/fuzz-tests/002-simple-image/test.sh needs
btrfs-image as a prerequisite,
so add the dependency to the Makefile.

Signed-off-by: Gu Jinxiang 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 034c943..9299411 100644
--- a/Makefile
+++ b/Makefile
@@ -311,7 +311,7 @@ test-mkfs: btrfs mkfs.btrfs
@echo "[TEST]   mkfs-tests.sh"
$(Q)bash tests/mkfs-tests.sh
 
-test-fuzz: btrfs
+test-fuzz: btrfs btrfs-image
@echo "[TEST]   fuzz-tests.sh"
$(Q)bash tests/fuzz-tests.sh
 
-- 
1.9.1





Re: IO Error (.snapshots is not a btrfs subvolume)

2018-02-07 Thread Andrei Borzenkov
08.02.2018 06:03, Chris Murphy wrote:
> On Wed, Feb 7, 2018 at 6:26 PM, Nick Gilmour  wrote:
>> Hi all,
>>
>> I have successfully restored a snapshot of root but now when I try to

How exactly was it done?

>> make a new snapshot I get this error:
>> IO Error (.snapshots is not a btrfs subvolume).
>> My snapshots were within @ which I renamed to @_old.
>> What can I do now? How can I move the snapshots from @_old/ into @ and
>> be able to make snapshots again?
>>
>> This is an excerpt of my subvolumes list:
>>
>> # btrfs subvolume list /
>> ID 257 gen 175397 top level 5 path @_old
>> ID 258 gen 175392 top level 5 path @pkg
>> ID 260 gen 175447 top level 5 path @tmp
>> ID 262 gen 19 top level 257 path @_old/var/lib/machines
>> ID 268 gen 175441 top level 5 path @test
>> ID 291 gen 175394 top level 257 path @_old/.snapshots
>> ID 292 gen 1705 top level 291 path @_old/.snapshots/1/snapshot
>> ...
>>
>> ID 3538 gen 175398 top level 291 path @_old/.snapshots/1594/snapshot
>> ID 3540 gen 175447 top level 5 path @
>>
> 
> 
> This is a snapper behavior. It creates .snapshots as a subvolume and
> then puts snapshots into that subvolume. If you snapshot a subvolume
> that contains another subvolume, the nested subvolume is not
> snapshotted; a plain directory placeholder is created instead. So your
> restored snapshot contains a .snapshots directory rather than a
> .snapshots subvolume. Possibly if you delete the directory and create a
> new subvolume .snapshots, the problem will be fixed.
> 

No, you should create the subvolume @/.snapshots and mount it as
/.snapshots (and have it in /etc/fstab). Snapshots should always be
available in the running system under a fixed path, and this is only
possible when the subvolume is mounted; otherwise /.snapshots will be
lost after a rollback, just like it happened now.

The exact subvolume name probably does not matter that much, but it is
better to stick with what the installer does by default. It may matter
for grub2 snapshot handling.

Also, openSUSE expects the actual root to be a subvolume under
/.snapshots that is a valid snapper snapshot (i.e. it has valid
metadata). Again, not having this may confuse snapper.

It may be possible to move @_old/.snapshots into @/.snapshots, although
this breaks parent-child relationships; those old snapshots cannot be
cleaned up without removing the old root completely.
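
The layout described above could be set up roughly as follows. This is
a sketch under assumptions: /dev/sdX stands in for the real device, /mnt
is assumed to be a free mount point for the top-level subvolume, and the
restored @ is assumed to contain .snapshots as an empty plain directory.
By default the script only prints the commands.

```shell
#!/bin/bash
# Sketch: replace the plain .snapshots directory inside @ with a
# subvolume and mount it at /.snapshots. Device and mount point are
# placeholders.

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_snapshots_subvol() {
    local dev=$1 top=$2
    # Mount the top-level subvolume so that @ is reachable as a directory.
    run mount -o subvolid=5 "$dev" "$top"
    # Remove the placeholder directory left behind by the snapshot.
    run rmdir "$top/@/.snapshots"
    run btrfs subvolume create "$top/@/.snapshots"
    run umount "$top"
    # Mount the new subvolume at the fixed path.
    run mount -o subvol=@/.snapshots "$dev" /.snapshots
}

setup_snapshots_subvol /dev/sdX /mnt
```

A matching /etc/fstab entry using the subvol=@/.snapshots option would
keep the mount across reboots.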

> I can't tell you how this will confuse snapper though, or how to
> unconfuse it. It pretty much expects to be in control of all
> snapshots, creation, deletion, and rollbacks. So if you do it manually
> for whatever reason, I think it can confuse snapper.
> 
> 

There was a blog post recently outlining how to restore the openSUSE
root. You may want to search the opensuse or opensuse-factory mailing
list. Ah, found it:

https://rootco.de/2018-01-19-opensuse-btrfs-subvolumes/


[PATCH v4 17/18] btrfs-progs: lowmem check: end of removing parameters @trans in lowmem

2018-02-07 Thread Su Yue
Remove @trans in check_chunks_and_extents().

This patch lets lowmem repair work again.

Signed-off-by: Su Yue 
---
 check/mode-lowmem.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 40a179f75319..4aad69fc9eb1 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -4872,7 +4872,6 @@ out:
  */
 int check_chunks_and_extents_lowmem(struct btrfs_fs_info *fs_info)
 {
-   struct btrfs_trans_handle *trans = NULL;
struct btrfs_path path;
struct btrfs_key old_key;
struct btrfs_key key;
@@ -4884,14 +4883,6 @@ int check_chunks_and_extents_lowmem(struct btrfs_fs_info 
*fs_info)
 
root = fs_info->fs_root;
 
-   if (repair) {
-   trans = btrfs_start_transaction(fs_info->extent_root, 1);
-   if (IS_ERR(trans)) {
-   error("failed to start transaction before check");
-   return PTR_ERR(trans);
-   }
-   }
-
root1 = root->fs_info->chunk_root;
ret = check_btrfs_root(root1, 0, 1);
err |= ret;
@@ -4961,10 +4952,6 @@ out:
err &= ~BG_ACCOUNTING_ERROR;
}
 
-   if (trans)
-   btrfs_commit_transaction(trans, root->fs_info->extent_root);
-
btrfs_release_path(&path);
-
return err;
 }
-- 
2.16.1





[PATCH v4 15/18] btrfs-progs: lowmem check: remove parameter @trans of check_btrfs_root()

2018-02-07 Thread Su Yue
Remove parameter @trans of walk_down_tree() and check_btrfs_root().

Note: This patch and the following patches cause an error in lowmem repair like:
"Error: Commit_root already set when starting transaction".
The error will disappear once the removal of @trans is finished.

Signed-off-by: Su Yue 
---
 check/mode-lowmem.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index d4c8de4e69af..d92278d2993c 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -4271,8 +4271,7 @@ out:
  * Returns <0  Fatal error, must exit the whole check
  * Returns 0   No errors found
  */
-static int walk_down_tree(struct btrfs_trans_handle *trans,
- struct btrfs_root *root, struct btrfs_path *path,
+static int walk_down_tree(struct btrfs_root *root, struct btrfs_path *path,
  int *level, struct node_refs *nrefs, int ext_ref,
  int check_all)
 {
@@ -4585,8 +4584,7 @@ out:
  * Returns 0  represents OK.
  * Returns >0 represents error bits.
  */
-static int check_btrfs_root(struct btrfs_trans_handle *trans,
-   struct btrfs_root *root, unsigned int ext_ref,
+static int check_btrfs_root(struct btrfs_root *root, unsigned int ext_ref,
int check_all)
 {
struct btrfs_path path;
@@ -4631,7 +4629,7 @@ static int check_btrfs_root(struct btrfs_trans_handle 
*trans,
}
 
while (1) {
-   ret = walk_down_tree(trans, root, &path, &level, &nrefs,
+   ret = walk_down_tree(root, &path, &level, &nrefs,
 ext_ref, check_all);
 
if (ret > 0)
@@ -4667,7 +4665,7 @@ out:
 static int check_fs_root(struct btrfs_root *root, unsigned int ext_ref)
 {
reset_cached_block_groups(root->fs_info);
-   return check_btrfs_root(NULL, root, ext_ref, 0);
+   return check_btrfs_root(root, ext_ref, 0);
 }
 
 /*
@@ -4871,11 +4869,11 @@ int check_chunks_and_extents_lowmem(struct 
btrfs_fs_info *fs_info)
}
 
root1 = root->fs_info->chunk_root;
-   ret = check_btrfs_root(trans, root1, 0, 1);
+   ret = check_btrfs_root(root1, 0, 1);
err |= ret;
 
root1 = root->fs_info->tree_root;
-   ret = check_btrfs_root(trans, root1, 0, 1);
+   ret = check_btrfs_root(root1, 0, 1);
err |= ret;
 
btrfs_init_path(&path);
@@ -4906,7 +4904,7 @@ int check_chunks_and_extents_lowmem(struct btrfs_fs_info 
*fs_info)
goto next;
}
 
-   ret = check_btrfs_root(trans, cur_root, 0, 1);
+   ret = check_btrfs_root(cur_root, 0, 1);
err |= ret;
 
if (key.objectid == BTRFS_TREE_RELOC_OBJECTID)
-- 
2.16.1





[PATCH v4 05/18] btrfs-progs: lowmem check: introduce mark/clear_block_groups_full()

2018-02-07 Thread Su Yue
Excluding or pinning all metadata blocks is not time-efficient for
filesystems on large storage.
Here is another way: mark all metadata block groups full and allocate
a new chunk for CoW, so newly reserved extents never overwrite
existing extents.

Introduce modify_block_groups_cache() to modify the cache state of all
block groups and mark all extents in the block groups as not free in the
free space cache.
mark/clear_block_groups_full() wrap the above function.

Suggested-by: Qu Wenruo 
Signed-off-by: Su Yue 
---
 check/mode-lowmem.c | 93 +
 1 file changed, 93 insertions(+)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 1fc84f1e8c44..a200c28a9cf7 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -233,6 +233,99 @@ static int update_nodes_refs(struct btrfs_root *root, u64 
bytenr,
return 0;
 }
 
+/*
+ * Mark all extents unfree in the block group. And set @block_group->cached
+ * according to @cache.
+ */
+static int modify_block_group_cache(struct btrfs_fs_info *fs_info,
+   struct btrfs_block_group_cache *block_group, int cache)
+{
+   struct extent_io_tree *free_space_cache = &fs_info->free_space_cache;
+   u64 start = block_group->key.objectid;
+   u64 end = start + block_group->key.offset;
+
+   if (cache && !block_group->cached) {
+   block_group->cached = 1;
+   clear_extent_dirty(free_space_cache, start, end - 1);
+   }
+
+   if (!cache && block_group->cached) {
+   block_group->cached = 0;
+   clear_extent_dirty(free_space_cache, start, end - 1);
+   }
+   return 0;
+}
+
+/*
+ * Modify block groups which have @flags unfree in free space cache.
+ *
+ * @cache: if 0, clear block groups cache state;
+ * not 0, mark blocks groups cached.
+ */
+static int modify_block_groups_cache(struct btrfs_fs_info *fs_info, u64 flags,
+int cache)
+{
+   struct btrfs_root *root = fs_info->extent_root;
+   struct btrfs_key key;
+   struct btrfs_path path;
+   struct btrfs_block_group_cache *bg_cache;
+   struct btrfs_block_group_item *bi;
+   struct btrfs_block_group_item bg_item;
+   struct extent_buffer *eb;
+   int slot;
+   int ret;
+
+   key.objectid = 0;
+   key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
+   key.offset = 0;
+
+   btrfs_init_path(&path);
+   ret = btrfs_search_slot(NULL, root, &key, &path, 0, 0);
+   if (ret < 0) {
+   error("fail to search block groups due to %s", strerror(-ret));
+   goto out;
+   }
+
+   while (1) {
+   eb = path.nodes[0];
+   slot = path.slots[0];
+   btrfs_item_key_to_cpu(eb, &key, slot);
+   bg_cache = btrfs_lookup_block_group(fs_info, key.objectid);
+   if (!bg_cache) {
+   ret = -ENOENT;
+   goto out;
+   }
+
+   bi = btrfs_item_ptr(eb, slot, struct btrfs_block_group_item);
+   read_extent_buffer(eb, &bg_item, (unsigned long)bi,
+  sizeof(bg_item));
+   if (btrfs_block_group_flags(&bg_item) & flags)
+   modify_block_group_cache(fs_info, bg_cache, cache);
+
+   ret = btrfs_next_item(root, &path);
+   if (ret > 0) {
+   ret = 0;
+   goto out;
+   }
+   if (ret < 0)
+   goto out;
+   }
+
+out:
+   btrfs_release_path(&path);
+   return ret;
+}
+
+static int mark_block_groups_full(struct btrfs_fs_info *fs_info, u64 flags)
+{
+   return modify_block_groups_cache(fs_info, flags, 1);
+}
+
+static int clear_block_groups_full(struct btrfs_fs_info *fs_info, u64 flags)
+{
+   return modify_block_groups_cache(fs_info, flags, 0);
+}
+
 /*
  * This function only handles BACKREF_MISSING,
  * If corresponding extent item exists, increase the ref, else insert an extent
-- 
2.16.1





[PATCH v4 16/18] btrfs-progs: lowmem check: introduce repair_block_accounting()

2018-02-07 Thread Su Yue
Introduce repair_block_accounting(), which calls
btrfs_fix_block_accounting() to repair block group accounting.

Replace btrfs_fix_block_accounting() with the new function.

Note: This patch and the following patches cause an error in lowmem
repair such as:
"Error: Commit_root already set when starting transaction".
The error disappears once the removal of @trans is finished.
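
The wrapper pattern described above can be sketched in isolation: start a
transaction, run the fix, always commit, and return the result of the fix.
This is only a toy model. struct toy_trans, toy_start(), toy_commit() and
the -EBUSY return value are hypothetical and stand in for the btrfs-progs
transaction API; the refusal to start while a transaction is already open
merely imitates the kind of failure mentioned in the note.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy transaction state; one slot models "at most one open transaction". */
struct toy_trans {
	int open;	/* nonzero while a transaction is running */
	int commits;	/* number of commits so far */
};

/* Start a transaction, or return NULL if one is already open. */
static struct toy_trans *toy_start(struct toy_trans *slot)
{
	assert(slot != NULL);
	if (slot->open)
		return NULL;
	slot->open = 1;
	return slot;
}

static void toy_commit(struct toy_trans *t)
{
	t->open = 0;
	t->commits++;
}

typedef int (*toy_fix_fn)(struct toy_trans *t);

/* Wrapper: owns the whole transaction lifetime around one fix. */
static int toy_repair(struct toy_trans *slot, toy_fix_fn fix)
{
	struct toy_trans *t = toy_start(slot);
	int ret;

	if (!t)
		return -EBUSY;	/* hypothetical error code for the sketch */

	ret = fix(t);
	toy_commit(t);		/* commit even if the fix reported an error */
	return ret;
}

/* Two fake fixes used to exercise the wrapper. */
static int toy_fix_ok(struct toy_trans *t)
{
	(void)t;
	return 0;
}

static int toy_fix_fail(struct toy_trans *t)
{
	(void)t;
	return -EIO;
}
```

With this shape a caller no longer has to carry a transaction handle
around, but a wrapper invoked while an outer transaction is still open
fails, which is why the series has to finish removing @trans everywhere.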

Signed-off-by: Su Yue 
---
 check/mode-lowmem.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index d92278d2993c..40a179f75319 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -537,6 +537,30 @@ static int end_avoid_extents_overwrite(struct 
btrfs_fs_info *fs_info)
return ret;
 }
 
+/*
+ * Wrapper function for btrfs_fix_block_accounting().
+ *
+ * Returns 0 on success.
+ * Returns != 0  on error.
+ */
+static int repair_block_accounting(struct btrfs_fs_info *fs_info)
+{
+   struct btrfs_trans_handle *trans = NULL;
+   struct btrfs_root *root = fs_info->extent_root;
+   int ret;
+
+   trans = btrfs_start_transaction(root, 1);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   error("fail to start transaction %s", strerror(-ret));
+   return ret;
+   }
+
+   ret = btrfs_fix_block_accounting(trans, root);
+   btrfs_commit_transaction(trans, root);
+   return ret;
+}
+
 /*
  * This function only handles BACKREF_MISSING,
  * If corresponding extent item exists, increase the ref, else insert an extent
@@ -4930,7 +4954,7 @@ out:
 
reset_cached_block_groups(fs_info);
/* update block accounting */
-   ret = btrfs_fix_block_accounting(trans, root);
+   ret = repair_block_accounting(fs_info);
if (ret)
err |= ret;
else
-- 
2.16.1





[PATCH v4 12/18] btrfs-progs: lowmem check: remove parameter @trans of repair_extent_item()

2018-02-07 Thread Su Yue
This patch removes the @trans parameter of repair_extent_item().
The function now calls avoid_extents_overwrite() and starts a
transaction by itself.

Note: This patch and the following patches cause an error in lowmem
repair such as:
"Error: Commit_root already set when starting transaction".
The error disappears once the removal of @trans is finished.

Signed-off-by: Su Yue 
---
 check/mode-lowmem.c | 54 +
 1 file changed, 34 insertions(+), 20 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 53377848f361..443fa513a13e 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -3588,40 +3588,55 @@ out:
  *   means error after repair
  * Returns  0   nothing happened
  */
-static int repair_extent_item(struct btrfs_trans_handle *trans,
- struct btrfs_root *root, struct btrfs_path *path,
+static int repair_extent_item(struct btrfs_root *root, struct btrfs_path *path,
  u64 bytenr, u64 num_bytes, u64 parent, u64 root_objectid,
  u64 owner, u64 offset, int err)
 {
+   struct btrfs_trans_handle *trans;
+   struct btrfs_root *extent_root = root->fs_info->extent_root;
struct btrfs_key old_key;
int freed = 0;
int ret;
 
btrfs_item_key_to_cpu(path->nodes[0], &old_key, path->slots[0]);
 
-   if (err & (REFERENCER_MISSING | REFERENCER_MISMATCH)) {
-   /* delete the backref */
-   ret = btrfs_free_extent(trans, root->fs_info->fs_root, bytenr,
- num_bytes, parent, root_objectid, owner, offset);
-   if (!ret) {
-   freed = 1;
-   err &= ~REFERENCER_MISSING;
-   printf("Delete backref in extent [%llu %llu]\n",
-  bytenr, num_bytes);
-   } else {
-   error("fail to delete backref in extent [%llu %llu]",
-  bytenr, num_bytes);
-   }
+   if ((err & (REFERENCER_MISSING | REFERENCER_MISMATCH)) == 0)
+   return err;
+
+   ret = avoid_extents_overwrite(root->fs_info);
+   if (ret)
+   return err;
+
+   trans = btrfs_start_transaction(extent_root, 1);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   error("fail to start transaction %s", strerror(-ret));
+   /* nothing happened */
+   ret = 0;
+   goto out;
}
+   /* delete the backref */
+   ret = btrfs_free_extent(trans, root->fs_info->fs_root, bytenr,
+   num_bytes, parent, root_objectid, owner, offset);
+   if (!ret) {
+   freed = 1;
+   err &= ~REFERENCER_MISSING;
+   printf("Delete backref in extent [%llu %llu]\n",
+  bytenr, num_bytes);
+   } else {
+   error("fail to delete backref in extent [%llu %llu]",
+ bytenr, num_bytes);
+   }
+   btrfs_commit_transaction(trans, extent_root);
 
/* btrfs_free_extent may delete the extent */
btrfs_release_path(path);
ret = btrfs_search_slot(NULL, root, &old_key, path, 0, 0);
-
if (ret)
ret = -ENOENT;
else if (freed)
ret = err;
+out:
return ret;
 }
 
@@ -3631,8 +3646,7 @@ static int repair_extent_item(struct btrfs_trans_handle 
*trans,
  *
  * Since we don't use extent_record anymore, introduce new error bit
  */
-static int check_extent_item(struct btrfs_trans_handle *trans,
-struct btrfs_fs_info *fs_info,
+static int check_extent_item(struct btrfs_fs_info *fs_info,
 struct btrfs_path *path)
 {
struct btrfs_extent_item *ei;
@@ -3763,7 +3777,7 @@ next:
}
 
if (err && repair) {
-   ret = repair_extent_item(trans, fs_info->extent_root, path,
+   ret = repair_extent_item(fs_info->extent_root, path,
 key.objectid, num_bytes, parent, root_objectid,
 owner, owner_offset, ret);
if (ret < 0)
@@ -4183,7 +4197,7 @@ again:
break;
case BTRFS_EXTENT_ITEM_KEY:
case BTRFS_METADATA_ITEM_KEY:
-   ret = check_extent_item(trans, fs_info, path);
+   ret = check_extent_item(fs_info, path);
err |= ret;
break;
case BTRFS_EXTENT_CSUM_KEY:
-- 
2.16.1





[PATCH v4 13/18] btrfs-progs: lowmem check: remove parameter @trans of check_leaf_items()

2018-02-07 Thread Su Yue
This patch removes the @trans parameter of check_leaf_items().

Note: This patch and the following patches cause an error in lowmem
repair such as:
"Error: Commit_root already set when starting transaction".
The error disappears once the removal of @trans is finished.

Signed-off-by: Su Yue 
---
 check/mode-lowmem.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 443fa513a13e..a7660a25b844 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -4139,8 +4139,7 @@ out:
 /*
  * Main entry function to check known items and update related accounting info
  */
-static int check_leaf_items(struct btrfs_trans_handle *trans,
-   struct btrfs_root *root, struct btrfs_path *path,
+static int check_leaf_items(struct btrfs_root *root, struct btrfs_path *path,
struct node_refs *nrefs, int account_bytes)
 {
struct btrfs_fs_info *fs_info = root->fs_info;
@@ -4336,7 +4335,7 @@ static int walk_down_tree(struct btrfs_trans_handle 
*trans,
ret = process_one_leaf(root, path, nrefs,
   level, ext_ref);
else
-   ret = check_leaf_items(trans, root, path,
+   ret = check_leaf_items(root, path,
   nrefs, account_file_data);
err |= ret;
break;
-- 
2.16.1





[PATCH v4 11/18] btrfs-progs: lowmem check: remove parameter @trans of repair_chunk_item()

2018-02-07 Thread Su Yue
This patch removes the @trans parameter of repair_chunk_item().
The function now calls avoid_extents_overwrite() and starts a
transaction by itself.

Note: This patch and the following patches cause an error in lowmem
repair such as:
"Error: Commit_root already set when starting transaction".
The error disappears once the removal of @trans is finished.

Signed-off-by: Su Yue 
---
 check/mode-lowmem.c | 48 
 1 file changed, 32 insertions(+), 16 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 272e658296e7..53377848f361 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -4026,13 +4026,14 @@ out:
  *
  * Returns error after repair.
  */
-static int repair_chunk_item(struct btrfs_trans_handle *trans,
-struct btrfs_root *chunk_root,
+static int repair_chunk_item(struct btrfs_root *chunk_root,
 struct btrfs_path *path, int err)
 {
struct btrfs_chunk *chunk;
struct btrfs_key chunk_key;
struct extent_buffer *eb = path->nodes[0];
+   struct btrfs_root *extent_root = chunk_root->fs_info->extent_root;
+   struct btrfs_trans_handle *trans;
u64 length;
int slot = path->slots[0];
u64 type;
@@ -4045,21 +4046,36 @@ static int repair_chunk_item(struct btrfs_trans_handle 
*trans,
type = btrfs_chunk_type(path->nodes[0], chunk);
length = btrfs_chunk_length(eb, chunk);
 
-   if (err & REFERENCER_MISSING) {
-   ret = btrfs_make_block_group(trans, chunk_root->fs_info, 0,
-type, chunk_key.offset, length);
-   if (ret) {
-   error("fail to add block group item[%llu %llu]",
- chunk_key.offset, length);
-   goto out;
-   } else {
-   err &= ~REFERENCER_MISSING;
-   printf("Added block group item[%llu %llu]\n",
-  chunk_key.offset, length);
-   }
+   /* now repair only adds block group */
+   if ((err & REFERENCER_MISSING) == 0)
+   return err;
+
+   ret = avoid_extents_overwrite(chunk_root->fs_info);
+   if (ret)
+   return ret;
+
+   trans = btrfs_start_transaction(extent_root, 1);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   error("fail to start transaction %s", strerror(-ret));
+   return ret;
}
 
-out:
+   ret = btrfs_make_block_group(trans, chunk_root->fs_info, 0, type,
+chunk_key.offset, length);
+   if (ret) {
+   error("fail to add block group item[%llu %llu]",
+ chunk_key.offset, length);
+   } else {
+   err &= ~REFERENCER_MISSING;
+   printf("Added block group item[%llu %llu]\n", chunk_key.offset,
+  length);
+   }
+
+   btrfs_commit_transaction(trans, extent_root);
+   if (ret)
+   error("fail to repair item(s) related to chunk item[%llu %llu]",
+ chunk_key.objectid, chunk_key.offset);
return err;
 }
 
@@ -4158,7 +4174,7 @@ again:
case BTRFS_CHUNK_ITEM_KEY:
ret = check_chunk_item(fs_info, eb, slot);
if (repair && ret)
-   ret = repair_chunk_item(trans, root, path, ret);
+   ret = repair_chunk_item(root, path, ret);
err |= ret;
break;
case BTRFS_DEV_EXTENT_KEY:
-- 
2.16.1





[PATCH v4 06/18] btrfs-progs: lowmem check: introduce try_to_force_cow_in_new_chunk()

2018-02-07 Thread Su Yue
Introduce create_chunk_and_block_group() to allocate a new chunk
and the corresponding block group.

The new function force_cow_in_new_chunk() first allocates a new chunk
and records its start.
Then it marks all metadata block groups cached and full.
Finally it marks the new block group uncached, with none of its extents
recorded as free.
In the next CoW, the extent states will be updated automatically by
cache_block_group().

The new function try_to_force_cow_in_new_chunk() tries to mark block
groups full, allocate a new chunk and record its start.
If the last allocated chunk is almost full, a new chunk is
allocated.
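
The "reuse the last chunk unless it is almost full" decision can be
sketched as follows. This is a simplified model, not the code of this
patch: struct toy_chunk, the min_free threshold and the caller-supplied
next_start/chunk_size are assumptions for illustration, and the real
threshold used by is_chunk_almost_full() is not visible in the quoted diff.

```c
#include <assert.h>
#include <stdint.h>

/* Toy record of the chunk most recently allocated by the repair code. */
struct toy_chunk {
	uint64_t start;
	uint64_t total;	/* 0 means "no chunk allocated yet" */
	uint64_t used;
};

/* Returns >0 if less than @min_free bytes remain, 0 otherwise. */
static int toy_chunk_almost_full(const struct toy_chunk *c, uint64_t min_free)
{
	assert(c->used <= c->total);
	return (c->total - c->used) < min_free;
}

/*
 * Return the start of the chunk the next CoW should land in.
 * A new chunk at @next_start is "allocated" only when there is no
 * previous chunk or the previous one is almost full.
 */
static uint64_t toy_pick_chunk(struct toy_chunk *last, uint64_t min_free,
			       uint64_t next_start, uint64_t chunk_size)
{
	if (last->total != 0 && !toy_chunk_almost_full(last, min_free))
		return last->start;

	last->start = next_start;
	last->total = chunk_size;
	last->used = 0;
	return last->start;
}
```

Reusing the previous chunk while it still has room keeps a long repair run
from allocating one chunk per repaired item.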

Suggested-by: Qu Wenruo 
Signed-off-by: Su Yue 
---
 check/mode-lowmem.c | 165 
 1 file changed, 165 insertions(+)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index a200c28a9cf7..3649d570e11c 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -326,6 +326,171 @@ static int clear_block_groups_full(struct btrfs_fs_info 
*fs_info, u64 flags)
return modify_block_groups_cache(fs_info, flags, 0);
 }
 
+static int create_chunk_and_block_group(struct btrfs_fs_info *fs_info,
+   u64 flags, u64 *start, u64 *nbytes)
+{
+   struct btrfs_trans_handle *trans;
+   struct btrfs_root *root = fs_info->extent_root;
+   int ret;
+
+   if ((flags & BTRFS_BLOCK_GROUP_TYPE_MASK) == 0)
+   return -EINVAL;
+
+   trans = btrfs_start_transaction(root, 1);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   error("error starting transaction %s", strerror(-ret));
+   return ret;
+   }
+   ret = btrfs_alloc_chunk(trans, fs_info, start, nbytes, flags);
+   if (ret) {
+   error("fail to allocate new chunk %s", strerror(-ret));
+   goto out;
+   }
+   ret = btrfs_make_block_group(trans, fs_info, 0, flags, *start,
+*nbytes);
+   if (ret) {
+   error("fail to make block group for chunk %llu %llu %s",
+ *start, *nbytes, strerror(-ret));
+   goto out;
+   }
+out:
+   btrfs_commit_transaction(trans, root);
+   return ret;
+}
+
+static int force_cow_in_new_chunk(struct btrfs_fs_info *fs_info,
+ u64 *start_ret)
+{
+   struct btrfs_block_group_cache *bg;
+   u64 start;
+   u64 nbytes;
+   u64 alloc_profile;
+   u64 flags;
+   int ret;
+
+   alloc_profile = (fs_info->avail_metadata_alloc_bits &
+fs_info->metadata_alloc_profile);
+   flags = BTRFS_BLOCK_GROUP_METADATA | alloc_profile;
+   if (btrfs_fs_incompat(fs_info, MIXED_GROUPS))
+   flags |= BTRFS_BLOCK_GROUP_DATA;
+
+   ret = create_chunk_and_block_group(fs_info, flags, &start, &nbytes);
+   if (ret)
+   goto err;
+   printf("Created new chunk [%llu %llu]\n", start, nbytes);
+
+   flags = BTRFS_BLOCK_GROUP_METADATA;
+   /* Mark all metadata block groups cached and full in free space*/
+   ret = mark_block_groups_full(fs_info, flags);
+   if (ret)
+   goto clear_bgs_full;
+
+   bg = btrfs_lookup_block_group(fs_info, start);
+   if (!bg) {
+   ret = -ENOENT;
+   error("fail to look up block group %llu %llu", start, nbytes);
+   goto clear_bgs_full;
+   }
+
+   /* Clear block group cache just allocated */
+   ret = modify_block_group_cache(fs_info, bg, 0);
+   if (ret)
+   goto clear_bgs_full;
+   if (start_ret)
+   *start_ret = start;
+   return 0;
+
+clear_bgs_full:
+   clear_block_groups_full(fs_info, flags);
+err:
+   return ret;
+}
+
+/*
+ * Returns 0 means not almost full.
+ * Returns >0 means almost full.
+ * Returns <0 means fatal error.
+ */
+static int is_chunk_almost_full(struct btrfs_fs_info *fs_info, u64 start)
+{
+   struct btrfs_path path;
+   struct btrfs_key key;
+   struct btrfs_root *root = fs_info->extent_root;
+   struct btrfs_block_group_item *bi;
+   struct btrfs_block_group_item bg_item;
+   struct extent_buffer *eb;
+   u64 used;
+   u64 total;
+   u64 min_free;
+   int ret;
+   int slot;
+
+   key.objectid = start;
+   key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
+   key.offset = (u64)-1;
+
+   btrfs_init_path(&path);
+   ret = btrfs_search_slot(NULL, root, &key, &path, 0, 0);
+   if (!ret)
+   ret = -EIO;
+   if (ret < 0)
+   goto out;
+   ret = btrfs_previous_item(root, &path, start,
+ BTRFS_BLOCK_GROUP_ITEM_KEY);
+   if (ret) {
+   error("failed to find block group %llu", start);
+   ret = -ENOENT;
+   goto out;
+   }
+
+   eb = path.nodes[0];
+   slot = path.slots[0];
+   btrfs_item_key_to_cpu(eb, &key, slot);
+   

[PATCH v4 14/18] btrfs-progs: lowmem check: remove parameter @trans of repair_tree_block_ref()

2018-02-07 Thread Su Yue
This patch removes the @trans parameter of repair_tree_block_ref().
The function now calls avoid_extents_overwrite() and starts a
transaction by itself.

Note: This patch and the following patches cause an error in lowmem
repair such as:
"Error: Commit_root already set when starting transaction".
The error disappears once the removal of @trans is finished.

Signed-off-by: Su Yue 
---
 check/mode-lowmem.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index a7660a25b844..d4c8de4e69af 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -544,11 +544,11 @@ static int end_avoid_extents_overwrite(struct 
btrfs_fs_info *fs_info)
  *
  * Returns error bits after repair.
  */
-static int repair_tree_block_ref(struct btrfs_trans_handle *trans,
-struct btrfs_root *root,
+static int repair_tree_block_ref(struct btrfs_root *root,
 struct extent_buffer *node,
 struct node_refs *nrefs, int level, int err)
 {
+   struct btrfs_trans_handle *trans = NULL;
struct btrfs_fs_info *fs_info = root->fs_info;
struct btrfs_root *extent_root = fs_info->extent_root;
struct btrfs_path path;
@@ -598,6 +598,16 @@ static int repair_tree_block_ref(struct btrfs_trans_handle 
*trans,
if (nrefs->full_backref[level] != 0)
flags |= BTRFS_BLOCK_FLAG_FULL_BACKREF;
 
+   ret = avoid_extents_overwrite(root->fs_info);
+   if (ret)
+   goto out;
+   trans = btrfs_start_transaction(extent_root, 1);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   trans = NULL;
+   error("fail to start transaction %s", strerror(-ret));
+   goto out;
+   }
/* insert an extent item */
if (insert_extent) {
struct btrfs_disk_key copy_key;
@@ -663,6 +673,8 @@ static int repair_tree_block_ref(struct btrfs_trans_handle 
*trans,
 
nrefs->refs[level]++;
 out:
+   if (trans)
+   btrfs_commit_transaction(trans, extent_root);
btrfs_release_path(&path);
if (ret) {
error(
@@ -4304,7 +4316,7 @@ static int walk_down_tree(struct btrfs_trans_handle 
*trans,
   btrfs_header_owner(cur), nrefs);
 
if (repair && ret)
-   ret = repair_tree_block_ref(trans, root,
+   ret = repair_tree_block_ref(root,
path->nodes[*level], nrefs, *level, ret);
err |= ret;
 
-- 
2.16.1





[PATCH v4 07/18] btrfs-progs: lowmem check: introduce avoid_extents_overwrite()

2018-02-07 Thread Su Yue
Another global u64, last_allocated_chunk, records the start of the last
chunk allocated by lowmem repair.
A global variable is not elegant, but it simplifies the code considerably.

avoid_extents_overwrite() prefers to allocate a new chunk first.
If that fails because of no space or wrong used bytes (fsck-tests/004),
it tries to exclude metadata blocks instead, which takes a lot of time
on a large filesystem.
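
The fallback order can be modelled with a small state machine. This is a
sketch under assumptions, not the function added by this patch: struct
toy_fs and its fields are invented for illustration, alloc_result fakes the
outcome of the chunk allocation attempt, and the sketch follows the diff in
falling back to exclusion only on -ENOSPC and only on non-mixed filesystems.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Sentinel: chunk allocation failed once, do not try it again. */
#define TOY_NO_CHUNK UINT64_MAX

struct toy_fs {
	uint64_t last_chunk;	/* start of last allocated chunk or sentinel */
	int excluded;		/* nonzero once metadata blocks are excluded */
	int mixed;		/* nonzero for mixed block groups */
	int alloc_result;	/* faked result of the chunk allocation */
	int exclude_calls;	/* how often the slow path ran */
};

static int toy_avoid_overwrite(struct toy_fs *fs)
{
	int ret;

	assert(fs != 0);
	/* Metadata already excluded: nothing more to protect. */
	if (fs->excluded)
		return 0;

	if (fs->last_chunk != TOY_NO_CHUNK) {
		ret = fs->alloc_result;	/* fast path: CoW into a new chunk */
		if (!ret)
			return 0;
		/* Remember the failure so later calls skip the fast path. */
		fs->last_chunk = TOY_NO_CHUNK;
		if (ret != -ENOSPC || fs->mixed)
			return ret;
	}

	/* Slow path: exclude every metadata block. */
	fs->exclude_calls++;
	fs->excluded = 1;
	return 0;
}
```

The sentinel makes the expensive exclusion a one-time cost: after the first
allocation failure every later call returns early through the excluded
check.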

Signed-off-by: Su Yue 
---
 check/mode-lowmem.c | 46 ++
 1 file changed, 46 insertions(+)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 3649d570e11c..ea4019c32a3f 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -28,6 +28,8 @@
 #include "check/mode-common.h"
 #include "check/mode-lowmem.h"
 
+static u64 last_allocated_chunk;
+
 static int calc_extent_flag(struct btrfs_root *root, struct extent_buffer *eb,
u64 *flags_ret)
 {
@@ -491,6 +493,50 @@ static int try_to_force_cow_in_new_chunk(struct 
btrfs_fs_info *fs_info,
return ret;
 }
 
+static int avoid_extents_overwrite(struct btrfs_fs_info *fs_info)
+{
+   int ret;
+   int mixed = btrfs_fs_incompat(fs_info, MIXED_GROUPS);
+
+   if (fs_info->excluded_extents)
+   return 0;
+
+   if (last_allocated_chunk != (u64)-1) {
+   ret = try_to_force_cow_in_new_chunk(fs_info,
+   last_allocated_chunk, &last_allocated_chunk);
+   if (!ret)
+   goto out;
+   /*
+* If failed, do not try to allocate chunk again in
+* next call.
+* If there is no space left to allocate, try to exclude all
+* metadata blocks. Mixed filesystem is unsupported.
+*/
+   last_allocated_chunk = (u64)-1;
+   if (ret != -ENOSPC || mixed)
+   goto out;
+   }
+
+   printf(
+   "Try to exclude all metadata blocks and extents, it may be slow\n");
+   ret = exclude_metadata_blocks(fs_info);
+out:
+   if (ret)
+   error("failed to avoid extents overwrite %s", strerror(-ret));
+   return ret;
+}
+
+static int end_avoid_extents_overwrite(struct btrfs_fs_info *fs_info)
+{
+   int ret = 0;
+
+   cleanup_excluded_extents(fs_info);
+   if (last_allocated_chunk)
+   ret = clear_block_groups_full(fs_info,
+   BTRFS_BLOCK_GROUP_METADATA);
+   return ret;
+}
+
 /*
  * This function only handles BACKREF_MISSING,
  * If corresponding extent item exists, increase the ref, else insert an extent
-- 
2.16.1





[PATCH v4 18/18] btrfs-progs: fsck-tests: add image no extent with normal device size

2018-02-07 Thread Su Yue
This new image is missing only one extent, which leads lowmem mode to
allocate a new chunk during repair.
The original image, renamed to no_extent_bad_dev.img, should make lowmem
mode exclude blocks during repair.

Due to problems with btrfs-image, xz is used as the compression tool.

Signed-off-by: Su Yue 
---
 tests/fsck-tests/014-no-extent-info/.lowmem_repairable  |   0
 tests/fsck-tests/014-no-extent-info/no_extent.raw.xz| Bin 0 -> 28084 bytes
 .../{default_case.img => no_extent_bad_dev.img} | Bin
 3 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 tests/fsck-tests/014-no-extent-info/.lowmem_repairable
 create mode 100644 tests/fsck-tests/014-no-extent-info/no_extent.raw.xz
 rename tests/fsck-tests/014-no-extent-info/{default_case.img => 
no_extent_bad_dev.img} (100%)

diff --git a/tests/fsck-tests/014-no-extent-info/.lowmem_repairable 
b/tests/fsck-tests/014-no-extent-info/.lowmem_repairable
new file mode 100644
index ..e69de29bb2d1
diff --git a/tests/fsck-tests/014-no-extent-info/no_extent.raw.xz 
b/tests/fsck-tests/014-no-extent-info/no_extent.raw.xz
new file mode 100644
index 
..6e568a9cf1f0a1d1bcd00222b07cf14d3c09afc5
GIT binary patch
literal 28084
zcmeHwRdglUlAV~Dr4lnUGcz+YlHTGm`hY*W@ct)W@gDf9*;jfYmHxzU(agW
z_wTOzlbMlmc0}y6&(3#_ADY@gKwt+8b>bjEM8LQ}KtM>7nq!}zeqgG4KtQ(dpP%`S
zpA!6%=nh;)N=@;U2l>H}|6{<94I+`Pat)!xJ5;c^W5@XJ0$yOAWwDqC7u{m$V#
zu1^h#K5*Z

[PATCH v4 08/18] btrfs-progs: lowmem check: exclude extents if init-extent-tree in lowmem

2018-02-07 Thread Su Yue
If the options '--init-extent-tree' and '--mode=lowmem' are both
given, all metadata blocks are traversed twice.
The first traversal is done by pin_metadata_blocks() in
reinit_extent_tree().
The second one happens in check_chunks_and_extents_lowmem().

Excluding metadata blocks instead of pinning them before reinitializing
the extent tree in lowmem mode saves some time.
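
The control flow this adds to reinit_extent_tree() (try excluding, fall
back to pinning if that fails) can be sketched on its own. The function
below is a toy: toy_protect_metadata(), struct toy_protect_stats and the
faked result parameters are hypothetical and only mirror the shape of the
pin/exclude branch, not the real helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counts how often each protection strategy was attempted. */
struct toy_protect_stats {
	int pin_calls;
	int exclude_calls;
};

/*
 * Protect in-use metadata before rebuilding the extent tree.
 * @pin selects pinning; otherwise exclusion is tried first and pinning
 * is used as the fallback. @exclude_result and @pin_result fake the
 * outcome of the two strategies (0 on success, negative on error).
 */
static int toy_protect_metadata(bool pin, int exclude_result, int pin_result,
				struct toy_protect_stats *st)
{
	assert(st != NULL);
again:
	if (pin) {
		st->pin_calls++;
		return pin_result;
	}

	st->exclude_calls++;
	if (exclude_result) {
		/* Exclusion failed: retry once with pinning. */
		pin = true;
		goto again;
	}
	return 0;
}
```

Because pin is set before the jump, the retry can run at most once, so a
failing exclusion can never loop.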

Signed-off-by: Su Yue 
---
 check/mode-common.c | 27 ---
 check/mode-common.h |  2 +-
 check/mode-lowmem.c |  8 +++-
 cmds-check.c|  3 ++-
 4 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/check/mode-common.c b/check/mode-common.c
index acceb24b9597..afe5f04d1deb 100644
--- a/check/mode-common.c
+++ b/check/mode-common.c
@@ -706,7 +706,7 @@ out:
  * Using fs and other trees to rebuild extent tree.
  */
 int reinit_extent_tree(struct btrfs_trans_handle *trans,
-  struct btrfs_fs_info *fs_info)
+  struct btrfs_fs_info *fs_info, bool pin)
 {
u64 start = 0;
int ret;
@@ -728,13 +728,26 @@ int reinit_extent_tree(struct btrfs_trans_handle *trans,
 
/*
 * first we need to walk all of the trees except the extent tree and pin
-* down the bytes that are in use so we don't overwrite any existing
-* metadata.
+* down/exclude the bytes that are in use so we don't overwrite any
+* existing metadata.
+* If pin, unpin will be done in end of transaction.
+* If exclude, cleanup will be done in check_chunks_and_extents_lowmem.
 */
-   ret = pin_metadata_blocks(fs_info);
-   if (ret) {
-   fprintf(stderr, "error pinning down used bytes\n");
-   return ret;
+again:
+   if (pin) {
+   ret = pin_metadata_blocks(fs_info);
+   if (ret) {
+   fprintf(stderr, "error pinning down used bytes\n");
+   return ret;
+   }
+   } else {
+   ret = exclude_metadata_blocks(fs_info);
+   if (ret) {
+   fprintf(stderr, "error excluding used bytes\n");
+   printf("try to pin down used bytes\n");
+   pin = true;
+   goto again;
+   }
}
 
/*
diff --git a/check/mode-common.h b/check/mode-common.h
index e2a824a318c1..8af7dd3066ff 100644
--- a/check/mode-common.h
+++ b/check/mode-common.h
@@ -122,7 +122,7 @@ int check_child_node(struct extent_buffer *parent, int slot,
 void reset_cached_block_groups(struct btrfs_fs_info *fs_info);
 int zero_log_tree(struct btrfs_root *root);
 int reinit_extent_tree(struct btrfs_trans_handle *trans,
-  struct btrfs_fs_info *fs_info);
+  struct btrfs_fs_info *fs_info, bool pin);
 int btrfs_fsck_reinit_root(struct btrfs_trans_handle *trans,
   struct btrfs_root *root, int overwrite);
 int fill_csum_tree(struct btrfs_trans_handle *trans,
diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index ea4019c32a3f..1e0545e6249d 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -4860,8 +4860,14 @@ next:
}
 out:
 
-   /* if repair, update block accounting */
if (repair) {
+   ret = end_avoid_extents_overwrite(fs_info);
+   if (ret < 0)
+   ret = FATAL_ERROR;
+   err |= ret;
+
+   reset_cached_block_groups(fs_info);
+   /* update block accounting */
ret = btrfs_fix_block_accounting(trans, root);
if (ret)
err |= ret;
diff --git a/cmds-check.c b/cmds-check.c
index 28746712fac1..ed81fd3c22b4 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -453,7 +453,8 @@ int cmd_check(int argc, char **argv)
 
if (init_extent_tree) {
printf("Creating a new extent tree\n");
-   ret = reinit_extent_tree(trans, info);
+   ret = reinit_extent_tree(trans, info,
+check_mode == CHECK_MODE_ORIGINAL);
err |= !!ret;
if (ret)
goto close_out;
-- 
2.16.1





[PATCH v4 09/18] btrfs-progs: lowmem check: start to remove parameters @trans in lowmem

2018-02-07 Thread Su Yue
Since overwriting extents can be avoided by excluding them or by
allocating a new chunk, it is unnecessary to do all repairs in one
transaction.

This patch removes the @trans parameter of repair_extent_data_item().
repair_extent_data_item() now calls avoid_extents_overwrite()
and starts a transaction by itself.

Note: This patch and the following patches cause an error in lowmem
repair such as:
"Error: Commit_root already set when starting transaction".
The error disappears once the removal of @trans is finished.

Signed-off-by: Su Yue 
---
 check/mode-lowmem.c | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 1e0545e6249d..446ea4a21bfa 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -2739,12 +2739,12 @@ out:
  *
  * Returns error bits after repair.
  */
-static int repair_extent_data_item(struct btrfs_trans_handle *trans,
-  struct btrfs_root *root,
+static int repair_extent_data_item(struct btrfs_root *root,
   struct btrfs_path *pathp,
   struct node_refs *nrefs,
   int err)
 {
+   struct btrfs_trans_handle *trans = NULL;
struct btrfs_file_extent_item *fi;
struct btrfs_key fi_key;
struct btrfs_key key;
@@ -2761,6 +2761,7 @@ static int repair_extent_data_item(struct 
btrfs_trans_handle *trans,
u64 file_offset;
int generation;
int slot;
+   int need_insert = 0;
int ret = 0;
 
eb = pathp->nodes[0];
@@ -2799,9 +2800,20 @@ static int repair_extent_data_item(struct 
btrfs_trans_handle *trans,
ret = -EIO;
goto out;
}
+   need_insert = ret;
 
+   ret = avoid_extents_overwrite(root->fs_info);
+   if (ret)
+   goto out;
+   trans = btrfs_start_transaction(root, 1);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   trans = NULL;
+   error("fail to start transaction %s", strerror(-ret));
+   goto out;
+   }
/* insert an extent item */
-   if (ret > 0) {
+   if (need_insert) {
key.objectid = disk_bytenr;
key.type = BTRFS_EXTENT_ITEM_KEY;
key.offset = num_bytes;
@@ -2841,6 +2853,8 @@ static int repair_extent_data_item(struct 
btrfs_trans_handle *trans,
 
err &= ~BACKREF_MISSING;
 out:
+   if (trans)
+   btrfs_commit_transaction(trans, root);
btrfs_release_path(&path);
if (ret)
error("can't repair root %llu extent data item[%llu %llu]",
@@ -4117,8 +4131,7 @@ again:
case BTRFS_EXTENT_DATA_KEY:
ret = check_extent_data_item(root, path, nrefs, account_bytes);
if (repair && ret)
-   ret = repair_extent_data_item(trans, root, path, nrefs,
- ret);
+   ret = repair_extent_data_item(root, path, nrefs, ret);
err |= ret;
break;
case BTRFS_BLOCK_GROUP_ITEM_KEY:
-- 
2.16.1





[PATCH v4 10/18] btrfs-progs: lowmem check: remove parameter @trans of delete_extent_tree_item()

2018-02-07 Thread Su Yue
This patch removes the @trans parameter of delete_extent_tree_item().
The function now calls avoid_extents_overwrite() and starts a
transaction by itself.

Note: This patch and the following patches cause an error in lowmem
repair such as:
"Error: Commit_root already set when starting transaction".
The error disappears once the removal of @trans is finished.

Signed-off-by: Su Yue 
---
 check/mode-lowmem.c | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 446ea4a21bfa..272e658296e7 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -4063,13 +4063,22 @@ out:
return err;
 }
 
-static int delete_extent_tree_item(struct btrfs_trans_handle *trans,
-  struct btrfs_root *root,
+static int delete_extent_tree_item(struct btrfs_root *root,
   struct btrfs_path *path)
 {
struct btrfs_key key;
+   struct btrfs_trans_handle *trans;
int ret = 0;
 
+   ret = avoid_extents_overwrite(root->fs_info);
+   if (ret)
+   return ret;
+   trans = btrfs_start_transaction(root, 1);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   error("fail to start transaction %s", strerror(-ret));
+   goto out;
+   }
btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
btrfs_release_path(path);
ret = btrfs_search_slot(trans, root, &key, path, -1, 1);
@@ -4087,6 +4096,7 @@ static int delete_extent_tree_item(struct btrfs_trans_handle *trans,
else
path->slots[0]--;
 out:
+   btrfs_commit_transaction(trans, root);
if (ret)
error("failed to delete root %llu item[%llu, %u, %llu]",
  root->objectid, key.objectid, key.type, key.offset);
@@ -4138,7 +4148,7 @@ again:
ret = check_block_group_item(fs_info, eb, slot);
if (repair &&
ret & REFERENCER_MISSING)
-   ret = delete_extent_tree_item(trans, root, path);
+   ret = delete_extent_tree_item(root, path);
err |= ret;
break;
case BTRFS_DEV_ITEM_KEY:
@@ -4169,7 +4179,7 @@ again:
   key.objectid, -1);
if (repair &&
ret & (REFERENCER_MISMATCH | REFERENCER_MISSING))
-   ret = delete_extent_tree_item(trans, root, path);
+   ret = delete_extent_tree_item(root, path);
err |= ret;
break;
case BTRFS_EXTENT_DATA_REF_KEY:
@@ -4182,7 +4192,7 @@ again:
btrfs_extent_data_ref_count(eb, dref));
if (repair &&
ret & (REFERENCER_MISMATCH | REFERENCER_MISSING))
-   ret = delete_extent_tree_item(trans, root, path);
+   ret = delete_extent_tree_item(root, path);
err |= ret;
break;
case BTRFS_SHARED_BLOCK_REF_KEY:
@@ -4190,7 +4200,7 @@ again:
 key.objectid, -1);
if (repair &&
ret & (REFERENCER_MISMATCH | REFERENCER_MISSING))
-   ret = delete_extent_tree_item(trans, root, path);
+   ret = delete_extent_tree_item(root, path);
err |= ret;
break;
case BTRFS_SHARED_DATA_REF_KEY:
@@ -4198,7 +4208,7 @@ again:
key.objectid);
if (repair &&
ret & (REFERENCER_MISMATCH | REFERENCER_MISSING))
-   ret = delete_extent_tree_item(trans, root, path);
+   ret = delete_extent_tree_item(root, path);
err |= ret;
break;
default:
-- 
2.16.1





[PATCH v4 04/18] btrfs-progs: lowmem check: exclude extents of metadata blocks

2018-02-07 Thread Su Yue
Commit d17d6663c99c ("btrfs-progs: lowmem check: Fix regression which
screws up extent allocator") removes pin_metadata_blocks() from
lowmem repair.
So we have to find another way to exclude extents which should be
occupied by tree blocks.

Introduce exclude_metadata_blocks() to mark extents of all tree
blocks dirty in fs_info->excluded_extents.
Export it since it will be used in lowmem too.

Signed-off-by: Su Yue 
---
 check/mode-common.c | 73 +
 check/mode-common.h |  2 ++
 2 files changed, 65 insertions(+), 10 deletions(-)

diff --git a/check/mode-common.c b/check/mode-common.c
index e6d8ebe8b9b7..acceb24b9597 100644
--- a/check/mode-common.c
+++ b/check/mode-common.c
@@ -377,40 +377,54 @@ int zero_log_tree(struct btrfs_root *root)
return ret;
 }
 
-static int pin_down_tree_blocks(struct btrfs_fs_info *fs_info,
-   struct extent_buffer *eb, int tree_root)
+static int traverse_tree_blocks(struct btrfs_fs_info *fs_info,
+   struct extent_buffer *eb, int tree_root,
+   int pin)
 {
struct extent_buffer *tmp;
struct btrfs_root_item *ri;
struct btrfs_key key;
+   struct extent_io_tree *tree;
u64 bytenr;
int level = btrfs_header_level(eb);
int nritems;
int ret;
int i;
+   u64 end = eb->start + eb->len;
 
+   if (pin)
+   tree = &fs_info->pinned_extents;
+   else
+   tree = fs_info->excluded_extents;
/*
-* If we have pinned this block before, don't pin it again.
+* If we have pinned/excluded this block before, don't do it again.
 * This can not only avoid forever loop with broken filesystem
 * but also give us some speedups.
 */
-   if (test_range_bit(&fs_info->pinned_extents, eb->start,
-  eb->start + eb->len - 1, EXTENT_DIRTY, 0))
+   if (test_range_bit(tree, eb->start, end - 1, EXTENT_DIRTY, 0))
return 0;
 
-   btrfs_pin_extent(fs_info, eb->start, eb->len);
+   if (pin)
+   btrfs_pin_extent(fs_info, eb->start, eb->len);
+   else
+   set_extent_dirty(tree, eb->start, end - 1);
 
nritems = btrfs_header_nritems(eb);
for (i = 0; i < nritems; i++) {
if (level == 0) {
+   bool is_extent_root;
btrfs_item_key_to_cpu(eb, &key, i);
if (key.type != BTRFS_ROOT_ITEM_KEY)
continue;
/* Skip the extent root and reloc roots */
-   if (key.objectid == BTRFS_EXTENT_TREE_OBJECTID ||
-   key.objectid == BTRFS_TREE_RELOC_OBJECTID ||
+   if (key.objectid == BTRFS_TREE_RELOC_OBJECTID ||
key.objectid == BTRFS_DATA_RELOC_TREE_OBJECTID)
continue;
+   is_extent_root =
+   key.objectid == BTRFS_EXTENT_TREE_OBJECTID;
+   /* If pin, skip the extent root */
+   if (pin && is_extent_root)
+   continue;
ri = btrfs_item_ptr(eb, i, struct btrfs_root_item);
bytenr = btrfs_disk_root_bytenr(eb, ri);
 
@@ -425,7 +439,7 @@ static int pin_down_tree_blocks(struct btrfs_fs_info *fs_info,
fprintf(stderr, "Error reading root block\n");
return -EIO;
}
-   ret = pin_down_tree_blocks(fs_info, tmp, 0);
+   ret = traverse_tree_blocks(fs_info, tmp, 0, pin);
free_extent_buffer(tmp);
if (ret)
return ret;
@@ -444,7 +458,8 @@ static int pin_down_tree_blocks(struct btrfs_fs_info *fs_info,
fprintf(stderr, "Error reading tree block\n");
return -EIO;
}
-   ret = pin_down_tree_blocks(fs_info, tmp, tree_root);
+   ret = traverse_tree_blocks(fs_info, tmp, tree_root,
+  pin);
free_extent_buffer(tmp);
if (ret)
return ret;
@@ -454,6 +469,12 @@ static int pin_down_tree_blocks(struct btrfs_fs_info *fs_info,
return 0;
 }
 
+static int pin_down_tree_blocks(struct btrfs_fs_info *fs_info,
+   struct extent_buffer *eb, int tree_root)
+{
+   return traverse_tree_blocks(fs_info, eb, tree_root, 1);
+}
+
 static int pin_metadata_blocks(struct btrfs_fs_info *fs_info)
 {
int ret;
@@ -465,6 +486,38 @@ static int pin_metadata_blocks(struct btrfs_fs_info 

[PATCH v4 03/18] btrfs-progs: lowmem check: assign @parent early in repair_extent_data_item()

2018-02-07 Thread Su Yue
The variable @eb is assigned to a leaf in the fs tree before the backref
is inserted. This causes a wrong parent for the newly inserted backref.

Setting @parent at the beginning solves the problem.

Reviewed-by: Qu Wenruo 
Signed-off-by: Su Yue 
---
 check/mode-lowmem.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 18ec6db098e7..1fc84f1e8c44 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -2475,6 +2475,11 @@ static int repair_extent_data_item(struct btrfs_trans_handle *trans,
extent_offset = btrfs_file_extent_offset(eb, fi);
offset = file_offset - extent_offset;
 
+   if (nrefs->full_backref[0])
+   parent = btrfs_header_bytenr(eb);
+   else
+   parent = 0;
+
/* now repair only adds backref */
if ((err & BACKREF_MISSING) == 0)
return err;
@@ -2516,11 +2521,6 @@ static int repair_extent_data_item(struct btrfs_trans_handle *trans,
btrfs_release_path();
}
 
-   if (nrefs->full_backref[0])
-   parent = btrfs_header_bytenr(eb);
-   else
-   parent = 0;
-
ret = btrfs_inc_extent_ref(trans, root, disk_bytenr, num_bytes, parent,
   root->objectid,
   parent ? BTRFS_FIRST_FREE_OBJECTID : fi_key.objectid,
-- 
2.16.1





[PATCH v4 00/18] btrfs-progs: lowmem check: avoid extents overwrite

2018-02-07 Thread Su Yue
This patchset can be fetched from my github:
https://github.com/Damenly/btrfs-progs/tree/lowmem
based on unmerged patchset whose cover:
  [PATCH 0/3] btrfs-progs: Split original mode check to its own
  Author: Qu Wenruo 
  
I apologize if sending patches based on an unmerged patchset causes any
inconvenience.
I think the three patches from Qu are good enough, so I am sending this
before my vacation.

Patch[1-3] fix minor problems of lowmem repair.

Patch[4-8] introduce two ways to avoid extents overwrite:
1) Traverse trees and exclude all metadata blocks.
   It's time-inefficient for large filesystems.
2) Mark all existing chunks full, allocate a new chunk for CoW and
   record the chunk start.
   If the last allocated chunk is almost full, allocate a new one.
Method 2) is more efficient than 1). However, it can't handle situations
like no space (fsck/004).
Lowmem repair will try method 2 first and then method 1.
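
The fallback order described above can be sketched roughly as follows.
This is only an illustration under assumptions: avoid_overwrite_sketch(),
the callback type avoid_fn and the two stub functions are hypothetical
names made up for this sketch, not functions from btrfs-progs, and the
real helpers operate on a struct btrfs_fs_info rather than a void pointer.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type; the real helpers take a struct btrfs_fs_info. */
typedef int (*avoid_fn)(void *ctx);

/*
 * Sketch of the fallback order: try the cheap method 2 first and fall
 * back to the full tree traversal of method 1 only if method 2 fails.
 */
int avoid_overwrite_sketch(avoid_fn force_cow_in_new_chunk,
			   avoid_fn exclude_metadata, void *ctx)
{
	int ret = force_cow_in_new_chunk(ctx);

	if (ret == 0)
		return 0;
	return exclude_metadata(ctx);
}

/* Stub that fails as if no space were left (assumption for the demo). */
int stub_no_space(void *ctx)
{
	(void)ctx;
	return -ENOSPC;
}

/* Stub that succeeds and counts its calls through ctx (assumption). */
int stub_ok(void *ctx)
{
	if (ctx)
		(*(int *)ctx)++;
	return 0;
}
```

With stub_no_space as method 2 and stub_ok as method 1, the sketch
returns 0 after calling the fallback exactly once; if both fail, the
error of the fallback is returned.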

Patch[9-17] remove parameters @trans in functions for lowmem repair.
They try to avoid extents overwrite if necessary and start
transactions by themselves.

Patch[18] adds a test image.
Those patches are mainly for lowmem repair. Original mode is not
influenced.

---
Changelog:
v4->v3:
 - Remove global enum extents_operation to simplify
   avoid_extents_overwrite() and its cleanup.
 - Rebase after work of check split.
 
v3->v2:
 - check_btrfs_root() returns FATAL_ERROR if check_fs_first_inode()
   failed. Thanks Nikolay Borisov.
 - Add function try_to_force_cow_in_new_chunk() and global u64
   variable to record start of the last allocated chunk.
 - Remove unused EXTENTS_PIN in enum lowmem_extents_operation.
 
v2->v1:
 - Let @err in check_btrfs_root() record err bits but excluded
   negative values.
 - Do not delete a line of code to release path after extent item
   insertion in repair_extent_data_item().
 - Add patch[3].
 - Force CoW in new allocated chunk to avoid extents overwrite.
 - Remove parameters @trans in check_chunks_and_extents_v2() and
   related callees.
 - Repair functions for lowmem mode call try_avoid_extents_overwrite()
   and start transactions.
   
Su Yue (18):
  btrfs-progs: lowmem check: release path in repair_extent_data_item()
  btrfs-progs: lowmem check: record returned errors after
walk_down_tree_v2()
  btrfs-progs: lowmem check: assign @parent early in
repair_extent_data_item()
  btrfs-progs: lowmem check: exclude extents of metadata blocks
  btrfs-progs: lowmem check: introduce mark/clear_block_groups_full()
  btrfs-progs: lowmem check: introduce try_force_cow_in_new_chunk()
  btrfs-progs: lowmem check: introduce avoid_extents_overwrite()
  btrfs-progs: lowmem check: exclude extents if init-extent-tree in
lowmem
  btrfs-progs: lowmem check: start to remove parameters @trans in lowmem
  btrfs-progs: lowmem check: remove parameter @trans of
delete_extent_item()
  btrfs-progs: lowmem check: remove parameter @trans of
repair_chunk_item()
  btrfs-progs: lowmem check: remove parameter @trans of
repair_extent_item()
  btrfs-progs: lowmem check: remove parameter @trans of
check_leaf_items()
  btrfs-progs: lowmem check: remove parameter @trans of
repair_tree_back_ref()
  btrfs-progs: lowmem check: remove parameter @trans of
check_btrfs_root()
  btrfs-progs: lowmem check: introduce repair_block_accounting()
  btrfs-progs: lowmem check: end of removing parameters @trans in lowmem
  btrfs-progs: fsck-tests: add image no extent with normal device size

 check/mode-common.c| 100 +++-
 check/mode-common.h|   4 +-
 check/mode-lowmem.c| 560 +
 check/mode-lowmem.h|   1 +
 cmds-check.c   |   3 +-
 .../014-no-extent-info/.lowmem_repairable  |   0
 .../fsck-tests/014-no-extent-info/no_extent.raw.xz | Bin 0 -> 28084 bytes
 .../{default_case.img => no_extent_bad_dev.img}| Bin
 8 files changed, 561 insertions(+), 107 deletions(-)
 create mode 100644 tests/fsck-tests/014-no-extent-info/.lowmem_repairable
 create mode 100644 tests/fsck-tests/014-no-extent-info/no_extent.raw.xz
 rename tests/fsck-tests/014-no-extent-info/{default_case.img => 
no_extent_bad_dev.img} (100%)

-- 
2.16.1





[PATCH v4 01/18] btrfs-progs: lowmem check: release path in repair_extent_data_item()

2018-02-07 Thread Su Yue
In repair_extent_data_item(), path is not released if an error
occurs, which causes an extent buffer leak.

So release path at the end of the function.

Reviewed-by: Qu Wenruo 
Signed-off-by: Su Yue 
---
 check/mode-lowmem.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 62bcf3d2e126..d168a3ddd5e5 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -2537,6 +2537,7 @@ static int repair_extent_data_item(struct btrfs_trans_handle *trans,
 
err &= ~BACKREF_MISSING;
 out:
+   btrfs_release_path();
if (ret)
error("can't repair root %llu extent data item[%llu %llu]",
  root->objectid, disk_bytenr, num_bytes);
-- 
2.16.1





[PATCH v4 02/18] btrfs-progs: lowmem check: record returned errors after walk_down_tree_v2()

2018-02-07 Thread Su Yue
In lowmem mode with '--repair', check_chunks_and_extents_v2()
will fix accounting in block groups and clear the error
bit BG_ACCOUNTING_ERROR.
However, return value of check_btrfs_root() doesn't contain error bits.

If the extent tree has errors, lowmem repair always prints an error and
returns a nonzero value even if the filesystem is fine after repair.

Introduce FATAL_ERROR for lowmem mode to represent negative return
values, since negative and positive values can't be mixed in bit operations.

Then let check_btrfs_root() return error bits.
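
Why negative values and error bits don't mix can be shown with a small
standalone sketch. The two bit values are copied from the
check/mode-lowmem.h hunk of this patch; fold_naive() and fold_error() are
hypothetical helper names used only for this illustration and do not
exist in btrfs-progs. On the usual two's complement machines, ORing a
negative errno into the mask sets almost every bit, so clearing a
repaired bit afterwards no longer yields a meaningful result.

```c
#include <errno.h>

/* Bit values as defined in the check/mode-lowmem.h hunk of this patch. */
#define BG_ACCOUNTING_ERROR (1 << 21)
#define FATAL_ERROR         (1 << 22)

/* Naive accumulation: ORs a possibly negative value into the bitmask. */
int fold_naive(int err, int ret)
{
	return err | ret;
}

/*
 * Hypothetical helper following the idea of the patch: positive values
 * are error bits and are ORed in as-is, while any negative errno is
 * represented by the single dedicated FATAL_ERROR bit.
 */
int fold_error(int err, int ret)
{
	if (ret > 0)
		err |= ret;
	else if (ret < 0)
		err |= FATAL_ERROR;
	return err;
}
```

With fold_error(), clearing BG_ACCOUNTING_ERROR from a mask that also
recorded -EIO leaves exactly FATAL_ERROR, whereas the naive mask keeps
many unrelated bits set.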

Signed-off-by: Su Yue 
---
 check/mode-lowmem.c | 10 +-
 check/mode-lowmem.h |  1 +
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index d168a3ddd5e5..18ec6db098e7 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -4215,7 +4215,7 @@ out:
  *otherwise means check fs tree(s) items relationship and
  *   @root MUST be a fs tree root.
  * Returns 0  represents OK.
- * Returns not 0  represents error.
+ * Returns >0 represents error bits.
  */
 static int check_btrfs_root(struct btrfs_trans_handle *trans,
struct btrfs_root *root, unsigned int ext_ref,
@@ -4238,7 +4238,7 @@ static int check_btrfs_root(struct btrfs_trans_handle *trans,
 */
ret = check_fs_first_inode(root, ext_ref);
if (ret < 0)
-   return ret;
+   return FATAL_ERROR;
}
 
 
@@ -4266,11 +4266,11 @@ static int check_btrfs_root(struct btrfs_trans_handle *trans,
ret = walk_down_tree(trans, root, , , ,
 ext_ref, check_all);
 
-   err |= !!ret;
-
+   if (ret > 0)
+   err |= ret;
/* if ret is negative, walk shall stop */
if (ret < 0) {
-   ret = err;
+   ret = err | FATAL_ERROR;
break;
}
 
diff --git a/check/mode-lowmem.h b/check/mode-lowmem.h
index 73d5799951b7..e7ba62e2413e 100644
--- a/check/mode-lowmem.h
+++ b/check/mode-lowmem.h
@@ -43,6 +43,7 @@
 #define DIR_INDEX_MISMATCH  (1<<19) /* INODE_INDEX found but not match */
 #define DIR_COUNT_AGAIN (1<<20) /* DIR isize should be recalculated */
 #define BG_ACCOUNTING_ERROR (1<<21) /* Block group accounting error */
+#define FATAL_ERROR (1<<22) /* Fatal bit for errno */
 
 /*
  * Error bit for low memory mode check.
-- 
2.16.1





Re: Crash when unraring large archives on btrfs-filesystem

2018-02-07 Thread Chris Murphy
Another way to test for this problem is one of the responses in that
lkml thread by Btrfs list regular Duncan, about tweaking the knobs
that handle dirty write caching. So you could try those suggested
tweaks first, rather than changing kernels.


https://lkml.org/lkml/2016/12/13/753


Chris Murphy


Re: IO Error (.snapshots is not a btrfs subvolume)

2018-02-07 Thread Chris Murphy
On Wed, Feb 7, 2018 at 6:26 PM, Nick Gilmour  wrote:
> Hi all,
>
> I have successfully restored a snapshot of root but now when I try to
> make a new snapshot I get this error:
> IO Error (.snapshots is not a btrfs subvolume).
> My snapshots were within @ which I renamed to @_old.
> What can I do now? How can I move the snapshots from @_old/ into @ and
> be able to make snapshots again?
>
> This is an excerpt of my subvolumes list:
>
> # btrfs subvolume list /
> ID 257 gen 175397 top level 5 path @_old
> ID 258 gen 175392 top level 5 path @pkg
> ID 260 gen 175447 top level 5 path @tmp
> ID 262 gen 19 top level 257 path @_old/var/lib/machines
> ID 268 gen 175441 top level 5 path @test
> ID 291 gen 175394 top level 257 path @_old/.snapshots
> ID 292 gen 1705 top level 291 path @_old/.snapshots/1/snapshot
> ...
>
> ID 3538 gen 175398 top level 291 path @_old/.snapshots/1594/snapshot
> ID 3540 gen 175447 top level 5 path @
>


This is a snapper behavior. It creates .snapshots as a subvolume and
then puts snapshots into that subvolume. If you snapshot a subvolume
that contains another subvolume, the nested subvolume is not
snapshotted; a plain directory placeholder is created instead. So your
restored snapshot contains a .snapshots directory rather than a
.snapshots subvolume. Possibly if you delete the directory and create a
new subvolume .snapshots, the problem will be fixed.

I can't tell you how this will confuse snapper though, or how to
unconfuse it. It pretty much expects to be in control of all
snapshots, creation, deletion, and rollbacks. So if you do it manually
for whatever reason, I think it can confuse snapper.


-- 
Chris Murphy


Re: Crash when unraring large archives on btrfs-filesystem

2018-02-07 Thread Chris Murphy
On Wed, Feb 7, 2018 at 12:57 PM, Stefan Malte Schumacher <
s.schumac...@netcologne.de> wrote:

>
>
> Feb 5 21:25:12 mars kernel: [250116.605471] Node 0 active_anon:176kB
> inactive_anon:276kB active_file:14228752kB inactive_file:1631728kB
> unevictable:0kB isolated(anon):0kB isolated(file):4096kB mapped:9316kB
> dirty:1636856kB writeback:248kB shmem:84kB shmem_thp: 0kB
> shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB
> pages_scanned:13631918 all_unreclaimable? no
>

How much RAM on the machine and how much swap available? This looks like a
lot of dirty data has accumulated, and then also there's swapping
happening. Both swap out and swap in.

>4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04) x86_64 GNU/Linux

I don't know if this bears any relation to the upstream longterm 4.9.65,
but there are definitely many memory and btrfs changes between 4.9.66 and
4.9.80, including a fix for a deadlock when writing out the free space
cache. I don't know that this is related to your particular problem, there
might be more than one thing going on. But the easiest thing to do, until
someone who actually knows for sure (a developer with time to respond)
replies, is to just upgrade the kernel and see if the problem goes away.

I did also find a similar problem related to the first problem, unclear if
it's the instigator, page allocation stalls for 12104ms, order:0,
mode:0x24200ca(GFP_HIGHUSER_MOVABLE), happening along with Btrfs. That
thread:

https://lkml.org/lkml/2016/12/13/529


---
Chris Murphy


Re: [PATCH RFC] Btrfs: expose bad chunks in sysfs

2018-02-07 Thread Qu Wenruo


On 2018-02-08 06:57, Liu Bo wrote:
> On Tue, Feb 06, 2018 at 09:28:14AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2018-02-06 07:15, Liu Bo wrote:
>>> Btrfs tries its best to tolerate write errors, but kind of silently
>>> (except some messages in kernel log).
>>>
>>> For raid1 and raid10, this is usually not a problem because there is a
>>> copy as backup, while for parity based raid setup, i.e. raid5 and
>>> raid6, the problem is that, if a write error occurs due to some bad
>>> sectors, one horizontal stripe becomes degraded and the number of write
>>> errors it can tolerate gets reduced by one, now if two disks fail,
>>> data may be lost forever.
>>>
>>> One way to mitigate the data loss pain is to expose 'bad chunks',
>>> i.e. degraded chunks, to users, so that they can use 'btrfs balance'
>>> to relocate the whole chunk and get the full raid6 protection again
>>> (if the relocation works).
>>>
>>> This introduces 'bad_chunks' in btrfs's per-fs sysfs directory.  Once
>>> a chunk of raid5 or raid6 becomes degraded, it will appear in
>>> 'bad_chunks'.
>>
>> Sysfs looks good.
>>
>> Although other systems uses their own interface to handle their status.
>> Mdadm uses /proc/mdstat to show such status, LVM uses lvdisplay/lvs.
>>
> 
> It's more like badblocks in md, instead of /proc/mdstat.

I see the point now.

> 
>> So here comes to a new sys-fs interface.
>>
>>>
>>> Signed-off-by: Liu Bo 
>>> ---
>>> - In this patch, 'bad chunks' is not persistent on disk, but it can be
>>>   added if it's thought to be a good idea.
>>
>> IMHO such bad chunks list can be built using existing dev status at
>> mount time.
>>
> 
> What dev status offers is counters, but here chunk info. is needed if
> we want balance to do relocation.  I'll think harder about how to use
> it.

In my opinion, if we get a write error, relocation may help for a short
time, but as long as we're using the same device, it may happen again,
and the real fix is to replace the device.

> 
>> Although using dev status may cause extra problems like false alerts.
>>
>>> - This is lightly tested, comments are very welcome.
>>
>> Just checked the code, there are 2 concerns:
>>
>> 1) The way to remove bad chunk
>>Currently it can only be removed when the chunk is removed.
>>If any transient write error happened, the bad chunk will just be
>>there forever (if not removed)
>>
> 
> The fundamental assumption about write error is that filesystem should
> not get any transient write error, as the underlying layers in IO
> stack should do their best to get rid of transient write error.
> (probably I should add this to the patch log.)
> 
> So once we get a bad chunk, there is a real IO error, for now what I
> can think of is to use balance to create a new chunk to hold
> everything in the bad chunk and the new chunk has the full raid
> protection.

Then the problem is about the granularity.

If a write error happens, should we ignore just those bad blocks, or the
whole device?

In that case I prefer the latter.

> 
>>It seems to cause false alert.
>>
>>And extra logic to determine if it's a real bad chunk in kernel seems
>>a little complex and less flex.
>>(Maybe an interface to info userspace where problem happens is more
>> flex?)
>>
> 
> It depends on what users care about, when raid6 is in use, I think
> users would care how many disk failures btrfs could tolerate at any
> point, about bad chunks whether it's true or false, probably they
> don't care, they might think it'd help a lot if some operations could
> be done to get the system back to the protect level they want.
> 
>> 2) Bad chunk is only added when writing
>>Read routine should also be able to detect bad chunks, with better
>>accuracy.
>>
> 
> Do you mean a read error should also report bad chunk?
> Or am I misunderstanding your point?
> 
> Typically read failure would trigger reconstruction and a write for
> correction will be issued, then we could get bad chunks if correction
> write fails.

Right, I just forgot the fix procedure.

> 
>>>
>>>  fs/btrfs/ctree.h   |  8 +++
>>>  fs/btrfs/disk-io.c |  2 ++
>>>  fs/btrfs/extent-tree.c | 13 +++
>>>  fs/btrfs/raid56.c  | 59 
>>> --
>>>  fs/btrfs/sysfs.c   | 26 ++
>>>  fs/btrfs/volumes.c | 15 +++--
>>>  fs/btrfs/volumes.h |  2 ++
>>>  7 files changed, 121 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>>> index 13c260b..08aad65 100644
>>> --- a/fs/btrfs/ctree.h
>>> +++ b/fs/btrfs/ctree.h
>>> @@ -1101,6 +1101,9 @@ struct btrfs_fs_info {
>>> spinlock_t ref_verify_lock;
>>> struct rb_root block_tree;
>>>  #endif
>>> +
>>> +   struct list_head bad_chunks;
>>
>> Rbtree may be better here.
>>
>> Since iterating a list to remove bad chunk can sometimes be slow.
>>
> 
> At the point I wrote the patch, I thought bad chunk should be rare
> case so 

Re: [PATCH RFC] Btrfs: expose bad chunks in sysfs

2018-02-07 Thread Liu Bo
On Tue, Feb 06, 2018 at 09:28:14AM +0800, Qu Wenruo wrote:
> 
> 
> On 2018-02-06 07:15, Liu Bo wrote:
> > Btrfs tries its best to tolerate write errors, but kind of silently
> > (except some messages in kernel log).
> > 
> > For raid1 and raid10, this is usually not a problem because there is a
> > copy as backup, while for parity based raid setup, i.e. raid5 and
> > raid6, the problem is that, if a write error occurs due to some bad
> > sectors, one horizontal stripe becomes degraded and the number of write
> > errors it can tolerate gets reduced by one, now if two disks fail,
> > data may be lost forever.
> > 
> > One way to mitigate the data loss pain is to expose 'bad chunks',
> > i.e. degraded chunks, to users, so that they can use 'btrfs balance'
> > to relocate the whole chunk and get the full raid6 protection again
> > (if the relocation works).
> > 
> > This introduces 'bad_chunks' in btrfs's per-fs sysfs directory.  Once
> > a chunk of raid5 or raid6 becomes degraded, it will appear in
> > 'bad_chunks'.
> 
> Sysfs looks good.
> 
> Although other systems uses their own interface to handle their status.
> Mdadm uses /proc/mdstat to show such status, LVM uses lvdisplay/lvs.
>

It's more like badblocks in md, instead of /proc/mdstat.

> So here comes to a new sys-fs interface.
> 
> > 
> > Signed-off-by: Liu Bo 
> > ---
> > - In this patch, 'bad chunks' is not persistent on disk, but it can be
> >   added if it's thought to be a good idea.
> 
> IMHO such bad chunks list can be built using existing dev status at
> mount time.
>

What dev status offers is counters, but here chunk info. is needed if
we want balance to do relocation.  I'll think harder about how to use
it.

> Although using dev status may cause extra problems like false alerts.
> 
> > - This is lightly tested, comments are very welcome.
> 
> Just checked the code, there are 2 concerns:
> 
> 1) The way to remove bad chunk
>Currently it can only be removed when the chunk is removed.
>If any transient write error happened, the bad chunk will just be
>there forever (if not removed)
>

The fundamental assumption about write error is that filesystem should
not get any transient write error, as the underlying layers in IO
stack should do their best to get rid of transient write error.
(probably I should add this to the patch log.)

So once we get a bad chunk, there is a real IO error, for now what I
can think of is to use balance to create a new chunk to hold
everything in the bad chunk and the new chunk has the full raid
protection.

>It seems to cause false alert.
> 
>And extra logic to determine if it's a real bad chunk in kernel seems
>a little complex and less flex.
>(Maybe an interface to info userspace where problem happens is more
> flex?)
>

It depends on what users care about, when raid6 is in use, I think
users would care how many disk failures btrfs could tolerate at any
point, about bad chunks whether it's true or false, probably they
don't care, they might think it'd help a lot if some operations could
be done to get the system back to the protect level they want.

> 2) Bad chunk is only added when writing
>Read routine should also be able to detect bad chunks, with better
>accuracy.
>

Do you mean a read error should also report bad chunk?
Or am I misunderstanding your point?

Typically read failure would trigger reconstruction and a write for
correction will be issued, then we could get bad chunks if correction
write fails.

> > 
> >  fs/btrfs/ctree.h   |  8 +++
> >  fs/btrfs/disk-io.c |  2 ++
> >  fs/btrfs/extent-tree.c | 13 +++
> >  fs/btrfs/raid56.c  | 59 
> > --
> >  fs/btrfs/sysfs.c   | 26 ++
> >  fs/btrfs/volumes.c | 15 +++--
> >  fs/btrfs/volumes.h |  2 ++
> >  7 files changed, 121 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> > index 13c260b..08aad65 100644
> > --- a/fs/btrfs/ctree.h
> > +++ b/fs/btrfs/ctree.h
> > @@ -1101,6 +1101,9 @@ struct btrfs_fs_info {
> > spinlock_t ref_verify_lock;
> > struct rb_root block_tree;
> >  #endif
> > +
> > +   struct list_head bad_chunks;
> 
> Rbtree may be better here.
> 
> Since iterating a list to remove bad chunk can sometimes be slow.
>

At the point I wrote the patch, I thought bad chunk should be rare
case so list search is fine, but now I'm not sure.

> > +   seqlock_t bc_lock;
> >  };
> >  
> >  static inline struct btrfs_fs_info *btrfs_sb(struct super_block *sb)
> > @@ -2568,6 +2571,11 @@ static inline gfp_t btrfs_alloc_write_mask(struct address_space *mapping)
> >  
> >  /* extent-tree.c */
> >  
> > +struct btrfs_bad_chunk {
> > +   u64 chunk_offset;
> 
> It would be better to have chunk_size to info user.
> Just chunk start won't tell user how serious the problem is.
>

Hmm, I don't understand what extra value chunk_size can offer.

> And 

IO Error (.snapshots is not a btrfs subvolume)

2018-02-07 Thread Nick Gilmour
Hi all,

I have successfully restored a snapshot of root but now when I try to
make a new snapshot I get this error:
IO Error (.snapshots is not a btrfs subvolume).
My snapshots were within @ which I renamed to @_old.
What can I do now? How can I move the snapshots from @_old/ into @ and
be able to make snapshots again?

This is an excerpt of my subvolumes list:

# btrfs subvolume list /
ID 257 gen 175397 top level 5 path @_old
ID 258 gen 175392 top level 5 path @pkg
ID 260 gen 175447 top level 5 path @tmp
ID 262 gen 19 top level 257 path @_old/var/lib/machines
ID 268 gen 175441 top level 5 path @test
ID 291 gen 175394 top level 257 path @_old/.snapshots
ID 292 gen 1705 top level 291 path @_old/.snapshots/1/snapshot
...

ID 3538 gen 175398 top level 291 path @_old/.snapshots/1594/snapshot
ID 3540 gen 175447 top level 5 path @


Regards,
Nick


[PATCH v2] btrfs-progs: ctree: Add extra level check for read_node_slot()

2018-02-07 Thread Qu Wenruo
Strangely, we have level check in btrfs_print_tree() while we don't have
the same check in read_node_slot().

That's to say, for the following corruption, btrfs_search_slot() or
btrfs_next_leaf() can return invalid leaf:

Parent eb:
  node XX level 1
  ^^^
  Child should be leaf (level 0)
  ...
  key (XXX XXX XXX) block YY

Child eb:
  leaf YY level 1
  ^^^
  Something went wrong now

A later caller that receives such a corrupted leaf can easily be
screwed up.

Although the root cause (a power loss, but something must still have
broken the metadata CoW of btrfs) is still unknown, at least enhance
btrfs-progs to avoid the SEGV.
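
The invariant being enforced can be illustrated with a minimal
standalone sketch. struct mock_eb and child_level_ok() are hypothetical
stand-ins invented for this example; they are not the btrfs-progs extent
buffer type or API, and the real check reads the level from the on-disk
header via btrfs_header_level().

```c
#include <stddef.h>

/* Simplified stand-in for an extent buffer header; not the btrfs-progs type. */
struct mock_eb {
	unsigned long long bytenr;
	int level;
};

/*
 * Return 1 if @child may hang directly below @parent, 0 otherwise.
 * A child block must sit exactly one level below its parent, and a
 * leaf (level 0) has no children at all.
 */
int child_level_ok(const struct mock_eb *parent, const struct mock_eb *child)
{
	if (parent == NULL || child == NULL)
		return 0;
	if (parent->level == 0)		/* a leaf has no children */
		return 0;
	return child->level == parent->level - 1;
}
```

For the corruption shown above (parent at level 1, child claiming
level 1) the check fails, which is the case where the patch returns
-EIO instead of handing the bad block to the caller.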

Reported-by: Ralph Gauges 
Signed-off-by: Qu Wenruo 
---
changelog:
v2:
  Check if the extent buffer is up-to-date before checking its level to
  avoid possible NULL pointer access.
---
 ctree.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/ctree.c b/ctree.c
index 4fc33b14000a..430805e3043f 100644
--- a/ctree.c
+++ b/ctree.c
@@ -22,6 +22,7 @@
 #include "repair.h"
 #include "internal.h"
 #include "sizes.h"
+#include "messages.h"
 
 static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
  *root, struct btrfs_path *path, int level);
@@ -640,7 +641,9 @@ static int bin_search(struct extent_buffer *eb, struct btrfs_key *key,
 struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info,
   struct extent_buffer *parent, int slot)
 {
+   struct extent_buffer *ret;
int level = btrfs_header_level(parent);
+
if (slot < 0)
return NULL;
if (slot >= btrfs_header_nritems(parent))
@@ -649,8 +652,19 @@ struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info,
if (level == 0)
return NULL;
 
-   return read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
+   ret = read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
   btrfs_node_ptr_generation(parent, slot));
+   if (!extent_buffer_uptodate(ret))
+   return ERR_PTR(-EIO);
+
+   if (btrfs_header_level(ret) != level - 1) {
+   error("child eb corrupted: parent bytenr=%llu item=%d parent level=%d child level=%d",
+ btrfs_header_bytenr(parent), slot,
+ btrfs_header_level(parent), btrfs_header_level(ret));
+   free_extent_buffer(ret);
+   return ERR_PTR(-EIO);
+   }
+   return ret;
 }
 
 static int balance_level(struct btrfs_trans_handle *trans,
-- 
2.16.1
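
The invariant this patch enforces can be sketched outside btrfs-progs as
follows. This is a minimal illustration only: struct mock_eb and
child_level_ok() are hypothetical stand-ins, not part of ctree.c, and the
real code reads the level with btrfs_header_level() after confirming the
buffer is up to date.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct extent_buffer: only the header level. */
struct mock_eb {
	int level;	/* 0 = leaf, >0 = internal node */
};

/*
 * Return 1 if 'child' may legitimately hang below 'parent', 0 otherwise.
 * Mirrors the rule the patch checks: child level == parent level - 1.
 */
static int child_level_ok(const struct mock_eb *parent,
			  const struct mock_eb *child)
{
	if (parent == NULL || child == NULL)
		return 0;
	if (parent->level == 0)		/* a leaf has no children */
		return 0;
	return child->level == parent->level - 1;
}
```

With the corruption from the commit message (parent at level 1, child
also claiming level 1) the check fails, which is the case where the
patched read_node_slot() returns ERR_PTR(-EIO) instead of the buffer.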



[josef-btrfs:current-work 3/3] block/bio.c:1801:2: error: implicit declaration of function 'rq_qos_done_bio'; did you mean 'rq_qos_id'?

2018-02-07 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git 
current-work
head:   71fe7e0ab249e42c17f387951aa09de7cb362d35
commit: 71fe7e0ab249e42c17f387951aa09de7cb362d35 [3/3] current-work
config: x86_64-randconfig-x008-201805 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
git checkout 71fe7e0ab249e42c17f387951aa09de7cb362d35
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

   In file included from block/bio.c:20:0:
   include/linux/bio.h:521:55: warning: 'struct blkcg_gq' declared inside 
parameter list will not be visible outside of this definition or declaration
static int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg) { 
return 0; }
  ^~~~
   block/bio.c: In function 'bio_endio':
>> block/bio.c:1801:2: error: implicit declaration of function 
>> 'rq_qos_done_bio'; did you mean 'rq_qos_id'? 
>> [-Werror=implicit-function-declaration]
 rq_qos_done_bio(bio->bi_disk->queue, bio);
 ^~~
 rq_qos_id
   In file included from block/bio.c:20:0:
   At top level:
   include/linux/bio.h:521:12: warning: 'bio_associate_blkg' defined but not 
used [-Wunused-function]
static int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg) { 
return 0; }
   ^~
   cc1: some warnings being treated as errors
--
   In file included from include/linux/blkdev.h:21:0,
from include/linux/backing-dev.h:15,
from block/blk-core.c:16:
   include/linux/bio.h:521:55: warning: 'struct blkcg_gq' declared inside 
parameter list will not be visible outside of this definition or declaration
static int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg) { 
return 0; }
  ^~~~
   block/blk-core.c: In function 'blk_requeue_request':
>> block/blk-core.c:1545:2: error: implicit declaration of function 
>> 'rq_qos_requeue'; did you mean 'wbt_requeue'? 
>> [-Werror=implicit-function-declaration]
 rq_qos_requeue(q, >issue_stat);
 ^~
 wbt_requeue
   block/blk-core.c: In function '__blk_put_request':
>> block/blk-core.c:1651:2: error: implicit declaration of function 
>> 'rq_qos_done'; did you mean 'rq_qos_add'? 
>> [-Werror=implicit-function-declaration]
 rq_qos_done(q, >issue_stat);
 ^~~
 rq_qos_add
   block/blk-core.c: In function 'blk_queue_bio':
>> block/blk-core.c:1943:12: error: implicit declaration of function 
>> 'rq_qos_throttle' [-Werror=implicit-function-declaration]
 wb_acct = rq_qos_throttle(q, bio, q->queue_lock);
   ^~~
>> block/blk-core.c:1953:3: error: implicit declaration of function 
>> 'rq_qos_cleanup'; did you mean 'rq_qos_add'? 
>> [-Werror=implicit-function-declaration]
  rq_qos_cleanup(q, wb_acct);
  ^~
  rq_qos_add
   block/blk-core.c: In function 'blk_start_request':
>> block/blk-core.c:2841:3: error: implicit declaration of function 
>> 'rq_qos_issue'; did you mean 'rq_qos_id'? 
>> [-Werror=implicit-function-declaration]
  rq_qos_issue(req->q, >issue_stat);
  ^~~~
  rq_qos_id
   In file included from include/linux/blkdev.h:21:0,
from include/linux/backing-dev.h:15,
from block/blk-core.c:16:
   At top level:
   include/linux/bio.h:521:12: warning: 'bio_associate_blkg' defined but not 
used [-Wunused-function]
static int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg) { 
return 0; }
   ^~
   cc1: some warnings being treated as errors
--
   In file included from block/blk-sysfs.c:8:0:
   include/linux/bio.h:521:55: warning: 'struct blkcg_gq' declared inside 
parameter list will not be visible outside of this definition or declaration
static int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg) { 
return 0; }
  ^~~~
   block/blk-sysfs.c: In function 'queue_wb_lat_show':
>> block/blk-sysfs.c:432:41: error: implicit declaration of function 
>> 'wbt_get_min_lat'; did you mean 'bdi_set_min_ratio'? 
>> [-Werror=implicit-function-declaration]
 return sprintf(page, "%llu\n", div_u64(wbt_get_min_lat(q), 1000));
^~~
bdi_set_min_ratio
   block/blk-sysfs.c: In function 'queue_wb_lat_store':
>> block/blk-sysfs.c:460:2: error: implicit declaration of function 
>> 'wbt_set_min_lat'; did you mean 'bdi_set_min_ratio'? 
>> [-Werror=implicit-function-declaration]
 wbt_set_min_lat(q, val);
 ^~~
 bdi_set_min_ratio
>> block/blk-sysfs.c:462:20: error: passing argument 1 of 'wbt_update_limits' 
>> from incompatible pointer type [-Werror=incompatible-pointer-types]
 wbt_update_limits(q);
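
Errors of the "implicit declaration of function" kind in a randconfig
build often mean that the declarations sit behind a config option that
this .config disabled, while the callers are compiled unconditionally.
Whether that is the cause in this branch is an assumption; the report
does not say. A common remedy is to provide no-op stubs for the disabled
case. The sketch below uses hypothetical names (CONFIG_EXAMPLE_QOS,
example_qos_done) rather than the real rq_qos_* API.

```c
#include <assert.h>

/* Hypothetical switch standing in for a Kconfig option; uncomment to enable. */
/* #define CONFIG_EXAMPLE_QOS 1 */

struct example_queue {
	int done_count;
};

#ifdef CONFIG_EXAMPLE_QOS
/* Real implementation, built only when the feature is enabled. */
static inline void example_qos_done(struct example_queue *q)
{
	q->done_count++;
}
#else
/* No-op stub so callers still compile when the feature is disabled. */
static inline void example_qos_done(struct example_queue *q)
{
	(void)q;
}
#endif

/* Caller that compiles in both configurations. */
static int example_complete(struct example_queue *q)
{
	example_qos_done(q);
	return q->done_count;
}
```

Either way the caller sees a declaration, so the compiler never has to
guess at an implicit one.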
  

[josef-btrfs:current-work 3/3] block/blk-wbt.c:1005:33: error: 'struct blkcg' has no member named 'css'

2018-02-07 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git 
current-work
head:   71fe7e0ab249e42c17f387951aa09de7cb362d35
commit: 71fe7e0ab249e42c17f387951aa09de7cb362d35 [3/3] current-work
config: i386-randconfig-x010-201805 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
git checkout 71fe7e0ab249e42c17f387951aa09de7cb362d35
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from include/linux/blkdev.h:21:0,
from include/linux/backing-dev.h:15,
from block/blk-wbt.c:24:
   include/linux/bio.h:521:55: warning: 'struct blkcg_gq' declared inside 
parameter list will not be visible outside of this definition or declaration
static int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg) { 
return 0; }
  ^~~~
   block/blk-wbt.c: In function 'blkcg_qos_throttle':
>> block/blk-wbt.c:1005:33: error: 'struct blkcg' has no member named 'css'
 bio_associate_blkcg(bio, >css);
^~
   block/blk-wbt.c:1010:10: error: implicit declaration of function 
'blkg_lookup_create'; did you mean 'blk_lookup_devt'? 
[-Werror=implicit-function-declaration]
  blkg = blkg_lookup_create(blkcg, q);
 ^~
 blk_lookup_devt
   block/blk-wbt.c:1010:8: warning: assignment makes pointer from integer 
without a cast [-Wint-conversion]
  blkg = blkg_lookup_create(blkcg, q);
   ^
>> block/blk-wbt.c:1019:26: error: passing argument 2 of 'bio_associate_blkg' 
>> from incompatible pointer type [-Werror=incompatible-pointer-types]
 bio_associate_blkg(bio, blkg);
 ^~~~
   In file included from include/linux/blkdev.h:21:0,
from include/linux/backing-dev.h:15,
from block/blk-wbt.c:24:
   include/linux/bio.h:521:12: note: expected 'struct blkcg_gq *' but argument 
is of type 'struct blkcg_gq *'
static int bio_associate_blkg(struct bio *bio, struct blkcg_gq *blkg) { 
return 0; }
   ^~
>> block/blk-wbt.c:1030:26: error: 'struct bio' has no member named 
>> 'bi_issue_stat'
  blk_stat_set_issue(>bi_issue_stat, bio_sectors(bio));
 ^~
   block/blk-wbt.c: In function 'blkcg_qos_done_bio':
>> block/blk-wbt.c:1105:14: error: 'struct bio' has no member named 'bi_blkg'; 
>> did you mean 'bi_flags'?
 blkg = bio->bi_blkg;
 ^~~
 bi_flags
   block/blk-wbt.c:1112:26: error: 'struct bio' has no member named 
'bi_issue_stat'
 qos_record_time(qg, >bi_issue_stat, now);
 ^~
   block/blk-wbt.c: In function 'qos_set_min_lat_nsec':
   block/blk-wbt.c:1167:13: error: 'struct blkcg_gq' has no member named 
'parent'
 while (blkg->parent) {
^~
   block/blk-wbt.c:1168:44: error: 'struct blkcg_gq' has no member named 
'parent'
  struct qos_grp *this_qg = blkg_to_qg(blkg->parent);
   ^~
   block/blk-wbt.c:1170:14: error: 'struct blkcg_gq' has no member named 
'parent'
  blkg = blkg->parent;
 ^~
   block/blk-wbt.c: In function 'qos_set_limit':
   block/blk-wbt.c:1182:24: error: implicit declaration of function 
'css_to_blkcg'; did you mean 'qg_to_blkg'? 
[-Werror=implicit-function-declaration]
 struct blkcg *blkcg = css_to_blkcg(of_css(of));
   ^~~~
   qg_to_blkg
   block/blk-wbt.c:1182:24: warning: initialization makes pointer from integer 
without a cast [-Wint-conversion]
   block/blk-wbt.c:1186:23: error: storage size of 'ctx' isn't known
 struct blkg_conf_ctx ctx;
  ^~~
   block/blk-wbt.c:1193:8: error: implicit declaration of function 
'blkg_conf_prep'; did you mean 'blkg_to_pd'? 
[-Werror=implicit-function-declaration]
 ret = blkg_conf_prep(blkcg, _policy_qos, buf, );
   ^~
   blkg_to_pd
   block/blk-wbt.c:1217:2: error: implicit declaration of function 
'blkg_for_each_descendant_pre'; did you mean 'css_for_each_descendant_pre'? 
[-Werror=implicit-function-declaration]
 blkg_for_each_descendant_pre(blkg, pos_css, ctx.blkg)
 ^~~~
 css_for_each_descendant_pre
>> block/blk-wbt.c:1218:3: error: expected ';' before 'qos_set_min_lat_nsec'
  qos_set_min_lat_nsec(blkg, 1);
  ^~~~
>> block/blk-wbt.c:1221:2: error: implicit declaration of function 
>> 'blkg_conf_finish'; did you mean 'blkcg_qos_init'? 
>> [-Werror=implicit-function-declaration]
 blkg_conf_finish();
 ^~~~
 blkcg_qos_init
   block/blk-wbt.c:1186:23: warning: unused variable 'ctx' [-Wunused-variable]
 struct blkg_conf_ctx ctx;
  ^~~
   block/blk-wbt.c: In function 
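
The warning "'struct blkcg_gq' declared inside parameter list" and the
puzzling note "expected 'struct blkcg_gq *' but argument is of type
'struct blkcg_gq *'" usually share one cause: the struct tag is first
seen inside a prototype, so it names a type scoped to that prototype
only. A file-scope forward declaration before the prototype is the usual
fix. The sketch below uses hypothetical names and a plain int pointer in
place of struct bio; it is not the actual bio.h change.

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration at file scope: later uses all name this one type. */
struct example_blkg;

/* Stub in the style of the bio.h one-liner; names are hypothetical. */
static inline int example_associate(int *bio, struct example_blkg *blkg)
{
	(void)bio;
	(void)blkg;
	return 0;
}

/* A caller holding a pointer to the file-scope type now type-checks. */
static int example_caller(int *bio, struct example_blkg *blkg)
{
	return example_associate(bio, blkg);
}
```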

Crash when unraring large archives on btrfs-filesystem

2018-02-07 Thread Stefan Malte Schumacher
Hello,


I have encountered what I think is a problem with btrfs, which causes
my file server to become unresponsive. But let's start with the basic
information:

uname -a = Linux mars 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2
(2018-01-04) x86_64 GNU/Linux

btrfs --version = btrfs-progs v4.7.3


Label: none uuid: 1609e4e1-4037-4d31-bf12-f84a691db5d8

Total devices 5 FS bytes used 7.15TiB

devid 1 size 3.64TiB used 2.90TiB path /dev/sda

devid 2 size 3.64TiB used 2.90TiB path /dev/sdb

devid 3 size 3.64TiB used 2.90TiB path /dev/sdc

devid 4 size 3.64TiB used 2.90TiB path /dev/sdd

devid 5 size 3.64TiB used 2.90TiB path /dev/sde


Data, RAID1: total=7.25TiB, used=7.14TiB

System, RAID1: total=40.00MiB, used=1.02MiB

Metadata, RAID1: total=9.00GiB, used=7.75GiB

GlobalReserve, single: total=512.00MiB, used=0.00B


The following entry in kern.log seems to be the point where it all
started and which causes me to believe that the problem is related to
btrfs. At that time the server was unraring a large archive stored on
the btrfs filesystem.


Feb 5 21:22:42 mars kernel: [249979.829318] BTRFS info (device sda):
The free space cache file (4701944807424) is invalid. skip it

Feb 5 21:22:42 mars kernel: [249979.829318]

Feb 5 21:25:12 mars kernel: [250090.149452] unrar: page allocation
stalls for 12104ms, order:0, mode:0x24200ca(GFP_HIGHUSER_MOVABLE)

Feb 5 21:25:12 mars kernel: [250116.605420] [] ?
alloc_pages_vma+0xae/0x260

Feb 5 21:25:12 mars kernel: [250116.605422] [] ?
__read_swap_cache_async+0x118/0x1c0

Feb 5 21:25:12 mars kernel: [250116.605423] [] ?
read_swap_cache_async+0x24/0x60

Feb 5 21:25:12 mars kernel: [250116.605425] [] ?
swapin_readahead+0x1a9/0x210

Feb 5 21:25:12 mars kernel: [250116.605427] [] ?
radix_tree_lookup_slot+0x1e/0x50

Feb 5 21:25:12 mars kernel: [250116.605429] [] ?
find_get_entry+0x1b/0x100

Feb 5 21:25:12 mars kernel: [250116.605431] [] ?
pagecache_get_page+0x30/0x2b0

Feb 5 21:25:12 mars kernel: [250116.605434] [] ?
do_swap_page+0x2a3/0x750

Feb 5 21:25:12 mars kernel: [250116.605436] [] ?
handle_mm_fault+0x892/0x12d0

Feb 5 21:25:12 mars kernel: [250116.605438] [] ?
__do_page_fault+0x25c/0x500

Feb 5 21:25:12 mars kernel: [250116.605440] [] ?
page_fault+0x28/0x30

Feb 5 21:25:12 mars kernel: [250116.605442] [] ?
__get_user_8+0x1b/0x25

Feb 5 21:25:12 mars kernel: [250116.605445] [] ?
exit_robust_list+0x30/0x110

Feb 5 21:25:12 mars kernel: [250116.605447] [] ?
mm_release+0xf8/0x130

Feb 5 21:25:12 mars kernel: [250116.605449] [] ?
do_exit+0x150/0xae0

Feb 5 21:25:12 mars kernel: [250116.605450] [] ?
do_group_exit+0x3a/0xa0

Feb 5 21:25:12 mars kernel: [250116.605452] [] ?
get_signal+0x297/0x640

Feb 5 21:25:12 mars kernel: [250116.605454] [] ?
do_signal+0x36/0x6a0

Feb 5 21:25:12 mars kernel: [250116.605457] [] ?
exit_to_usermode_loop+0x71/0xb0

Feb 5 21:25:12 mars kernel: [250116.605459] [] ?
syscall_return_slowpath+0x54/0x60

Feb 5 21:25:12 mars kernel: [250116.605461] [] ?
system_call_fast_compare_end+0xb5/0xb7

Feb 5 21:25:12 mars kernel: [250116.605462] Mem-Info:

Feb 5 21:25:12 mars kernel: [250116.605466] active_anon:44
inactive_anon:69 isolated_anon:0

Feb 5 21:25:12 mars kernel: [250116.605466] active_file:3557188
inactive_file:407932 isolated_file:1024

Feb 5 21:25:12 mars kernel: [250116.605466] unevictable:0 dirty:409214
writeback:62 unstable:0

Feb 5 21:25:12 mars kernel: [250116.605466] slab_reclaimable:37022
slab_unreclaimable:10475

Feb 5 21:25:12 mars kernel: [250116.605466] mapped:2329 shmem:21
pagetables:3522 bounce:0

Feb 5 21:25:12 mars kernel: [250116.605466] free:34036 free_pcp:291 free_cma:0

Feb 5 21:25:12 mars kernel: [250116.605471] Node 0 active_anon:176kB
inactive_anon:276kB active_file:14228752kB inactive_file:1631728kB
unevictable:0kB isolated(anon):0kB isolated(file):4096kB mapped:9316kB
dirty:1636856kB writeback:248kB shmem:84kB shmem_thp: 0kB
shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB
pages_scanned:13631918 all_unreclaimable? no
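
As a cross-check of the two Mem-Info blocks above: the first reports
counts in pages, the "Node 0" line reports kB. Assuming 4 KiB pages
(typical for x86_64, but not stated in the log), the numbers agree. The
helper name below is made up for this illustration.

```c
#include <assert.h>

/* Assumed page size in KiB; 4 is typical for x86_64 but not stated in the log. */
#define ASSUMED_PAGE_KIB 4UL

/* Convert a page count from the Mem-Info summary to kB. */
static unsigned long pages_to_kb(unsigned long pages)
{
	return pages * ASSUMED_PAGE_KIB;
}
```

For example, active_file:3557188 pages corresponds to the 14228752kB
shown for Node 0 (about 13.6 GiB of page cache), and dirty:409214 pages
to 1636856kB (about 1.6 GiB waiting for writeback).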


Searching for "btrfs" in kern.log shows a lot of entries in kern.log
and kern.log.1 but none before that point in time. I think that
there is a relation between upgrading to kernel 4.9.0-5 and the start
of these problems. What follows is the output of "zless kern.log |
grep btrfs".

Feb  5 21:25:21 mars kernel: [250128.490899] Workqueue: writeback
wb_workfn (flush-btrfs-1)

Feb  5 21:25:21 mars kernel: [250128.490940]  [] ?
io_ctl_prepare_pages+0x4c/0x180 [btrfs]

Feb  5 21:25:21 mars kernel: [250128.490953]  [] ?
__load_free_space_cache+0x1eb/0x6d0 [btrfs]

Feb  5 21:25:21 mars kernel: [250128.490966]  [] ?
load_free_space_cache+0xe9/0x190 [btrfs]

Feb  5 21:25:21 mars kernel: [250128.490975]  [] ?
cache_block_group+0x1c2/0x3c0 [btrfs]

Feb  5 21:25:21 mars kernel: [250128.490989]  [] ?
find_free_extent+0x66d/0x10d0 [btrfs]

Feb  5 21:25:21 mars kernel: [250128.490999]  [] ?
btrfs_reserve_extent+0xa1/0x210 [btrfs]

Feb  5 21:25:21 mars kernel: [250128.491011]  [] ?

Re: [PATCH v4 2/3] btrfs-progs: introduce TEST_TOP for resources except binaries

2018-02-07 Thread David Sterba
On Tue, Feb 06, 2018 at 01:37:24PM +0800, Gu Jinxiang wrote:
> Use TEST_TOP for tests/common, Documentation, images, and internal
> binaries.

Well, the point of TEST_TOP was also to remove the /tests/ subdirectory
from the paths if it's inside git and to set it to the top directory
where the exported testsuite resides. I'm not sure if we should continue
this back-and-forth. The project idea was stated tersely, so the
implementation was left "as an exercise". The v4 is close to what I'd
like to merge, so let's give it a v5, and if there are only small
things left to fix I'll update the patches at commit time.


[PATCH 03/14] btrfs: Remove fs_info argument from btrfs_trans_release_metadata

2018-02-07 Thread Nikolay Borisov
All current callers of this function just get a reference to the
trans->fs_info member and pass it as the second argument. Collapse this
into the function itself. No functional changes

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/transaction.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 5ca4302c136c..3c3ed6e3d484 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -818,9 +818,11 @@ int btrfs_should_end_transaction(struct btrfs_trans_handle *trans)
return should_end_transaction(trans);
 }
 
-static void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
- struct btrfs_fs_info *fs_info)
+static void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans)
+
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
+
if (!trans->block_rsv) {
ASSERT(!trans->bytes_reserved);
return;
@@ -854,7 +856,7 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
return 0;
}
 
-   btrfs_trans_release_metadata(trans, info);
+   btrfs_trans_release_metadata(trans);
trans->block_rsv = NULL;
 
if (!list_empty(&trans->new_bgs))
@@ -875,7 +877,7 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
must_run_delayed_refs = 2;
}
 
-   btrfs_trans_release_metadata(trans, info);
+   btrfs_trans_release_metadata(trans);
trans->block_rsv = NULL;
 
if (!list_empty(&trans->new_bgs))
@@ -1968,7 +1970,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
return ret;
}
 
-   btrfs_trans_release_metadata(trans, fs_info);
+   btrfs_trans_release_metadata(trans);
trans->block_rsv = NULL;
 
cur_trans = trans->transaction;
@@ -2322,7 +2324,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 scrub_continue:
btrfs_scrub_continue(fs_info);
 cleanup_transaction:
-   btrfs_trans_release_metadata(trans, fs_info);
+   btrfs_trans_release_metadata(trans);
btrfs_trans_release_chunk_metadata(trans);
trans->block_rsv = NULL;
btrfs_warn(fs_info, "Skipping commit of aborted transaction.");
-- 
2.7.4
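
The refactoring pattern used throughout this series can be shown in
miniature. The structures below are hypothetical, heavily reduced
stand-ins for btrfs_trans_handle and btrfs_fs_info, and the bodies are
invented for illustration; only the shape of the change is taken from
the patch.

```c
#include <assert.h>

/* Hypothetical miniatures of the kernel structures; fields are illustrative. */
struct mini_fs_info {
	long reserved;
};

struct mini_trans {
	struct mini_fs_info *fs_info;	/* the handle already carries this */
	long bytes_reserved;
};

/* Before: callers passed trans->fs_info explicitly as a second argument. */
static void release_old(struct mini_trans *trans, struct mini_fs_info *fs_info)
{
	fs_info->reserved -= trans->bytes_reserved;
	trans->bytes_reserved = 0;
}

/* After: the function derives fs_info from the handle itself. */
static void release_new(struct mini_trans *trans)
{
	struct mini_fs_info *fs_info = trans->fs_info;

	fs_info->reserved -= trans->bytes_reserved;
	trans->bytes_reserved = 0;
}
```

Both variants behave identically as long as every caller passed
trans->fs_info, which is what the commit message asserts for the real
function.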



[PATCH 09/14] btrfs: Don't pass fs_info to commit_cowonly_roots

2018-02-07 Thread Nikolay Borisov
We already pass a transaction handle which references the fs_info, so
we can grab it from there. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/transaction.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index d58c4cf461f3..354143e6d440 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1160,9 +1160,9 @@ static int update_cowonly_root(struct btrfs_trans_handle *trans,
  * failures will cause the file system to go offline. We still need
  * to clean up the delayed refs.
  */
-static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans,
-struct btrfs_fs_info *fs_info)
+static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans)
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct list_head *dirty_bgs = &trans->transaction->dirty_bgs;
struct list_head *io_bgs = &trans->transaction->io_bgs;
struct list_head *next;
@@ -1402,7 +1402,7 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
 * like chunk and root tree, as they won't affect qgroup.
 * And we don't write super to avoid half committed status.
 */
-   ret = commit_cowonly_roots(trans, fs_info);
+   ret = commit_cowonly_roots(trans);
if (ret)
goto out;
switch_commit_roots(trans->transaction, fs_info);
@@ -2202,7 +2202,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
goto scrub_continue;
}
 
-   ret = commit_cowonly_roots(trans, fs_info);
+   ret = commit_cowonly_roots(trans);
if (ret) {
mutex_unlock(&fs_info->tree_log_mutex);
mutex_unlock(&fs_info->reloc_mutex);
-- 
2.7.4



[PATCH 12/14] btrfs: Remove fs_info argument from create_pending_snapshots/create_pending_snapshot

2018-02-07 Thread Nikolay Borisov
We already pass the trans handle which has a reference to fs_info to
create_pending_snapshot so we can refer to it directly. Doing this
obviates the need to pass the fs_info to create_pending_snapshots as
well. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/transaction.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 81143ac1d88d..abee26b269a1 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1435,9 +1435,10 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
  * the creation of the pending snapshots, just return 0.
  */
 static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
-  struct btrfs_fs_info *fs_info,
   struct btrfs_pending_snapshot *pending)
 {
+
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_key key;
struct btrfs_root_item *new_root_item;
struct btrfs_root *tree_root = fs_info->tree_root;
@@ -1704,8 +1705,7 @@ static noinline int create_pending_snapshot(struct btrfs_trans_handle *trans,
 /*
  * create all the snapshots we've scheduled for creation
  */
-static noinline int create_pending_snapshots(struct btrfs_trans_handle *trans,
-struct btrfs_fs_info *fs_info)
+static noinline int create_pending_snapshots(struct btrfs_trans_handle *trans)
 {
struct btrfs_pending_snapshot *pending, *next;
struct list_head *head = &trans->transaction->pending_snapshots;
@@ -1713,7 +1713,7 @@ static noinline int create_pending_snapshots(struct btrfs_trans_handle *trans,
 
list_for_each_entry_safe(pending, next, head, list) {
list_del(&pending->list);
-   ret = create_pending_snapshot(trans, fs_info, pending);
+   ret = create_pending_snapshot(trans, pending);
if (ret)
break;
}
@@ -2110,7 +2110,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 * deal with them in create_pending_snapshot(), which is the
 * core function of the snapshot creation.
 */
-   ret = create_pending_snapshots(trans, fs_info);
+   ret = create_pending_snapshots(trans);
if (ret) {
mutex_unlock(&fs_info->reloc_mutex);
goto scrub_continue;
-- 
2.7.4



[PATCH 06/14] btrfs: Don't pass fs_info to __btrfs_run_delayed_items

2018-02-07 Thread Nikolay Borisov
We already pass the transaction handle, which contains a reference to
the fs_info, so grab it from there. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/delayed-inode.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 1c0bab4080a0..1305872bbff8 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1114,9 +1114,9 @@ __btrfs_commit_inode_delayed_items(struct btrfs_trans_handle *trans,
  * Returns < 0 on error and returns with an aborted transaction with any
  * outstanding delayed items cleaned up.
  */
-static int __btrfs_run_delayed_items(struct btrfs_trans_handle *trans,
-struct btrfs_fs_info *fs_info, int nr)
+static int __btrfs_run_delayed_items(struct btrfs_trans_handle *trans, int nr)
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_delayed_root *delayed_root;
struct btrfs_delayed_node *curr_node, *prev_node;
struct btrfs_path *path;
@@ -1164,13 +1164,13 @@ static int __btrfs_run_delayed_items(struct btrfs_trans_handle *trans,
 int btrfs_run_delayed_items(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info)
 {
-   return __btrfs_run_delayed_items(trans, fs_info, -1);
+   return __btrfs_run_delayed_items(trans, -1);
 }
 
 int btrfs_run_delayed_items_nr(struct btrfs_trans_handle *trans,
   struct btrfs_fs_info *fs_info, int nr)
 {
-   return __btrfs_run_delayed_items(trans, fs_info, nr);
+   return __btrfs_run_delayed_items(trans, nr);
 }
 
 int btrfs_commit_inode_delayed_items(struct btrfs_trans_handle *trans,
-- 
2.7.4



[PATCH 04/14] btrfs: Remove fs_info argument from btrfs_create_pending_block_groups

2018-02-07 Thread Nikolay Borisov
It can be referenced from the passed transaction, so there is no point
in passing it as a function argument. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.h   |  3 +--
 fs/btrfs/extent-tree.c | 10 +-
 fs/btrfs/transaction.c |  6 +++---
 3 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1cc77c4bf3c3..9963b6caadeb 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2712,8 +2712,7 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
 void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info);
 void btrfs_get_block_group_trimming(struct btrfs_block_group_cache *cache);
 void btrfs_put_block_group_trimming(struct btrfs_block_group_cache *cache);
-void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans,
-  struct btrfs_fs_info *fs_info);
+void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans);
 u64 btrfs_data_alloc_profile(struct btrfs_fs_info *fs_info);
 u64 btrfs_metadata_alloc_profile(struct btrfs_fs_info *fs_info);
 u64 btrfs_system_alloc_profile(struct btrfs_fs_info *fs_info);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index b079ebc1f842..99bfc628ab89 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3086,7 +3086,7 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
 
if (run_all) {
if (!list_empty(&trans->new_bgs))
-   btrfs_create_pending_block_groups(trans, fs_info);
+   btrfs_create_pending_block_groups(trans);
 
spin_lock(&delayed_refs->lock);
node = rb_first(&delayed_refs->href_root);
@@ -3686,7 +3686,7 @@ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
 * make sure all the block groups on our dirty list actually
 * exist
 */
-   btrfs_create_pending_block_groups(trans, fs_info);
+   btrfs_create_pending_block_groups(trans);
 
if (!path) {
path = btrfs_alloc_path();
@@ -4706,7 +4706,7 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
 */
if (trans->can_flush_pending_bgs &&
trans->chunk_bytes_reserved >= (u64)SZ_2M) {
-   btrfs_create_pending_block_groups(trans, fs_info);
+   btrfs_create_pending_block_groups(trans);
btrfs_trans_release_chunk_metadata(trans);
}
return ret;
@@ -10130,9 +10130,9 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info)
return ret;
 }
 
-void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans,
-  struct btrfs_fs_info *fs_info)
+void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_block_group_cache *block_group, *tmp;
struct btrfs_root *extent_root = fs_info->extent_root;
struct btrfs_block_group_item item;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 3c3ed6e3d484..82b7e5855119 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -860,7 +860,7 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
trans->block_rsv = NULL;
 
if (!list_empty(&trans->new_bgs))
-   btrfs_create_pending_block_groups(trans, info);
+   btrfs_create_pending_block_groups(trans);
 
trans->delayed_ref_updates = 0;
if (!trans->sync) {
@@ -881,7 +881,7 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
trans->block_rsv = NULL;
 
if (!list_empty(&trans->new_bgs))
-   btrfs_create_pending_block_groups(trans, info);
+   btrfs_create_pending_block_groups(trans);
 
btrfs_trans_release_chunk_metadata(trans);
 
@@ -1983,7 +1983,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
smp_wmb();
 
if (!list_empty(&trans->new_bgs))
-   btrfs_create_pending_block_groups(trans, fs_info);
+   btrfs_create_pending_block_groups(trans);
 
ret = btrfs_run_delayed_refs(trans, fs_info, 0);
if (ret) {
-- 
2.7.4



[PATCH 10/14] btrfs: Remove root argument of cleanup_transaction

2018-02-07 Thread Nikolay Borisov
The passed root is used for only two things:
1. getting a reference to the fs_info, and
2. calling trace_btrfs_transaction_commit.

We can achieve 1) by simply referring to the fs_info from the passed trans
object. As far as 2) is concerned, cleanup_transaction is called from
only one place and the 'root' argument passed is the one from the trans
handle. No functional changes.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/transaction.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 354143e6d440..b8fd9fe8a9c1 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1866,10 +1866,9 @@ int btrfs_commit_transaction_async(struct btrfs_trans_handle *trans,
 }
 
 
-static void cleanup_transaction(struct btrfs_trans_handle *trans,
-   struct btrfs_root *root, int err)
+static void cleanup_transaction(struct btrfs_trans_handle *trans, int err)
 {
-   struct btrfs_fs_info *fs_info = root->fs_info;
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_transaction *cur_trans = trans->transaction;
DEFINE_WAIT(wait);
 
@@ -1909,7 +1908,7 @@ static void cleanup_transaction(struct btrfs_trans_handle *trans,
btrfs_put_transaction(cur_trans);
btrfs_put_transaction(cur_trans);
 
-   trace_btrfs_transaction_commit(root);
+   trace_btrfs_transaction_commit(trans->root);
 
if (current->journal_info == trans)
current->journal_info = NULL;
@@ -2330,7 +2329,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
btrfs_warn(fs_info, "Skipping commit of aborted transaction.");
if (current->journal_info == trans)
current->journal_info = NULL;
-   cleanup_transaction(trans, trans->root, ret);
+   cleanup_transaction(trans, ret);
 
return ret;
 }
-- 
2.7.4



[PATCH 08/14] btrfs: Don't pass fs_info to commit_fs_roots

2018-02-07 Thread Nikolay Borisov
We already pass the transaction handle which has a reference to the
fs_info. No functional changes

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/transaction.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index f24f05fb508e..d58c4cf461f3 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1256,9 +1256,9 @@ void btrfs_add_dead_root(struct btrfs_root *root)
 /*
  * update all the cowonly tree roots on disk
  */
-static noinline int commit_fs_roots(struct btrfs_trans_handle *trans,
-   struct btrfs_fs_info *fs_info)
+static noinline int commit_fs_roots(struct btrfs_trans_handle *trans)
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_root *gang[8];
int i;
int ret;
@@ -1376,7 +1376,7 @@ static int qgroup_account_snapshot(struct btrfs_trans_handle *trans,
 */
mutex_lock(&fs_info->tree_log_mutex);
 
-   ret = commit_fs_roots(trans, fs_info);
+   ret = commit_fs_roots(trans);
if (ret)
goto out;
ret = btrfs_qgroup_account_extents(trans, fs_info);
@@ -2162,7 +2162,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
 */
mutex_lock(&fs_info->tree_log_mutex);
 
-   ret = commit_fs_roots(trans, fs_info);
+   ret = commit_fs_roots(trans);
if (ret) {
mutex_unlock(&fs_info->tree_log_mutex);
mutex_unlock(&fs_info->reloc_mutex);
-- 
2.7.4



[PATCH 01/14] btrfs: Make btrfs_trans_release_metadata private to transaction.c

2018-02-07 Thread Nikolay Borisov
This function is only ever used in __btrfs_end_transaction and
btrfs_commit_transaction so there is no need to export it via header.
Let's move it closer to where it's used, make it static and remove it
from the header. No functional changes

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.h   |  2 --
 fs/btrfs/extent-tree.c | 18 --
 fs/btrfs/transaction.c | 19 +++
 3 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index aee4365e82ba..1cc77c4bf3c3 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2748,8 +2748,6 @@ void btrfs_delalloc_release_space(struct inode *inode,
struct extent_changeset *reserved, u64 start, u64 len);
 void btrfs_free_reserved_data_space_noquota(struct inode *inode, u64 start,
u64 len);
-void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
- struct btrfs_fs_info *fs_info);
 void btrfs_trans_release_chunk_metadata(struct btrfs_trans_handle *trans);
 int btrfs_orphan_reserve_metadata(struct btrfs_trans_handle *trans,
  struct btrfs_inode *inode);
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index cc08e6af3542..b079ebc1f842 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5893,24 +5893,6 @@ static void release_global_block_rsv(struct 
btrfs_fs_info *fs_info)
WARN_ON(fs_info->delayed_block_rsv.reserved > 0);
 }
 
-void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
- struct btrfs_fs_info *fs_info)
-{
-   if (!trans->block_rsv) {
-   ASSERT(!trans->bytes_reserved);
-   return;
-   }
-
-   if (!trans->bytes_reserved)
-   return;
-
-   ASSERT(trans->block_rsv == &fs_info->trans_block_rsv);
-   trace_btrfs_space_reservation(fs_info, "transaction",
- trans->transid, trans->bytes_reserved, 0);
-   btrfs_block_rsv_release(fs_info, trans->block_rsv,
-   trans->bytes_reserved);
-   trans->bytes_reserved = 0;
-}
 
 /*
  * To be called after all the new block groups attached to the transaction
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 2141587195d4..beca25635787 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -818,6 +818,25 @@ int btrfs_should_end_transaction(struct btrfs_trans_handle 
*trans)
return should_end_transaction(trans);
 }
 
+static void btrfs_trans_release_metadata(struct btrfs_trans_handle *trans,
+ struct btrfs_fs_info *fs_info)
+{
+   if (!trans->block_rsv) {
+   ASSERT(!trans->bytes_reserved);
+   return;
+   }
+
+   if (!trans->bytes_reserved)
+   return;
+
+   ASSERT(trans->block_rsv == &fs_info->trans_block_rsv);
+   trace_btrfs_space_reservation(fs_info, "transaction",
+ trans->transid, trans->bytes_reserved, 0);
+   btrfs_block_rsv_release(fs_info, trans->block_rsv,
+   trans->bytes_reserved);
+   trans->bytes_reserved = 0;
+}
+
 static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
   int throttle)
 {
-- 
2.7.4



[PATCH 07/14] btrfs: Don't pass fs_info to btrfs_run_delayed_items/_nr

2018-02-07 Thread Nikolay Borisov
We already pass the transaction which has a reference to the fs_info,
so use that. No functional changes

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/delayed-inode.c |  6 ++
 fs/btrfs/delayed-inode.h |  6 ++
 fs/btrfs/extent-tree.c   |  2 +-
 fs/btrfs/transaction.c   |  8 
 fs/btrfs/tree-log.c  | 12 
 5 files changed, 13 insertions(+), 21 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 1305872bbff8..86cc0f5b0435 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1161,14 +1161,12 @@ static int __btrfs_run_delayed_items(struct 
btrfs_trans_handle *trans, int nr)
return ret;
 }
 
-int btrfs_run_delayed_items(struct btrfs_trans_handle *trans,
-   struct btrfs_fs_info *fs_info)
+int btrfs_run_delayed_items(struct btrfs_trans_handle *trans)
 {
return __btrfs_run_delayed_items(trans, -1);
 }
 
-int btrfs_run_delayed_items_nr(struct btrfs_trans_handle *trans,
-  struct btrfs_fs_info *fs_info, int nr)
+int btrfs_run_delayed_items_nr(struct btrfs_trans_handle *trans, int nr)
 {
return __btrfs_run_delayed_items(trans, nr);
 }
diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h
index c4189d495934..ae893d85224f 100644
--- a/fs/btrfs/delayed-inode.h
+++ b/fs/btrfs/delayed-inode.h
@@ -111,10 +111,8 @@ int btrfs_delete_delayed_dir_index(struct 
btrfs_trans_handle *trans,
 
 int btrfs_inode_delayed_dir_index_count(struct btrfs_inode *inode);
 
-int btrfs_run_delayed_items(struct btrfs_trans_handle *trans,
-   struct btrfs_fs_info *fs_info);
-int btrfs_run_delayed_items_nr(struct btrfs_trans_handle *trans,
-  struct btrfs_fs_info *fs_info, int nr);
+int btrfs_run_delayed_items(struct btrfs_trans_handle *trans);
+int btrfs_run_delayed_items_nr(struct btrfs_trans_handle *trans, int nr);
 
 void btrfs_balance_delayed_items(struct btrfs_fs_info *fs_info);
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 47c27fc403b9..52cb4eb12318 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4994,7 +4994,7 @@ static void flush_space(struct btrfs_fs_info *fs_info,
ret = PTR_ERR(trans);
break;
}
-   ret = btrfs_run_delayed_items_nr(trans, fs_info, nr);
+   ret = btrfs_run_delayed_items_nr(trans, nr);
btrfs_end_transaction(trans);
break;
case FLUSH_DELALLOC:
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index aef311531ab2..f24f05fb508e 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1529,7 +1529,7 @@ static noinline int create_pending_snapshot(struct 
btrfs_trans_handle *trans,
 * otherwise we corrupt the FS during
 * snapshot
 */
-   ret = btrfs_run_delayed_items(trans, fs_info);
+   ret = btrfs_run_delayed_items(trans);
if (ret) {  /* Transaction aborted */
btrfs_abort_transaction(trans, ret);
goto fail;
@@ -2066,7 +2066,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle 
*trans)
if (ret)
goto cleanup_transaction;
 
-   ret = btrfs_run_delayed_items(trans, fs_info);
+   ret = btrfs_run_delayed_items(trans);
if (ret)
goto cleanup_transaction;
 
@@ -2074,7 +2074,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle 
*trans)
   extwriter_counter_read(cur_trans) == 0);
 
/* some pending stuffs might be added after the previous flush. */
-   ret = btrfs_run_delayed_items(trans, fs_info);
+   ret = btrfs_run_delayed_items(trans);
if (ret)
goto cleanup_transaction;
 
@@ -2127,7 +2127,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle 
*trans)
 * because all the tree which are snapshoted will be forced to COW
 * the nodes and leaves.
 */
-   ret = btrfs_run_delayed_items(trans, fs_info);
+   ret = btrfs_run_delayed_items(trans);
if (ret) {
mutex_unlock(&fs_info->reloc_mutex);
goto scrub_continue;
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index df8e76d01dbe..2fbe49a04933 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -852,7 +852,6 @@ static noinline int drop_one_dir_item(struct 
btrfs_trans_handle *trans,
  struct btrfs_inode *dir,
  struct btrfs_dir_item *di)
 {
-   struct btrfs_fs_info *fs_info = root->fs_info;
struct inode *inode;
char *name;
int name_len;
@@ -886,7 +885,7 @@ static noinline int drop_one_dir_item(struct 
btrfs_trans_handle *trans,
if (ret)
goto out;
else
-   ret = btrfs_run_delayed_items(trans, fs_info);
+   ret = 

[PATCH 05/14] btrfs: Don't pass fs_info arg to btrfs_start_dirty_block_groups

2018-02-07 Thread Nikolay Borisov
It can be referenced from the passed transaction so no point in passing
it as a function argument. No functional changes

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.h   | 3 +--
 fs/btrfs/extent-tree.c | 4 ++--
 fs/btrfs/transaction.c | 2 +-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 9963b6caadeb..f929685b80e2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2690,8 +2690,7 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
 u64 bytenr, u64 num_bytes, u64 parent,
 u64 root_objectid, u64 owner, u64 offset);
 
-int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
-  struct btrfs_fs_info *fs_info);
+int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans);
 int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans,
   struct btrfs_fs_info *fs_info);
 int btrfs_setup_space_cache(struct btrfs_trans_handle *trans,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 99bfc628ab89..47c27fc403b9 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3660,9 +3660,9 @@ int btrfs_setup_space_cache(struct btrfs_trans_handle 
*trans,
  * the commit latency by getting rid of the easy block groups while
  * we're still allowing others to join the commit.
  */
-int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
-  struct btrfs_fs_info *fs_info)
+int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans)
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_block_group_cache *cache;
struct btrfs_transaction *cur_trans = trans->transaction;
int ret = 0;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 82b7e5855119..aef311531ab2 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2014,7 +2014,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle 
*trans)
mutex_unlock(&fs_info->ro_block_group_mutex);
 
if (run_it)
-   ret = btrfs_start_dirty_block_groups(trans, fs_info);
+   ret = btrfs_start_dirty_block_groups(trans);
}
if (ret) {
btrfs_end_transaction(trans);
-- 
2.7.4



[PATCH 11/14] btrfs: Remove fs_info argument from switch_commit_roots

2018-02-07 Thread Nikolay Borisov
We already have the fs_info from the passed transaction so use it
directly. No functional changes

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/transaction.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index b8fd9fe8a9c1..81143ac1d88d 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -126,9 +126,9 @@ static void clear_btree_io_tree(struct extent_io_tree *tree)
spin_unlock(&tree->lock);
 }
 
-static noinline void switch_commit_roots(struct btrfs_transaction *trans,
-struct btrfs_fs_info *fs_info)
+static noinline void switch_commit_roots(struct btrfs_transaction *trans)
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct btrfs_root *root, *tmp;
 
down_write(&fs_info->commit_root_sem);
@@ -1405,7 +1405,7 @@ static int qgroup_account_snapshot(struct 
btrfs_trans_handle *trans,
ret = commit_cowonly_roots(trans);
if (ret)
goto out;
-   switch_commit_roots(trans->transaction, fs_info);
+   switch_commit_roots(trans->transaction);
ret = btrfs_write_and_wait_transaction(trans, fs_info);
if (ret)
btrfs_handle_fs_error(fs_info, ret,
@@ -2233,7 +2233,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle 
*trans)
list_add_tail(&fs_info->chunk_root->dirty_list,
  &cur_trans->switch_commits);
 
-   switch_commit_roots(cur_trans, fs_info);
+   switch_commit_roots(cur_trans);
 
ASSERT(list_empty(&cur_trans->dirty_bgs));
ASSERT(list_empty(&cur_trans->io_bgs));
-- 
2.7.4



[PATCH 14/14] btrfs: Remove fs_info argument of btrfs_write_and_wait_transaction

2018-02-07 Thread Nikolay Borisov
We already pass btrfs_trans_handle which contains a reference to the
fs_info so use that. No functional changes

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/transaction.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index b8dbe4e88631..a57065f022ff 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1091,10 +1091,10 @@ int btrfs_wait_tree_log_extents(struct btrfs_root 
*log_root, int mark)
  *
  * @trans: transaction whose dirty pages we'd like to write
  */
-static int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans,
-   struct btrfs_fs_info *fs_info)
+static int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans)
 {
struct extent_io_tree *dirty_pages = &trans->transaction->dirty_pages;
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct blk_plug plug;
int ret, ret2;
 
@@ -1406,7 +1406,7 @@ static int qgroup_account_snapshot(struct 
btrfs_trans_handle *trans,
if (ret)
goto out;
switch_commit_roots(trans->transaction);
-   ret = btrfs_write_and_wait_transaction(trans, fs_info);
+   ret = btrfs_write_and_wait_transaction(trans);
if (ret)
btrfs_handle_fs_error(fs_info, ret,
"Error while writing out transaction for qgroup");
@@ -2260,7 +2260,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle 
*trans)
 
wake_up(&fs_info->transaction_wait);
 
-   ret = btrfs_write_and_wait_transaction(trans, fs_info);
+   ret = btrfs_write_and_wait_transaction(trans);
if (ret) {
btrfs_handle_fs_error(fs_info, ret,
  "Error while writing out transaction");
-- 
2.7.4



[PATCH 13/14] btrfs: Remove fs_info argument from btrfs_update_commit_device_bytes_used

2018-02-07 Thread Nikolay Borisov
We already pass the btrfs_transaction which references fs_info so no
need to pass the latter as an argument. Also use the opportunity to
shorten transaction->trans. No functional changes

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/transaction.c |  2 +-
 fs/btrfs/volumes.c | 10 +-
 fs/btrfs/volumes.h |  3 +--
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index abee26b269a1..b8dbe4e88631 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2245,7 +2245,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle 
*trans)
   sizeof(*fs_info->super_copy));
 
btrfs_update_commit_device_size(fs_info);
-   btrfs_update_commit_device_bytes_used(fs_info, cur_trans);
+   btrfs_update_commit_device_bytes_used(cur_trans);
 
clear_bit(BTRFS_FS_LOG1_ERR, &fs_info->flags);
clear_bit(BTRFS_FS_LOG2_ERR, &fs_info->flags);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 71f9abd44f21..c61fef86538d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5229,7 +5229,7 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 
logical, u64 len)
/*
 * There could be two corrupted data stripes, we need
 * to loop retry in order to rebuild the correct data.
-* 
+*
 * Fail a stripe at a time on every retry except the
 * stripe under reconstruction.
 */
@@ -7387,20 +7387,20 @@ void btrfs_update_commit_device_size(struct 
btrfs_fs_info *fs_info)
 }
 
 /* Must be invoked during the transaction commit */
-void btrfs_update_commit_device_bytes_used(struct btrfs_fs_info *fs_info,
-   struct btrfs_transaction *transaction)
+void btrfs_update_commit_device_bytes_used(struct btrfs_transaction *trans)
 {
+   struct btrfs_fs_info *fs_info = trans->fs_info;
struct extent_map *em;
struct map_lookup *map;
struct btrfs_device *dev;
int i;
 
-   if (list_empty(&transaction->pending_chunks))
+   if (list_empty(&trans->pending_chunks))
return;
 
/* In order to kick the device replace finish process */
mutex_lock(&fs_info->chunk_mutex);
-   list_for_each_entry(em, &transaction->pending_chunks, list) {
+   list_for_each_entry(em, &trans->pending_chunks, list) {
map = em->map_lookup;
 
for (i = 0; i < map->num_stripes; i++) {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index ca6640445a88..8692b40036d6 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -569,8 +569,7 @@ static inline enum btrfs_raid_types 
btrfs_bg_flags_to_raid_index(u64 flags)
 }
 
 void btrfs_update_commit_device_size(struct btrfs_fs_info *fs_info);
-void btrfs_update_commit_device_bytes_used(struct btrfs_fs_info *fs_info,
-   struct btrfs_transaction *transaction);
+void btrfs_update_commit_device_bytes_used(struct btrfs_transaction *trans);
 
 struct list_head *btrfs_get_fs_uuids(void);
 void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
-- 
2.7.4



[PATCH 00/14] Misc transaction cleanups

2018-02-07 Thread Nikolay Borisov
Here are a bunch of transaction-related cleanups, none of which introduce
functional changes. The first 2 patches could be more interesting - the first
one moves trans_release_metadata to transaction.c and makes it static and the
second one open codes btrfs_write_and_wait_marked_extents in its sole caller
to make the call chain shorter. The rest of the patches just kill the
extraneous fs_info argument from functions that also take either a
btrfs_trans_handle or btrfs_transaction pointer, which already contains fs_info.

The modified functions are all called from btrfs_commit_transaction. With this
series applied the only functions that still take both fs_info and some type
of transaction reference are:

btrfs_finish_extent_commit
btrfs_qgroup_account_extents
btrfs_run_delayed_refs


The reason I haven't touched them is that David expressed some reservation 
about mass cleaning of functions which are more or less public interface. And 
the above 3 are such functions. David, if you don't object to converting those 3
I will keep them in mind when doing further cleanups in the transaction area. 
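
For readers unfamiliar with the pattern, the fs_info cleanups in this series
can be sketched in plain userspace C as below. This is only an illustration
under simplifying assumptions: the struct and function names (fs_info_stub,
trans_handle_stub, commit_roots_old, commit_roots_new) are hypothetical
stand-ins and not the kernel definitions; the only thing taken from the series
is the idea that the handle already carries a pointer to fs_info.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct btrfs_fs_info; one counter is enough. */
struct fs_info_stub {
	int commits;
};

/* Hypothetical stand-in for struct btrfs_trans_handle. */
struct trans_handle_stub {
	struct fs_info_stub *fs_info;	/* the handle already references fs_info */
};

/* Before: the caller passes fs_info although the handle already has it. */
static int commit_roots_old(struct trans_handle_stub *trans,
			    struct fs_info_stub *fs_info)
{
	(void)trans;
	fs_info->commits++;
	return 0;
}

/* After: fs_info is derived from the handle at the top of the function. */
static int commit_roots_new(struct trans_handle_stub *trans)
{
	struct fs_info_stub *fs_info = trans->fs_info;

	fs_info->commits++;
	return 0;
}
```

Both variants touch the same object, so call sites only lose an argument and
the behaviour stays the same, which is why each patch can claim no functional
changes.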

Nikolay Borisov (14):
  btrfs: Make btrfs_trans_release_metadata private to transaction.c
  btrfs: Open code btrfs_write_and_wait_marked_extents
  btrfs: Remove fs_info argument from btrfs_trans_release_metadata
  btrfs: Remove fs_info argument from btrfs_create_pending_block_groups
  btrfs: Don't pass fs_info arg to btrfs_start_dirty_block_groups
  btrfs: Don't pass fs_info to __btrfs_run_delayed_items
  btrfs: Don't pass fs_info to btrfs_run_delayed_items/_nr
  btrfs: Don't pass fs_info to commit_fs_roots
  btrfs: Don't pass fs_info to commit_cowonly_roots
  btrfs: Remove root argument of cleanup_transaction
  btrfs: Remove fs_info argument from switch_commit_roots
  btrfs: Remove fs_info argument from
create_pending_snapshots/create_pending_snapshot
  btrfs: Remove fs_info argument from
btrfs_update_commit_device_bytes_used
  btrfs: Remove fs_info argument of btrfs_write_and_wait_transaction

 fs/btrfs/ctree.h |   8 +--
 fs/btrfs/delayed-inode.c |  14 +++--
 fs/btrfs/delayed-inode.h |   6 +--
 fs/btrfs/extent-tree.c   |  34 +++-
 fs/btrfs/transaction.c   | 134 ++-
 fs/btrfs/tree-log.c  |  12 ++---
 fs/btrfs/volumes.c   |  10 ++--
 fs/btrfs/volumes.h   |   3 +-
 8 files changed, 101 insertions(+), 120 deletions(-)

-- 
2.7.4



[PATCH 02/14] btrfs: Open code btrfs_write_and_wait_marked_extents

2018-02-07 Thread Nikolay Borisov
btrfs_write_and_wait_transaction is essentially a wrapper of
btrfs_write_and_wait_marked_extents with the addition of calling
clear_btree_io_tree. Having the code split doesn't really bring any
benefit. Open code the latter into the former and add proper
documentation header.
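
The control flow that results from the merge can be sketched in userspace C
as follows. This is a rough model, not the kernel code: struct io_ctx and the
three helpers are hypothetical and merely return preset values. What it tries
to show is that the wait and the cleanup always run, and that the write error
takes precedence over the wait error.

```c
#include <errno.h>

/* Hypothetical context: preset results plus flags recording what ran. */
struct io_ctx {
	int write_result;	/* 0 or negative errno from the write phase */
	int wait_result;	/* 0 or negative errno from the wait phase */
	int waited;
	int cleared;
};

static int write_extents(struct io_ctx *c)
{
	return c->write_result;
}

static int wait_extents(struct io_ctx *c)
{
	c->waited = 1;
	return c->wait_result;
}

static void clear_tree(struct io_ctx *c)
{
	c->cleared = 1;
}

/* Write, always wait, always clean up, then report the first error seen. */
static int write_and_wait(struct io_ctx *c)
{
	int ret, ret2;

	ret = write_extents(c);
	ret2 = wait_extents(c);		/* runs even if the write failed */
	clear_tree(c);			/* runs regardless of either result */

	if (ret)
		return ret;
	return ret2;
}
```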

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/transaction.c | 40 
 1 file changed, 16 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index beca25635787..5ca4302c136c 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1082,41 +1082,33 @@ int btrfs_wait_tree_log_extents(struct btrfs_root 
*log_root, int mark)
return err;
 }
 
-/*
- * when btree blocks are allocated, they have some corresponding bits set for
- * them in one of two extent_io trees.  This is used to make sure all of
- * those extents are on disk for transaction or log commit
+
+/* btrfs_write_and_wait_transaction - When btree blocks are allocated the
+ * corresponding extents are marked dirty. This function ensures such extents
+ * are persisted on disk for transaction or log commit.
+ *
+ * @trans: transaction whose dirty pages we'd like to write
  */
-static int btrfs_write_and_wait_marked_extents(struct btrfs_fs_info *fs_info,
-   struct extent_io_tree *dirty_pages, int mark)
+static int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans,
+   struct btrfs_fs_info *fs_info)
 {
-   int ret;
-   int ret2;
+   struct extent_io_tree *dirty_pages = &trans->transaction->dirty_pages;
struct blk_plug plug;
+   int ret, ret2;
 
blk_start_plug(&plug);
-   ret = btrfs_write_marked_extents(fs_info, dirty_pages, mark);
+   ret = btrfs_write_marked_extents(fs_info, dirty_pages, EXTENT_DIRTY);
blk_finish_plug(&plug);
ret2 = btrfs_wait_extents(fs_info, dirty_pages);
 
+   clear_btree_io_tree(&trans->transaction->dirty_pages);
+
if (ret)
return ret;
-   if (ret2)
+   else if (ret2)
return ret2;
-   return 0;
-}
-
-static int btrfs_write_and_wait_transaction(struct btrfs_trans_handle *trans,
-   struct btrfs_fs_info *fs_info)
-{
-   int ret;
-
-   ret = btrfs_write_and_wait_marked_extents(fs_info,
-  &trans->transaction->dirty_pages,
-  EXTENT_DIRTY);
-   clear_btree_io_tree(&trans->transaction->dirty_pages);
-
-   return ret;
+   else
+   return 0;
 }
 
 /*
-- 
2.7.4



Re: [PATCH] btrfs-progs: ctree: Add extra level check for read_node_slot()

2018-02-07 Thread Qu Wenruo


On 2018-02-07 22:35, David Sterba wrote:
> On Wed, Feb 07, 2018 at 05:18:25PM +0800, Qu Wenruo wrote:
>> Strangely, we have level check in btrfs_print_tree() while we don't have
>> the same check in read_node_slot().
>>
>> That's to say, for the following corruption, btrfs_search_slot() or
>> btrfs_next_leaf() can return invalid leaf:
>>
>> Parent eb:
>>   node XX level 1
>>   ^^^
>>   Child should be leaf (level 0)
>>   ...
>>   key (XXX XXX XXX) block YY
>>
>> Child eb:
>>   leaf YY level 1
>>   ^^^
>>   Something went wrong now
>>
>> And for the corrupted leaf returned, later caller can be screwed up
>> easily.
>>
>> Although the root cause (powerloss, but still something wrong breaking
>> metadata CoW of btrfs) is still unknown, at least enhance btrfs-progs to
>> avoid SEGV.
>>
>> Reported-by: Ralph Gauges 
>> Signed-off-by: Qu Wenruo 
>> ---
>>  ctree.c | 13 -
>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/ctree.c b/ctree.c
>> index 4fc33b14000a..ddb1e9cc6d37 100644
>> --- a/ctree.c
>> +++ b/ctree.c
>> @@ -22,6 +22,7 @@
>>  #include "repair.h"
>>  #include "internal.h"
>>  #include "sizes.h"
>> +#include "messages.h"
>>  
>>  static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
>>*root, struct btrfs_path *path, int level);
>> @@ -640,7 +641,9 @@ static int bin_search(struct extent_buffer *eb, struct 
>> btrfs_key *key,
>>  struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info,
>> struct extent_buffer *parent, int slot)
>>  {
>> +struct extent_buffer *ret;
>>  int level = btrfs_header_level(parent);
>> +
>>  if (slot < 0)
>>  return NULL;
>>  if (slot >= btrfs_header_nritems(parent))
>> @@ -649,8 +652,16 @@ struct extent_buffer *read_node_slot(struct 
>> btrfs_fs_info *fs_info,
>>  if (level == 0)
>>  return NULL;
>>  
>> -return read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
>> +ret = read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
> 
> The result of read_tree_block should be checked before use, by
> extent_buffer_uptodate, null check or IS_ERR at least (depending on the
> context of use).

Right, just forgot that.

Will fix it in next version.

Thanks,
Qu

> 
>> btrfs_node_ptr_generation(parent, slot));
>> +if (btrfs_header_level(ret) != level - 1) {
>> +error("child eb corrupted: parent bytenr=%llu item=%d parent 
>> level=%d child level=%d",
>> +  btrfs_header_bytenr(parent), slot,
>> +  btrfs_header_level(parent), btrfs_header_level(ret));
>> +free_extent_buffer(ret);
>> +return ERR_PTR(-EIO);
>> +}
>> +return ret;
>>  }
>>  
>>  static int balance_level(struct btrfs_trans_handle *trans,
>> -- 
>> 2.16.1
>>
> 





Re: [PATCH] btrfs-progs: ctree: Add extra level check for read_node_slot()

2018-02-07 Thread David Sterba
On Wed, Feb 07, 2018 at 05:18:25PM +0800, Qu Wenruo wrote:
> Strangely, we have level check in btrfs_print_tree() while we don't have
> the same check in read_node_slot().
> 
> That's to say, for the following corruption, btrfs_search_slot() or
> btrfs_next_leaf() can return invalid leaf:
> 
> Parent eb:
>   node XX level 1
>   ^^^
>   Child should be leaf (level 0)
>   ...
>   key (XXX XXX XXX) block YY
> 
> Child eb:
>   leaf YY level 1
>   ^^^
>   Something went wrong now
> 
> And for the corrupted leaf returned, later caller can be screwed up
> easily.
> 
> Although the root cause (powerloss, but still something wrong breaking
> metadata CoW of btrfs) is still unknown, at least enhance btrfs-progs to
> avoid SEGV.
> 
> Reported-by: Ralph Gauges 
> Signed-off-by: Qu Wenruo 
> ---
>  ctree.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/ctree.c b/ctree.c
> index 4fc33b14000a..ddb1e9cc6d37 100644
> --- a/ctree.c
> +++ b/ctree.c
> @@ -22,6 +22,7 @@
>  #include "repair.h"
>  #include "internal.h"
>  #include "sizes.h"
> +#include "messages.h"
>  
>  static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
> *root, struct btrfs_path *path, int level);
> @@ -640,7 +641,9 @@ static int bin_search(struct extent_buffer *eb, struct 
> btrfs_key *key,
>  struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info,
>  struct extent_buffer *parent, int slot)
>  {
> + struct extent_buffer *ret;
>   int level = btrfs_header_level(parent);
> +
>   if (slot < 0)
>   return NULL;
>   if (slot >= btrfs_header_nritems(parent))
> @@ -649,8 +652,16 @@ struct extent_buffer *read_node_slot(struct 
> btrfs_fs_info *fs_info,
>   if (level == 0)
>   return NULL;
>  
> - return read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
> + ret = read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),

The result of read_tree_block should be checked before use, by
extent_buffer_uptodate, null check or IS_ERR at least (depending on the
context of use).
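
The ordering asked for here can be sketched in standalone C as below. This is
a minimal sketch under stated assumptions: ERR_PTR/PTR_ERR/IS_ERR are
simplified userspace re-implementations of the kernel-style helpers, and
struct eb and check_child() are hypothetical names that model only the header
level. Whether read_tree_block reports failure via NULL or an error pointer
depends on the context, so the sketch handles both before the level is read.

```c
#include <errno.h>
#include <stdint.h>
#include <stddef.h>

/* Simplified userspace versions of the error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, reduced extent buffer: only the header level is kept. */
struct eb {
	int level;
};

/*
 * Validate a child block read for a parent at parent_level.
 * Returns the child on success or an error pointer; the child is never
 * dereferenced before the NULL and IS_ERR checks have passed.
 */
static struct eb *check_child(struct eb *child, int parent_level)
{
	if (!child)
		return ERR_PTR(-EIO);
	if (IS_ERR(child))
		return child;		/* propagate the read error as is */
	if (child->level != parent_level - 1)
		return ERR_PTR(-EIO);	/* level mismatch: treat as corruption */
	return child;
}
```

Unlike the patch, the sketch does not own the buffer, so there is no
counterpart to free_extent_buffer() on the mismatch path.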

>  btrfs_node_ptr_generation(parent, slot));
> + if (btrfs_header_level(ret) != level - 1) {
> + error("child eb corrupted: parent bytenr=%llu item=%d parent 
> level=%d child level=%d",
> +   btrfs_header_bytenr(parent), slot,
> +   btrfs_header_level(parent), btrfs_header_level(ret));
> + free_extent_buffer(ret);
> + return ERR_PTR(-EIO);
> + }
> + return ret;
>  }
>  
>  static int balance_level(struct btrfs_trans_handle *trans,
> -- 
> 2.16.1
> 


Re: [PATCH] btrfs-progs: tests common: remove meaningless colon in extract_image()

2018-02-07 Thread David Sterba
On Wed, Feb 07, 2018 at 05:57:43PM +0800, Su Yue wrote:
> The colon is meaningless so remove it.
> 
> Signed-off-by: Su Yue 

Applied, thanks.


[PATCH] Btrfs: send, do not issue unnecessary truncate operations

2018-02-07 Thread fdmanana
From: Filipe Manana 

When send finishes processing an inode representing a regular file, it
always issues a truncate operation for that file, even if its size did
not change or the last write sets the file size correctly. In the most
common cases, the issued write operations set the file to the correct size
(either full or incremental sends) or the file size did not change (for
incremental sends), so the only case where a truncate operation is needed
is when a file size becomes smaller in the send snapshot when compared
to the parent snapshot.
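
The decision described above can be sketched as a small standalone C function.
This is an illustration only: need_truncate() and its parameters are
hypothetical names, and the outcome of the incremental-send branch (clearing
the flag when the condition holds) is an assumption inferred from this
description rather than copied from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a truncate must be sent for a regular file.
 *
 * is_new_inode      - full send, or inode that does not exist in the parent
 * old_size          - file size in the parent snapshot (unused if new)
 * new_size          - file size in the send snapshot
 * next_write_offset - end offset of the last write or hole that was sent
 */
static bool need_truncate(bool is_new_inode, uint64_t old_size,
			  uint64_t new_size, uint64_t next_write_offset)
{
	if (is_new_inode)
		return next_write_offset != new_size;

	/* Size unchanged: nothing to fix up. */
	if (old_size == new_size)
		return false;

	/* File grew and the last write already ends at the new size. */
	if (new_size > old_size && next_write_offset == new_size)
		return false;

	/* File shrank, or the writes did not reach the final size. */
	return true;
}
```

For example, a file that shrinks from 8192 to 4096 bytes still needs the
truncate, while a file that grows from 4096 to 8192 bytes with a final write
ending at 8192 does not.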

By not issuing unnecessary truncate operations we reduce the stream size
and save time in the receiver. Currently truncating a file to the same
size triggers writeback of its last page (if it's dirty) and waits for it
to complete (only if the file size is not aligned with the filesystem's
sector size). This is being fixed by another patch and is independent of
this change (that patch's title is "Btrfs: skip writeback of last page
when truncating file to same size").

The following script was used to measure time spent by a receiver without
this change applied, with this change applied, and without this change and
with the truncate fix applied (the fix to not make it start and wait for
writeback to complete).

  $ cat test_send.sh
  #!/bin/bash

  SRC_DEV=/dev/sdc
  DST_DEV=/dev/sdd
  SRC_MNT=/mnt/sdc
  DST_MNT=/mnt/sdd

  mkfs.btrfs -f $SRC_DEV >/dev/null
  mkfs.btrfs -f $DST_DEV >/dev/null
  mount $SRC_DEV $SRC_MNT
  mount $DST_DEV $DST_MNT

  echo "Creating source filesystem"
  for ((t = 0; t < 10; t++)); do
  (
  for ((i = 1; i <= 2; i++)); do
  xfs_io -f -c "pwrite -S 0xab 0 5000" \
  $SRC_MNT/file_$i > /dev/null
  done
  ) &
 worker_pids[$t]=$!
  done
  wait ${worker_pids[@]}

  echo "Creating and sending snapshot"
  btrfs subvolume snapshot -r $SRC_MNT $SRC_MNT/snap1 >/dev/null
  /usr/bin/time -f "send took %e seconds"\
 btrfs send -f $SRC_MNT/send_file $SRC_MNT/snap1
  /usr/bin/time -f "receive took %e seconds" \
 btrfs receive -f $SRC_MNT/send_file $DST_MNT

  umount $SRC_MNT
  umount $DST_MNT

The results, which are averages for 5 runs for each case, were the
following:

* Without this change

average receive time was 26.49 seconds
standard deviation of 2.53 seconds

* Without this change and with the truncate fix

average receive time was 12.51 seconds
standard deviation of 0.32 seconds

* With this change and without the truncate fix

average receive time was 10.02 seconds
standard deviation of 1.11 seconds

Signed-off-by: Filipe Manana 
---
 fs/btrfs/send.c | 26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 484e2af793de..5df50d67d319 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -112,6 +112,7 @@ struct send_ctx {
u64 cur_inode_mode;
u64 cur_inode_rdev;
u64 cur_inode_last_extent;
+   u64 cur_inode_next_write_offset;
 
u64 send_progress;
 
@@ -5029,6 +5030,7 @@ static int send_hole(struct send_ctx *sctx, u64 end)
break;
offset += len;
}
+   sctx->cur_inode_next_write_offset = offset;
 tlv_put_failure:
fs_path_free(p);
return ret;
@@ -5264,6 +5266,7 @@ static int send_write_or_clone(struct send_ctx *sctx,
} else {
ret = send_extent_data(sctx, offset, len);
}
+   sctx->cur_inode_next_write_offset = offset + len;
 out:
return ret;
 }
@@ -5788,6 +5791,7 @@ static int finish_inode_if_needed(struct send_ctx *sctx, int at_end)
u64 right_gid;
int need_chmod = 0;
int need_chown = 0;
+   int need_truncate = 1;
int pending_move = 0;
int refs_processed = 0;
 
@@ -5825,9 +5829,13 @@ static int finish_inode_if_needed(struct send_ctx *sctx, int at_end)
need_chown = 1;
if (!S_ISLNK(sctx->cur_inode_mode))
need_chmod = 1;
+   if (sctx->cur_inode_next_write_offset == sctx->cur_inode_size)
+   need_truncate = 0;
} else {
+   u64 old_size;
+
ret = get_inode_info(sctx->parent_root, sctx->cur_ino,
-   NULL, NULL, &right_mode, &right_uid,
+   &old_size, NULL, &right_mode, &right_uid,
&right_gid, NULL);
if (ret < 0)
goto out;
@@ -5836,6 +5844,10 @@ static int finish_inode_if_needed(struct send_ctx *sctx, int at_end)
need_chown = 1;
if (!S_ISLNK(sctx->cur_inode_mode) && left_mode != right_mode)
need_chmod = 1;
+   if ((old_size == sctx->cur_inode_size) ||
+   (sctx->cur_inode_size > old_size &&
+sctx->cur_inode_next_write_offset == sctx->cur_inode_size))
+   
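
The hunks above are cut off in this archive, so purely as an illustration, the truncate decision they appear to implement can be sketched in standalone C. The function name `send_needs_truncate` and the `is_new_inode` parameter are hypothetical (the latter stands in for the first branch of the `if`), and the last early return assumes that the truncated branch clears `need_truncate`, which the archived diff does not show.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the need_truncate logic.
 * next_write_offset models sctx->cur_inode_next_write_offset, i.e. the
 * end offset of the last write, clone or hole that was sent.
 */
bool send_needs_truncate(bool is_new_inode, uint64_t old_size,
			 uint64_t cur_size, uint64_t next_write_offset)
{
	/* First branch: data already sent up to EOF, size is correct. */
	if (is_new_inode)
		return next_write_offset != cur_size;

	/* Inode exists in the parent snapshot with the same size. */
	if (old_size == cur_size)
		return false;

	/* File grew and the sent data already reaches the new EOF. */
	if (cur_size > old_size && next_write_offset == cur_size)
		return false;

	/* Shrunk file, or a file whose tail was not covered by writes. */
	return true;
}
```

Under these assumptions, a 5000-byte file written from offset 0 to 5000, as in the test script, would need no truncate, while a file that shrank from 8192 to 4096 bytes still would.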

[PATCH] Btrfs: skip writeback of last page when truncating file to same size

2018-02-07 Thread fdmanana
From: Filipe Manana 

When we truncate a file to the same size and that size is not aligned
with the sector size, we end up triggering writeback of the last page
(and waiting for it to complete). This is unnecessary, as we cannot have
delayed allocation beyond the inode's i_size, and the goal of truncating
a file to its own size is to discard prealloc extents (allocated via the
fallocate(2) system call). Besides the unnecessary IO start and wait, it
also breaks the opportunity for larger contiguous extents on disk, as
before the last dirty page there might be other dirty pages.

This scenario is probably not very common in general. However, it is
common for btrfs receive implementations, because the send stream
currently always issues a truncate operation as the last operation for
each processed inode (this truncate is not always needed, and the send
implementation will be changed to avoid issuing it when it is not).

So improve this by not starting and waiting for writeback of the inode's
last page when we are truncating to exactly the same size.
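
As a rough userspace sketch of the two pieces involved: the diff below waits on the range starting at `inode->i_size & (~mask)`, and skips that wait when `newsize == oldsize`. The helper names `sector_round_down` and `truncate_needs_writeback` are made up for this illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Round a file size down to the start of its last (possibly partial)
 * sector, mirroring "inode->i_size & (~mask)" with
 * mask = sectorsize - 1. The sector size must be a power of two.
 */
uint64_t sector_round_down(uint64_t i_size, uint64_t sectorsize)
{
	uint64_t mask = sectorsize - 1;

	assert(sectorsize != 0 && (sectorsize & mask) == 0);
	return i_size & ~mask;
}

/*
 * Hypothetical helper: the wait for ordered IO is only kept when the
 * truncate actually changes the file size.
 */
bool truncate_needs_writeback(uint64_t oldsize, uint64_t newsize)
{
	return newsize != oldsize;
}
```

For the 5000-byte files created by the test script and a 4096-byte sector size, the range that would otherwise be waited on starts at offset 4096, and truncating 5000 to 5000 skips the wait entirely.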

The following script was used to quickly measure the time a receive
operation takes:

 $ cat test_send.sh
 #!/bin/bash

 SRC_DEV=/dev/sdc
 DST_DEV=/dev/sdd
 SRC_MNT=/mnt/sdc
 DST_MNT=/mnt/sdd

 mkfs.btrfs -f $SRC_DEV >/dev/null
 mkfs.btrfs -f $DST_DEV >/dev/null
 mount $SRC_DEV $SRC_MNT
 mount $DST_DEV $DST_MNT

 echo "Creating source filesystem"
 for ((t = 0; t < 10; t++)); do
     (
         for ((i = 1; i <= 2; i++)); do
             xfs_io -f -c "pwrite -S 0xab 0 5000" \
                 $SRC_MNT/file_$i > /dev/null
         done
     ) &
     worker_pids[$t]=$!
 done
 wait ${worker_pids[@]}

 echo "Creating and sending snapshot"
 btrfs subvolume snapshot -r $SRC_MNT $SRC_MNT/snap1 >/dev/null
 /usr/bin/time -f "send took %e seconds" \
     btrfs send -f $SRC_MNT/send_file $SRC_MNT/snap1
 /usr/bin/time -f "receive took %e seconds" \
     btrfs receive -f $SRC_MNT/send_file $DST_MNT

 umount $SRC_MNT
 umount $DST_MNT

The results for 5 runs were the following:

* Without this change

average receive time was 26.49 seconds
standard deviation of 2.53 seconds

* With this change

average receive time was 12.51 seconds
standard deviation of 0.32 seconds

Reported-by: Robbie Ko 
Signed-off-by: Filipe Manana 
---
 fs/btrfs/inode.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 2a19413a7868..dae631ab5cb2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -101,7 +101,7 @@ static const unsigned char btrfs_type_by_mode[S_IFMT >> S_SHIFT] = {
 };
 
 static int btrfs_setsize(struct inode *inode, struct iattr *attr);
-static int btrfs_truncate(struct inode *inode);
+static int btrfs_truncate(struct inode *inode, bool skip_writeback);
 static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent);
 static noinline int cow_file_range(struct inode *inode,
   struct page *locked_page,
@@ -3625,7 +3625,7 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
goto out;
}
 
-   ret = btrfs_truncate(inode);
+   ret = btrfs_truncate(inode, false);
if (ret)
btrfs_orphan_del(NULL, BTRFS_I(inode));
} else {
@@ -5109,7 +5109,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr)
inode_dio_wait(inode);
btrfs_inode_resume_unlocked_dio(BTRFS_I(inode));
 
-   ret = btrfs_truncate(inode);
+   ret = btrfs_truncate(inode, newsize == oldsize);
if (ret && inode->i_nlink) {
int err;
 
@@ -9087,7 +9087,7 @@ int btrfs_page_mkwrite(struct vm_fault *vmf)
return ret;
 }
 
-static int btrfs_truncate(struct inode *inode)
+static int btrfs_truncate(struct inode *inode, bool skip_writeback)
 {
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
struct btrfs_root *root = BTRFS_I(inode)->root;
@@ -9098,10 +9098,12 @@ static int btrfs_truncate(struct inode *inode)
u64 mask = fs_info->sectorsize - 1;
u64 min_size = btrfs_calc_trunc_metadata_size(fs_info, 1);
 
-   ret = btrfs_wait_ordered_range(inode, inode->i_size & (~mask),
-  (u64)-1);
-   if (ret)
-   return ret;
+   if (!skip_writeback) {
+   ret = btrfs_wait_ordered_range(inode, inode->i_size & (~mask),
+  (u64)-1);
+   if (ret)
+   return ret;
+   }
 
/*
 * Yes ladies and gentlemen, this is indeed ugly.  The fact is we have
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: send, fix issuing write op when processing hole in no data mode

2018-02-07 Thread fdmanana
From: Filipe Manana 

When doing an incremental send of a filesystem with the no-holes feature
enabled, we end up issuing a write operation when using the no data mode
send flag, instead of issuing an update extent operation. Fix this by
issuing the update extent operation instead.
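
The choice this fix introduces at the top of `send_hole()` can be modelled in a few lines of standalone C. This is only a sketch: `EXAMPLE_SEND_FLAG_NO_FILE_DATA`, `enum hole_op` and `pick_hole_op` are illustrative names, with the macro standing in for the real `BTRFS_SEND_FLAG_NO_FILE_DATA` flag rather than reproducing its definition.

```c
#include <stdint.h>

/* Illustrative flag bit, a stand-in for BTRFS_SEND_FLAG_NO_FILE_DATA. */
#define EXAMPLE_SEND_FLAG_NO_FILE_DATA (1ULL << 0)

/* The two kinds of command a hole can turn into in the send stream. */
enum hole_op {
	HOLE_OP_WRITE,         /* write command carrying data for the range */
	HOLE_OP_UPDATE_EXTENT, /* only offset and length, no file data */
};

/* In no data mode a hole must never be sent as a write command. */
enum hole_op pick_hole_op(uint64_t send_flags)
{
	if (send_flags & EXAMPLE_SEND_FLAG_NO_FILE_DATA)
		return HOLE_OP_UPDATE_EXTENT;
	return HOLE_OP_WRITE;
}
```

This matches the receive output shown below, where the `write foobar` line becomes an `update_extent foobar` line once the flag is honoured.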

Trivial reproducer:

  $ mkfs.btrfs -f -O no-holes /dev/sdc
  $ mkfs.btrfs -f /dev/sdd
  $ mount /dev/sdc /mnt/sdc
  $ mount /dev/sdd /mnt/sdd

  $ xfs_io -f -c "pwrite -S 0xab 0 32K" /mnt/sdc/foobar
  $ btrfs subvolume snapshot -r /mnt/sdc /mnt/sdc/snap1

  $ xfs_io -c "fpunch 8K 8K" /mnt/sdc/foobar
  $ btrfs subvolume snapshot -r /mnt/sdc /mnt/sdc/snap2

  $ btrfs send /mnt/sdc/snap1 | btrfs receive /mnt/sdd
  $ btrfs send --no-data -p /mnt/sdc/snap1 /mnt/sdc/snap2 \
   | btrfs receive -vv /mnt/sdd

Before this change the output of the second receive command is:

  receiving snapshot snap2 uuid=f6922049-8c22-e544-9ff9-fc6755918447...
  utimes
  write foobar, offset 8192, len 8192
  utimes foobar
  BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=f6922049-8c22-e544-9ff9-...

After this change it is:

  receiving snapshot snap2 uuid=564d36a3-ebc8-7343-aec9-bf6fda278e64...
  utimes
  update_extent foobar: offset=8192, len=8192
  utimes foobar
  BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=564d36a3-ebc8-7343-aec9-bf6fda278e64...

Signed-off-by: Filipe Manana 
---
 fs/btrfs/send.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index f306c608dc28..484e2af793de 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -5005,6 +5005,9 @@ static int send_hole(struct send_ctx *sctx, u64 end)
u64 len;
int ret = 0;
 
+   if (sctx->flags & BTRFS_SEND_FLAG_NO_FILE_DATA)
+   return send_update_extent(sctx, offset, end - offset);
+
p = fs_path_alloc();
if (!p)
return -ENOMEM;
-- 
2.11.0



[PATCH] btrfs-progs: tests common: remove meaningless colon in extract_image()

2018-02-07 Thread Su Yue
The colon is meaningless so remove it.

Signed-off-by: Su Yue 
---
 tests/common | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/common b/tests/common
index 8e5d0cde1b7e..7f641a004661 100644
--- a/tests/common
+++ b/tests/common
@@ -331,7 +331,7 @@ extract_image()
case "$image" in
*.img)
rm -f "$image.restored"
-   : ;;
+   ;;
*.img.xz)
xz --decompress --keep "$image" || \
_fail "failed to decompress image $image" >&2
-- 
2.16.1





Re: [PATCH] btrfs: Don't hardcode the csum size in btrfs_ordered_sum_size

2018-02-07 Thread Su Yue



On 02/07/2018 05:19 PM, Nikolay Borisov wrote:

> Currently the function uses a hardcoded value for the checksum size of
> a sector. This is fine, given that we currently support only a single
> algorithm, whose checksum is 4 bytes == sizeof(u32). Despite not
> having other algorithms, btrfs' design supports using a different
> algorithm with different space requirements. To future-proof the code,
> query the size of the currently used algorithm from the in-memory copy
> of the super block. No functional changes.
> 
> Signed-off-by: Nikolay Borisov

Reviewed-by: Su Yue

> ---
>  fs/btrfs/ordered-data.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
> index 56c4c0ee6381..c53e2cfb72d9 100644
> --- a/fs/btrfs/ordered-data.h
> +++ b/fs/btrfs/ordered-data.h
> @@ -151,7 +151,9 @@ static inline int btrfs_ordered_sum_size(struct btrfs_fs_info *fs_info,
>  unsigned long bytes)
>  {
> int num_sectors = (int)DIV_ROUND_UP(bytes, fs_info->sectorsize);
> -   return sizeof(struct btrfs_ordered_sum) + num_sectors * sizeof(u32);
> +   int csum_size = btrfs_super_csum_size(fs_info->super_copy);
> +
> +   return sizeof(struct btrfs_ordered_sum) + num_sectors * csum_size;
>  }
>  
>  static inline void







Re: [PATCH] btrfs: Don't hardcode the csum size in btrfs_ordered_sum_size

2018-02-07 Thread Qu Wenruo


On 2018-02-07 17:19, Nikolay Borisov wrote:
> Currently the function uses a hardcoded value for the checksum size of
> a sector. This is fine, given that we currently support only a single
> algorithm, whose checksum is 4 bytes == sizeof(u32). Despite not
> having other algorithms, btrfs' design supports using a different
> algorithm with different space requirements. To future-proof the code,
> query the size of the currently used algorithm from the in-memory copy
> of the super block. No functional changes.
> 
> Signed-off-by: Nikolay Borisov 

Reviewed-by: Qu Wenruo 

Thanks,
Qu

> ---
>  fs/btrfs/ordered-data.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
> index 56c4c0ee6381..c53e2cfb72d9 100644
> --- a/fs/btrfs/ordered-data.h
> +++ b/fs/btrfs/ordered-data.h
> @@ -151,7 +151,9 @@ static inline int btrfs_ordered_sum_size(struct btrfs_fs_info *fs_info,
>unsigned long bytes)
>  {
>   int num_sectors = (int)DIV_ROUND_UP(bytes, fs_info->sectorsize);
> - return sizeof(struct btrfs_ordered_sum) + num_sectors * sizeof(u32);
> + int csum_size = btrfs_super_csum_size(fs_info->super_copy);
> +
> + return sizeof(struct btrfs_ordered_sum) + num_sectors * csum_size;
>  }
>  
>  static inline void
> 





Re: [PATCH] btrfs-progs: fsck-tests: Cleanup the restored image for 028

2018-02-07 Thread Qu Wenruo
Please ignore this one.

I just forgot to remove an unrelated patch.

Thanks,
Qu

On 2018-02-07 17:17, Qu Wenruo wrote:
> Signed-off-by: Qu Wenruo 
> ---
>  tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh b/tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh
> index 3928f548c3f9..4bbcfbae662e 100755
> --- a/tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh
> +++ b/tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh
> @@ -21,3 +21,5 @@ run_check "$TOP/btrfs" check "$TEST_DEV"
>  # mount test
>  run_check_mount_test_dev
>  run_check_umount_test_dev "$TEST_MNT"
> +# don't forget to clean it up
> +rm "$TEST_DEV"
> 





[PATCH] btrfs: Don't hardcode the csum size in btrfs_ordered_sum_size

2018-02-07 Thread Nikolay Borisov
Currently the function uses a hardcoded value for the checksum size of
a sector. This is fine, given that we currently support only a single
algorithm, whose checksum is 4 bytes == sizeof(u32). Despite not
having other algorithms, btrfs' design supports using a different
algorithm with different space requirements. To future-proof the code,
query the size of the currently used algorithm from the in-memory copy
of the super block. No functional changes.
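
To make the arithmetic concrete, here is a small standalone sketch of the same size computation with the checksum size passed in instead of hardcoded. The names `div_round_up` and `ordered_sum_size` and the explicit `hdr_size` parameter are assumptions made for this example; in the kernel the header size comes from `sizeof(struct btrfs_ordered_sum)` and the checksum size from the super block.

```c
#include <stddef.h>

/* Ceiling division, in the spirit of the kernel's DIV_ROUND_UP(). */
static size_t div_round_up(size_t n, size_t d)
{
	return (n + d - 1) / d;
}

/*
 * Bytes needed for one ordered sum: a fixed header followed by one
 * checksum per sector covered by "bytes" of data.
 */
size_t ordered_sum_size(size_t hdr_size, size_t bytes,
			size_t sectorsize, size_t csum_size)
{
	size_t num_sectors = div_round_up(bytes, sectorsize);

	return hdr_size + num_sectors * csum_size;
}
```

With an arbitrary 24-byte header, 8192 bytes of data in 4096-byte sectors need 24 + 2 * 4 = 32 bytes with a 4-byte checksum, and 24 + 2 * 32 = 88 bytes with a hypothetical 32-byte checksum, which is why a hardcoded `sizeof(u32)` would under-allocate for any larger algorithm.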

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ordered-data.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index 56c4c0ee6381..c53e2cfb72d9 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -151,7 +151,9 @@ static inline int btrfs_ordered_sum_size(struct btrfs_fs_info *fs_info,
 unsigned long bytes)
 {
int num_sectors = (int)DIV_ROUND_UP(bytes, fs_info->sectorsize);
-   return sizeof(struct btrfs_ordered_sum) + num_sectors * sizeof(u32);
+   int csum_size = btrfs_super_csum_size(fs_info->super_copy);
+
+   return sizeof(struct btrfs_ordered_sum) + num_sectors * csum_size;
 }
 
 static inline void
-- 
2.7.4



[PATCH] btrfs-progs: ctree: Add extra level check for read_node_slot()

2018-02-07 Thread Qu Wenruo
Strangely, we have a level check in btrfs_print_tree() but not in
read_node_slot().

That is to say, for the following corruption, btrfs_search_slot() or
btrfs_next_leaf() can return an invalid leaf:

Parent eb:
  node XX level 1
          ^^^^^^^
          Child should be leaf (level 0)
  ...
  key (XXX XXX XXX) block YY

Child eb:
  leaf YY level 1
          ^^^^^^^
          Something went wrong now

Later callers that receive the corrupted leaf can then easily misbehave.

Although the root cause is still unknown (a power loss was involved, but
something must still have broken btrfs metadata CoW), at least enhance
btrfs-progs to avoid the SEGV.
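
The invariant being enforced is small enough to state as a standalone C predicate. This is a sketch only; `child_level_is_valid` is a hypothetical name, and a real caller would presumably also have to confirm that the child block was read successfully before looking at its header, which this model leaves out.

```c
#include <stdbool.h>

/*
 * A block referenced from a node at parent_level must sit exactly one
 * level lower. Leaves (level 0) reference no child blocks at all.
 */
bool child_level_is_valid(int parent_level, int child_level)
{
	if (parent_level <= 0)
		return false;
	return child_level == parent_level - 1;
}
```

For the corruption pictured above, the parent is at level 1 and the child also claims level 1, so the predicate is false and the lookup can fail with an error instead of handing a bogus leaf to the caller.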

Reported-by: Ralph Gauges 
Signed-off-by: Qu Wenruo 
---
 ctree.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/ctree.c b/ctree.c
index 4fc33b14000a..ddb1e9cc6d37 100644
--- a/ctree.c
+++ b/ctree.c
@@ -22,6 +22,7 @@
 #include "repair.h"
 #include "internal.h"
 #include "sizes.h"
+#include "messages.h"
 
 static int split_node(struct btrfs_trans_handle *trans, struct btrfs_root
  *root, struct btrfs_path *path, int level);
@@ -640,7 +641,9 @@ static int bin_search(struct extent_buffer *eb, struct btrfs_key *key,
 struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info,
   struct extent_buffer *parent, int slot)
 {
+   struct extent_buffer *ret;
int level = btrfs_header_level(parent);
+
if (slot < 0)
return NULL;
if (slot >= btrfs_header_nritems(parent))
@@ -649,8 +652,16 @@ struct extent_buffer *read_node_slot(struct btrfs_fs_info *fs_info,
if (level == 0)
return NULL;
 
-   return read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
+   ret = read_tree_block(fs_info, btrfs_node_blockptr(parent, slot),
   btrfs_node_ptr_generation(parent, slot));
+   if (btrfs_header_level(ret) != level - 1) {
+   error("child eb corrupted: parent bytenr=%llu item=%d parent level=%d child level=%d",
+ btrfs_header_bytenr(parent), slot,
+ btrfs_header_level(parent), btrfs_header_level(ret));
+   free_extent_buffer(ret);
+   return ERR_PTR(-EIO);
+   }
+   return ret;
 }
 
 static int balance_level(struct btrfs_trans_handle *trans,
-- 
2.16.1



[PATCH] btrfs-progs: fsck-tests: Cleanup the restored image for 028

2018-02-07 Thread Qu Wenruo
Signed-off-by: Qu Wenruo 
---
 tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh b/tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh
index 3928f548c3f9..4bbcfbae662e 100755
--- a/tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh
+++ b/tests/fsck-tests/028-unaligned-super-dev-sizes/test.sh
@@ -21,3 +21,5 @@ run_check "$TOP/btrfs" check "$TEST_DEV"
 # mount test
 run_check_mount_test_dev
 run_check_umount_test_dev "$TEST_MNT"
+# don't forget to clean it up
+rm "$TEST_DEV"
-- 
2.16.1
