Re: [PATCH v3] btrfs: Don't remove block group still has pinned down bytes

2018-06-19 Thread Qu Wenruo



On 2018年06月20日 13:25, Nikolay Borisov wrote:
> 
> 
> On 15.06.2018 04:35, Qu Wenruo wrote:
>> [BUG]
>> Under certain KVM load and LTP tests, we are possible to hit the
>> following calltrace if quota is enabled:
>> --
>> BTRFS critical (device vda2): unable to find logical 8820195328 length 4096
>> BTRFS critical (device vda2): unable to find logical 8820195328 length 4096
>> [ cut here ]
>> WARNING: CPU: 0 PID: 49 at ../block/blk-core.c:172 
>> blk_status_to_errno+0x1a/0x30
>> CPU: 0 PID: 49 Comm: kworker/u2:1 Not tainted 4.12.14-15-default #1 SLE15 
>> (unreleased)
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>> 1.0.0-prebuilt.qemu-project.org 04/01/2014
>> Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
>> task: 9f827b340bc0 task.stack: b4f8c0304000
>> RIP: 0010:blk_status_to_errno+0x1a/0x30
>> Call Trace:
>>  submit_extent_page+0x191/0x270 [btrfs]
>>  ? btrfs_create_repair_bio+0x130/0x130 [btrfs]
>>  __do_readpage+0x2d2/0x810 [btrfs]
>>  ? btrfs_create_repair_bio+0x130/0x130 [btrfs]
>>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>>  __extent_read_full_page+0xe7/0x100 [btrfs]
>>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>>  read_extent_buffer_pages+0x1ab/0x2d0 [btrfs]
>>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>>  btree_read_extent_buffer_pages+0x94/0xf0 [btrfs]
>>  read_tree_block+0x31/0x60 [btrfs]
>>  read_block_for_search.isra.35+0xf0/0x2e0 [btrfs]
>>  btrfs_search_slot+0x46b/0xa00 [btrfs]
>>  ? kmem_cache_alloc+0x1a8/0x510
>>  ? btrfs_get_token_32+0x5b/0x120 [btrfs]
>>  find_parent_nodes+0x11d/0xeb0 [btrfs]
>>  ? leaf_space_used+0xb8/0xd0 [btrfs]
>>  ? btrfs_leaf_free_space+0x49/0x90 [btrfs]
>>  ? btrfs_find_all_roots_safe+0x93/0x100 [btrfs]
>>  btrfs_find_all_roots_safe+0x93/0x100 [btrfs]
>>  btrfs_find_all_roots+0x45/0x60 [btrfs]
>>  btrfs_qgroup_trace_extent_post+0x20/0x40 [btrfs]
>>  btrfs_add_delayed_data_ref+0x1a3/0x1d0 [btrfs]
>>  btrfs_alloc_reserved_file_extent+0x38/0x40 [btrfs]
>>  insert_reserved_file_extent.constprop.71+0x289/0x2e0 [btrfs]
>>  btrfs_finish_ordered_io+0x2f4/0x7f0 [btrfs]
>>  ? pick_next_task_fair+0x2cd/0x530
>>  ? __switch_to+0x92/0x4b0
>>  btrfs_worker_helper+0x81/0x300 [btrfs]
>>  process_one_work+0x1da/0x3f0
>>  worker_thread+0x2b/0x3f0
>>  ? process_one_work+0x3f0/0x3f0
>>  kthread+0x11a/0x130
>>  ? kthread_create_on_node+0x40/0x40
>>  ret_from_fork+0x35/0x40
>> Code: 00 00 5b c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 40 
>> 80 ff 0c 40 0f b6 c7 77 0b 48 c1 e0 04 8b 80 00 bf c8 bd c3 <0f> 0b b8 fb ff 
>> ff ff c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00
>> ---[ end trace f079fb809e7a862b ]---
>> BTRFS critical (device vda2): unable to find logical 8820195328 length 16384
>> BTRFS: error (device vda2) in btrfs_finish_ordered_io:3023: errno=-5 IO 
>> failure
>> BTRFS info (device vda2): forced readonly
>> BTRFS error (device vda2): pending csums is 2887680
>> --
>>
>> [CAUSE]
>> It's caused by race with block group auto removal like the following
>> case:
>> - There is a meta block group X, which has only one tree block
>>   The tree block belongs to fs tree 257.
>> - In current transaction, some operation modified fs tree 257
>>   The tree block get CoWed, so the block group X is empty, and marked as
>>   unused, queued to be deleted.
>> - Some workload (like fsync) wakes up cleaner_kthread()
>>   Which will call btrfs_deleted_unused_bgs() to remove unused block
>>   groups.
>>   So block group X along its chunk map get removed.
>> - Some delalloc work finished for fs tree 257
>>   Quota needs to get the original reference of the extent, which will
>>   reads tree blocks of commit root of 257.
>>   Then since the chunk map get removed, above warning get triggered.
>>
>> [FIX]
>> Just teach btrfs_delete_unused_bgs() to skip block group who still has
>> pinned bytes.
>>
>> However there is a minor side effect, since currently we only queue
>> empty blocks at update_block_group(), and such empty block group with
>> pinned bytes won't go through update_block_group() again, such block
>> group won't be removed, until it get new extent allocated and removed.
>>
>> But please note that, there are more problems related to extent
>> allocator with block group auto removal.
>>
>> Even a block group is marked unused, extent allocator can still allocate
>> new extents from unused block group.
>> Thus delaying block group to next transaction won't work.
>> (Extents get allocated in current transaction, and removed again in next
>> transaction).
> 
> Isn't this easily solvable by making the allocator check whether the
> block group it has chosen is part of the unused_bgs_list ?

That's also my initial idea, however bg_cache->bg_list can also be used
in trans->new_bgs.

Another factor is, even we check the block group of an extent in
find_free_extent() and skip the allocation, we can lead to more frequent
false ENOSPC.

The correct way to do it is to allow find_free_extent() to skip block
group which 

Re: [PATCH v3] btrfs: Don't remove block group still has pinned down bytes

2018-06-19 Thread Nikolay Borisov



On 15.06.2018 04:35, Qu Wenruo wrote:
> [BUG]
> Under certain KVM load and LTP tests, we are possible to hit the
> following calltrace if quota is enabled:
> --
> BTRFS critical (device vda2): unable to find logical 8820195328 length 4096
> BTRFS critical (device vda2): unable to find logical 8820195328 length 4096
> [ cut here ]
> WARNING: CPU: 0 PID: 49 at ../block/blk-core.c:172 
> blk_status_to_errno+0x1a/0x30
> CPU: 0 PID: 49 Comm: kworker/u2:1 Not tainted 4.12.14-15-default #1 SLE15 
> (unreleased)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.0.0-prebuilt.qemu-project.org 04/01/2014
> Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> task: 9f827b340bc0 task.stack: b4f8c0304000
> RIP: 0010:blk_status_to_errno+0x1a/0x30
> Call Trace:
>  submit_extent_page+0x191/0x270 [btrfs]
>  ? btrfs_create_repair_bio+0x130/0x130 [btrfs]
>  __do_readpage+0x2d2/0x810 [btrfs]
>  ? btrfs_create_repair_bio+0x130/0x130 [btrfs]
>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>  __extent_read_full_page+0xe7/0x100 [btrfs]
>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>  read_extent_buffer_pages+0x1ab/0x2d0 [btrfs]
>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>  btree_read_extent_buffer_pages+0x94/0xf0 [btrfs]
>  read_tree_block+0x31/0x60 [btrfs]
>  read_block_for_search.isra.35+0xf0/0x2e0 [btrfs]
>  btrfs_search_slot+0x46b/0xa00 [btrfs]
>  ? kmem_cache_alloc+0x1a8/0x510
>  ? btrfs_get_token_32+0x5b/0x120 [btrfs]
>  find_parent_nodes+0x11d/0xeb0 [btrfs]
>  ? leaf_space_used+0xb8/0xd0 [btrfs]
>  ? btrfs_leaf_free_space+0x49/0x90 [btrfs]
>  ? btrfs_find_all_roots_safe+0x93/0x100 [btrfs]
>  btrfs_find_all_roots_safe+0x93/0x100 [btrfs]
>  btrfs_find_all_roots+0x45/0x60 [btrfs]
>  btrfs_qgroup_trace_extent_post+0x20/0x40 [btrfs]
>  btrfs_add_delayed_data_ref+0x1a3/0x1d0 [btrfs]
>  btrfs_alloc_reserved_file_extent+0x38/0x40 [btrfs]
>  insert_reserved_file_extent.constprop.71+0x289/0x2e0 [btrfs]
>  btrfs_finish_ordered_io+0x2f4/0x7f0 [btrfs]
>  ? pick_next_task_fair+0x2cd/0x530
>  ? __switch_to+0x92/0x4b0
>  btrfs_worker_helper+0x81/0x300 [btrfs]
>  process_one_work+0x1da/0x3f0
>  worker_thread+0x2b/0x3f0
>  ? process_one_work+0x3f0/0x3f0
>  kthread+0x11a/0x130
>  ? kthread_create_on_node+0x40/0x40
>  ret_from_fork+0x35/0x40
> Code: 00 00 5b c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 40 80 
> ff 0c 40 0f b6 c7 77 0b 48 c1 e0 04 8b 80 00 bf c8 bd c3 <0f> 0b b8 fb ff ff 
> ff c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00
> ---[ end trace f079fb809e7a862b ]---
> BTRFS critical (device vda2): unable to find logical 8820195328 length 16384
> BTRFS: error (device vda2) in btrfs_finish_ordered_io:3023: errno=-5 IO 
> failure
> BTRFS info (device vda2): forced readonly
> BTRFS error (device vda2): pending csums is 2887680
> --
> 
> [CAUSE]
> It's caused by race with block group auto removal like the following
> case:
> - There is a meta block group X, which has only one tree block
>   The tree block belongs to fs tree 257.
> - In current transaction, some operation modified fs tree 257
>   The tree block get CoWed, so the block group X is empty, and marked as
>   unused, queued to be deleted.
> - Some workload (like fsync) wakes up cleaner_kthread()
>   Which will call btrfs_deleted_unused_bgs() to remove unused block
>   groups.
>   So block group X along its chunk map get removed.
> - Some delalloc work finished for fs tree 257
>   Quota needs to get the original reference of the extent, which will
>   reads tree blocks of commit root of 257.
>   Then since the chunk map get removed, above warning get triggered.
> 
> [FIX]
> Just teach btrfs_delete_unused_bgs() to skip block group who still has
> pinned bytes.
> 
> However there is a minor side effect, since currently we only queue
> empty blocks at update_block_group(), and such empty block group with
> pinned bytes won't go through update_block_group() again, such block
> group won't be removed, until it get new extent allocated and removed.
> 
> But please note that, there are more problems related to extent
> allocator with block group auto removal.
> 
> Even a block group is marked unused, extent allocator can still allocate
> new extents from unused block group.
> Thus delaying block group to next transaction won't work.
> (Extents get allocated in current transaction, and removed again in next
> transaction).

Isn't this easily solvable by making the allocator check whether the
block group it has chosen is part of the unused_bgs_list ?

> 
> So the root fix need to co-operate with extent allocator.
> But current fix should be enough for this particular bug.
> 
> Signed-off-by: Qu Wenruo 

LGTM,

Reviewed-by: Nikolay Borisov 

> ---
> changelog:
> v2:
>   Commit message update, to better indicate how pinned byte is used in
>   btrfs and why it's related to quota.
> v3:
>   Commit message update, further explaining the bug with an example.
>   And added the side effect of the fix, and possible further fix.
> 

Re: [PATCH 2/2] btrfs-progs: misc-tests: Fix 029 test cases for sudo test environment

2018-06-19 Thread Nikolay Borisov



On 20.06.2018 03:38, Qu Wenruo wrote:
> Test misc/029 only works if the test case is executed as root, while for
> sudo usage, it doesn't work as initial mkdir and final cleanup doesn't
> use $SUDO_HELPER.
> 
> Add "run_check $SUDO_HELPER" for such cases to allow it works under sudo
> usage.
> 
> Signed-off-by: Qu Wenruo 

Reviewed-by: Nikolay Borisov 

Although this seems unrelated to the first patch. I'd advise in this
case to send them as separate patches or if you are going to batch up
multiple fixes add a cover letter saying "just a bunch of random fixes
to xxx" :)

> ---
>  tests/misc-tests/029-send-p-different-mountpoints/test.sh | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/tests/misc-tests/029-send-p-different-mountpoints/test.sh 
> b/tests/misc-tests/029-send-p-different-mountpoints/test.sh
> index 0b42b772e3d7..90465e1d08fd 100755
> --- a/tests/misc-tests/029-send-p-different-mountpoints/test.sh
> +++ b/tests/misc-tests/029-send-p-different-mountpoints/test.sh
> @@ -14,7 +14,7 @@ prepare_test_dev
>  SUBVOL_MNT="$TEST_MNT/subvol"
>  TOPLEVEL_MNT="$TEST_MNT/toplevel"
>  TEST_MNT="$TOPLEVEL_MNT"
> -mkdir -p "$TOPLEVEL_MNT" "$SUBVOL_MNT"
> +run_check $SUDO_HELPER mkdir -p "$TOPLEVEL_MNT" "$SUBVOL_MNT"
>  
>  run_check $SUDO_HELPER "$TOP/mkfs.btrfs" -f "$TEST_DEV"
>  run_check_mount_test_dev
> @@ -47,5 +47,5 @@ run_mustfail_stdout "send -p on 2 mount points" \
>  run_check_umount_test_dev "$SUBVOL_MNT"
>  run_check_umount_test_dev "$TOPLEVEL_MNT"
>  
> -rmdir "$SUBVOL_MNT"
> -rmdir "$TOPLEVEL_MNT"
> +run_check $SUDO_HELPER rmdir "$SUBVOL_MNT"
> +run_check $SUDO_HELPER rmdir "$TOPLEVEL_MNT"
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] btrfs: Don't remove block group still has pinned down bytes

2018-06-19 Thread Qu Wenruo
Gentle ping.

Since it's causing LTP tests failure, I'd like to backport it asap.

Thanks,
Qu

On 2018年06月15日 09:35, Qu Wenruo wrote:
> [BUG]
> Under certain KVM load and LTP tests, we are possible to hit the
> following calltrace if quota is enabled:
> --
> BTRFS critical (device vda2): unable to find logical 8820195328 length 4096
> BTRFS critical (device vda2): unable to find logical 8820195328 length 4096
> [ cut here ]
> WARNING: CPU: 0 PID: 49 at ../block/blk-core.c:172 
> blk_status_to_errno+0x1a/0x30
> CPU: 0 PID: 49 Comm: kworker/u2:1 Not tainted 4.12.14-15-default #1 SLE15 
> (unreleased)
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> 1.0.0-prebuilt.qemu-project.org 04/01/2014
> Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> task: 9f827b340bc0 task.stack: b4f8c0304000
> RIP: 0010:blk_status_to_errno+0x1a/0x30
> Call Trace:
>  submit_extent_page+0x191/0x270 [btrfs]
>  ? btrfs_create_repair_bio+0x130/0x130 [btrfs]
>  __do_readpage+0x2d2/0x810 [btrfs]
>  ? btrfs_create_repair_bio+0x130/0x130 [btrfs]
>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>  __extent_read_full_page+0xe7/0x100 [btrfs]
>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>  read_extent_buffer_pages+0x1ab/0x2d0 [btrfs]
>  ? run_one_async_done+0xc0/0xc0 [btrfs]
>  btree_read_extent_buffer_pages+0x94/0xf0 [btrfs]
>  read_tree_block+0x31/0x60 [btrfs]
>  read_block_for_search.isra.35+0xf0/0x2e0 [btrfs]
>  btrfs_search_slot+0x46b/0xa00 [btrfs]
>  ? kmem_cache_alloc+0x1a8/0x510
>  ? btrfs_get_token_32+0x5b/0x120 [btrfs]
>  find_parent_nodes+0x11d/0xeb0 [btrfs]
>  ? leaf_space_used+0xb8/0xd0 [btrfs]
>  ? btrfs_leaf_free_space+0x49/0x90 [btrfs]
>  ? btrfs_find_all_roots_safe+0x93/0x100 [btrfs]
>  btrfs_find_all_roots_safe+0x93/0x100 [btrfs]
>  btrfs_find_all_roots+0x45/0x60 [btrfs]
>  btrfs_qgroup_trace_extent_post+0x20/0x40 [btrfs]
>  btrfs_add_delayed_data_ref+0x1a3/0x1d0 [btrfs]
>  btrfs_alloc_reserved_file_extent+0x38/0x40 [btrfs]
>  insert_reserved_file_extent.constprop.71+0x289/0x2e0 [btrfs]
>  btrfs_finish_ordered_io+0x2f4/0x7f0 [btrfs]
>  ? pick_next_task_fair+0x2cd/0x530
>  ? __switch_to+0x92/0x4b0
>  btrfs_worker_helper+0x81/0x300 [btrfs]
>  process_one_work+0x1da/0x3f0
>  worker_thread+0x2b/0x3f0
>  ? process_one_work+0x3f0/0x3f0
>  kthread+0x11a/0x130
>  ? kthread_create_on_node+0x40/0x40
>  ret_from_fork+0x35/0x40
> Code: 00 00 5b c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 40 80 
> ff 0c 40 0f b6 c7 77 0b 48 c1 e0 04 8b 80 00 bf c8 bd c3 <0f> 0b b8 fb ff ff 
> ff c3 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00
> ---[ end trace f079fb809e7a862b ]---
> BTRFS critical (device vda2): unable to find logical 8820195328 length 16384
> BTRFS: error (device vda2) in btrfs_finish_ordered_io:3023: errno=-5 IO 
> failure
> BTRFS info (device vda2): forced readonly
> BTRFS error (device vda2): pending csums is 2887680
> --
> 
> [CAUSE]
> It's caused by race with block group auto removal like the following
> case:
> - There is a meta block group X, which has only one tree block
>   The tree block belongs to fs tree 257.
> - In current transaction, some operation modified fs tree 257
>   The tree block get CoWed, so the block group X is empty, and marked as
>   unused, queued to be deleted.
> - Some workload (like fsync) wakes up cleaner_kthread()
>   Which will call btrfs_deleted_unused_bgs() to remove unused block
>   groups.
>   So block group X along its chunk map get removed.
> - Some delalloc work finished for fs tree 257
>   Quota needs to get the original reference of the extent, which will
>   reads tree blocks of commit root of 257.
>   Then since the chunk map get removed, above warning get triggered.
> 
> [FIX]
> Just teach btrfs_delete_unused_bgs() to skip block group who still has
> pinned bytes.
> 
> However there is a minor side effect, since currently we only queue
> empty blocks at update_block_group(), and such empty block group with
> pinned bytes won't go through update_block_group() again, such block
> group won't be removed, until it get new extent allocated and removed.
> 
> But please note that, there are more problems related to extent
> allocator with block group auto removal.
> 
> Even a block group is marked unused, extent allocator can still allocate
> new extents from unused block group.
> Thus delaying block group to next transaction won't work.
> (Extents get allocated in current transaction, and removed again in next
> transaction).
> 
> So the root fix need to co-operate with extent allocator.
> But current fix should be enough for this particular bug.
> 
> Signed-off-by: Qu Wenruo 
> ---
> changelog:
> v2:
>   Commit message update, to better indicate how pinned byte is used in
>   btrfs and why it's related to quota.
> v3:
>   Commit message update, further explaining the bug with an example.
>   And added the side effect of the fix, and possible further fix.
> ---
>  fs/btrfs/extent-tree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 

Re: [LKP] [lkp-robot] [mm] 9092c71bb7: blogbench.write_score -12.3% regression

2018-06-19 Thread Huang, Ying
Ping * 3

"Huang, Ying"  writes:

> Ping again ...
>
> "Huang, Ying"  writes:
>
>> Ping...
>>
>> "Huang, Ying"  writes:
>>
>>> Hi, Josef,
>>>
>>> Do you have time to take a look at the regression?
>>>
>>> kernel test robot  writes:
>>>
 Greeting,

 FYI, we noticed a -12.3% regression of blogbench.write_score and a +9.6% 
 improvement
 of blogbench.read_score due to commit:


 commit: 9092c71bb724dba2ecba849eae69e5c9d39bd3d2 ("mm: use sc->priority 
 for slab shrink targets")
 https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

 in testcase: blogbench
 on test machine: 16 threads Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz with 8G 
 memory
 with following parameters:

disk: 1SSD
fs: btrfs
cpufreq_governor: performance

 test-description: Blogbench is a portable filesystem benchmark that tries 
 to reproduce the load of a real-world busy file server.
 test-url: https://www.pureftpd.org/project/blogbench



 Details are as below:
 -->


 To reproduce:

 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp install job.yaml  # job file is attached in this email
 bin/lkp run job.yaml

 =
 compiler/cpufreq_governor/disk/fs/kconfig/rootfs/tbox_group/testcase:
   
 gcc-7/performance/1SSD/btrfs/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-bdw-de1/blogbench

 commit: 
   fcb2b0c577 ("mm: show total hugetlb memory consumption in /proc/meminfo")
   9092c71bb7 ("mm: use sc->priority for slab shrink targets")

 fcb2b0c577f145c7 9092c71bb724dba2ecba849eae 
  -- 
  %stddev %change %stddev
  \  |\  
   3256   -12.3%   2854blogbench.write_score
1235237   2%  +9.6%1354163blogbench.read_score
   28050912   -10.1%   25212230
 blogbench.time.file_system_outputs
6481995   3% +25.0%8105320   2%  
 blogbench.time.involuntary_context_switches
 906.00   +13.7%   1030
 blogbench.time.percent_of_cpu_this_job_got
   2552   +14.0%   2908blogbench.time.system_time
 173.80+8.4% 188.32blogbench.time.user_time
   19353936+3.6%   20045728
 blogbench.time.voluntary_context_switches
8719514   +13.0%9850451softirqs.RCU
   2.97   5%  -0.72.30   3%  mpstat.cpu.idle%
  24.92-6.5   18.46mpstat.cpu.iowait%
   0.65   2%  +0.10.75mpstat.cpu.soft%
  67.76+6.7   74.45mpstat.cpu.sys%
  50206   -10.7%  44858vmstat.io.bo
  49.25-9.1%  44.75   2%  vmstat.procs.b
 224125-1.8% 220135vmstat.system.cs
  48903   +10.7%  54134vmstat.system.in
3460654   +10.8%3834883meminfo.Active
3380666   +11.0%3752872meminfo.Active(file)
1853849   -17.4%1530415meminfo.Inactive
1836507   -17.6%1513054meminfo.Inactive(file)
 551311   -10.3% 494265meminfo.SReclaimable
 196525   -12.6% 171775meminfo.SUnreclaim
 747837   -10.9% 666040meminfo.Slab
  8.904e+08   -24.9%  6.683e+08cpuidle.C1.time
   22971020   -12.8%   20035820cpuidle.C1.usage
  2.518e+08   3% -31.7%   1.72e+08cpuidle.C1E.time
 821393   2% -33.3% 548003cpuidle.C1E.usage
   75460078   2% -23.3%   57903768   2%  cpuidle.C3.time
 136506   3% -25.3% 101956   3%  cpuidle.C3.usage
   56892498   4% -23.3%   43608427   4%  cpuidle.C6.time
  85034   3% -33.9%  56184   3%  cpuidle.C6.usage
   24373567   -24.5%   18395538cpuidle.POLL.time
 449033   2% -10.8% 400493cpuidle.POLL.usage
   1832+9.3%   2002turbostat.Avg_MHz
   22967645   -12.8%   20032521turbostat.C1
  18.43-4.6   13.85turbostat.C1%
 821328   2% -33.3% 547948turbostat.C1E
   5.21   3%  -1.63.56turbostat.C1E%
 136377   3% -25.3% 101823   3%  turbostat.C3
   1.56   2%  -0.41.20   3%  turbostat.C3%
  84404   3% -34.0%  

Re: [PATCH] Btrfs: fix physical offset reported by fiemap for inline extents

2018-06-19 Thread robbieko

fdman...@kernel.org 於 2018-06-19 19:31 寫到:

From: Filipe Manana 

Commit 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when
fm_extent_count is zero") introduced a regression where we no longer
report 0 as the physical offset for inline extents. This is because it
always sets the variable used to report the physical offset ("disko")
as em->block_start plus some offset, and em->block_start has the value
18446744073709551614 ((u64) -2) for inline extents.

This made the btrfs test 004 (from fstests) often fail, for example, 
for
a file with an inline extent we have the following items in the 
subvolume

tree:

item 101 key (418 INODE_ITEM 0) itemoff 11029 itemsize 160
   generation 25 transid 38 size 1525 nbytes 1525
   block group 0 mode 100666 links 1 uid 0 gid 0 rdev 0
   sequence 0 flags 0x2(none)
   atime 1529342058.461891730 (2018-06-18 18:14:18)
   ctime 1529342058.461891730 (2018-06-18 18:14:18)
   mtime 1529342058.461891730 (2018-06-18 18:14:18)
   otime 1529342055.869892885 (2018-06-18 18:14:15)
item 102 key (418 INODE_REF 264) itemoff 11016 itemsize 13
   index 25 namelen 3 name: fc7
item 103 key (418 EXTENT_DATA 0) itemoff 9470 itemsize 1546
   generation 38 type 0 (inline)
   inline extent data size 1525 ram_bytes 1525 compression 0 
(none)


Then when test 004 invoked fiemap against the file it got a non-zero
physical offset:

 $ filefrag -v /mnt/p0/d4/d7/fc7
 Filesystem type is: 9123683e
 File size of /mnt/p0/d4/d7/fc7 is 1525 (1 block of 4096 bytes)
  ext: logical_offset:physical_offset: length:   expected: 
flags:

0:0..4095: 18446744073709551614..  4093:   4096:
  last,not_aligned,inline,eof
 /mnt/p0/d4/d7/fc7: 1 extent found

This resulted in the test failing like this:

btrfs/004 49s ... [failed, exit status 1]- output mismatch (see
/home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad)
--- tests/btrfs/004.out 2016-08-23 10:17:35.027012095 +0100
+++
/home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad  2018-06-18
18:15:02.385872155 +0100
@@ -1,3 +1,10 @@
 QA output created by 004
 *** test backref walking
-*** done
+./tests/btrfs/004: line 227: [: 7.55578637259143e+22: integer
expression expected
+ERROR: 7.55578637259143e+22 is not a valid numeric value.
+unexpected output from
+   /home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal
logical-resolve -s 65536 -P 7.55578637259143e+22
/home/fdmanana/btrfs-tests/scratch_1
...
(Run 'diff -u tests/btrfs/004.out
/home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad'  to see
the entire diff)
Ran: btrfs/004

The large number in scientific notation reported as an invalid numeric
value is the result from the filter passed to perl which multiplies the
physical offset by the block size reported by fiemap.

So fix this by ensuring the physical offset is always set to 0 when we
are processing an inline extent.

Fixes: 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when
fm_extent_count is zero")
Signed-off-by: Filipe Manana 
---
 fs/btrfs/extent_io.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8e4a7cdbc9f5..978327d98fc5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4559,6 +4559,7 @@ int extent_fiemap(struct inode *inode, struct
fiemap_extent_info *fieinfo,
end = 1;
flags |= FIEMAP_EXTENT_LAST;
} else if (em->block_start == EXTENT_MAP_INLINE) {
+   disko = 0;
flags |= (FIEMAP_EXTENT_DATA_INLINE |
  FIEMAP_EXTENT_NOT_ALIGNED);
} else if (em->block_start == EXTENT_MAP_DELALLOC) {



EXTENT_MAP_DELALLOC should have the same problem.

em->block_start has some special values. The following values ​​should 
not be considered disko

#define EXTENT_MAP_LAST_BYTE((u64)-4)
#define EXTENT_MAP_HOLE((u64)-3)
#define EXTENT_MAP_INLINE((u64)-2)
#define EXTENT_MAP_DELALLOC((u64)-1)

Is the following change more suitable?
if (em->block_start >= EXTENT_MAP_LAST_BYTE)
disko = 0;
else
disko = em->block_start + offset_in_extent;

Thanks.
Robbie Ko


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check lowmem vs original

2018-06-19 Thread Su Yue




On 06/20/2018 10:19 AM, Chris Murphy wrote:

I've retested btrfs-progs 4.17 lowmem mode and still get a bunch of
'referencer count mismatch' that I don't see with original mode.


My bad. The bug is existed for long time since I changed lowmem mode
to traverse all trees.

Due to laziness then time passed, I have lost images you provided and
links are invalid.
It will be so nice of you if a image can be provided again.

I will look for causes immediately.

Thanks for the reports.


e.g.
ERROR: extent[1299275636736, 2654208] referencer count mismatch (root:
2192, owner: 317900, offset: 0) wanted: 5, have: 21

Complete output attached as a text file, since the MUA will mess up
the line wrap.


--
Chris Murphy





--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check lowmem vs original

2018-06-19 Thread Chris Murphy
I've retested btrfs-progs 4.17 lowmem mode and still get a bunch of
'referencer count mismatch' that I don't see with original mode.

e.g.
ERROR: extent[1299275636736, 2654208] referencer count mismatch (root:
2192, owner: 317900, offset: 0) wanted: 5, have: 21

Complete output attached as a text file, since the MUA will mess up
the line wrap.


--
Chris Murphy

[chris@f28s ~]$ sudo ~/Applications/btrfs-progs/btrfs --version
btrfs-progs v4.17 
[chris@f28s ~]$ sudo ~/Applications/btrfs-progs/btrfs check --mode=lowmem 
/dev/mapper/first
Checking filesystem on /dev/mapper/first
UUID: 0f43f49d-6e63-4b1b-bc8c-c54da409872d
checking extents
ERROR: extent[1258793181184, 81903616] referencer count mismatch (root: 2192, 
owner: 323075, offset: 0) wanted: 551, have: 625
ERROR: extent[1258899460096, 82878464] referencer count mismatch (root: 2192, 
owner: 323115, offset: 0) wanted: 521, have: 633
ERROR: extent[1259036663808, 64319488] referencer count mismatch (root: 2192, 
owner: 323125, offset: 0) wanted: 376, have: 491
ERROR: extent[1259102384128, 64552960] referencer count mismatch (root: 2192, 
owner: 323137, offset: 0) wanted: 439, have: 493
ERROR: extent[1259171381248, 113295360] referencer count mismatch (root: 2192, 
owner: 323147, offset: 0) wanted: 763, have: 865
ERROR: extent[1259307835392, 2576384] referencer count mismatch (root: 2192, 
owner: 323191, offset: 0) wanted: 9, have: 20
ERROR: extent[1259327455232, 30089216] referencer count mismatch (root: 2192, 
owner: 323203, offset: 0) wanted: 204, have: 230
ERROR: extent[1259372060672, 1974272] referencer count mismatch (root: 2192, 
owner: 323218, offset: 0) wanted: 13, have: 16
ERROR: extent[1259924090880, 22102016] referencer count mismatch (root: 2192, 
owner: 323199, offset: 427819008) wanted: 52, have: 169
ERROR: extent[1259960025088, 2211840] referencer count mismatch (root: 2192, 
owner: 323244, offset: 0) wanted: 11, have: 17
ERROR: extent[1260158959616, 53563392] referencer count mismatch (root: 2192, 
owner: 323328, offset: 0) wanted: 328, have: 409
ERROR: extent[1260258131968, 134217728] referencer count mismatch (root: 2192, 
owner: 323609, offset: 0) wanted: 959, have: 1024
ERROR: extent[1260410531840, 1716224] referencer count mismatch (root: 2192, 
owner: 323625, offset: 0) wanted: 4, have: 14
ERROR: extent[1261505388544, 1839104] referencer count mismatch (root: 2192, 
owner: 323654, offset: 0) wanted: 10, have: 15
ERROR: extent[1261846781952, 56676352] referencer count mismatch (root: 2192, 
owner: 323731, offset: 0) wanted: 354, have: 433
ERROR: extent[1262246526976, 2256896] referencer count mismatch (root: 2192, 
owner: 323750, offset: 0) wanted: 10, have: 18
ERROR: extent[1262765875200, 765952] referencer count mismatch (root: 2192, 
owner: 319045, offset: 0) wanted: 5, have: 6
ERROR: extent[1262984384512, 1642496] referencer count mismatch (root: 2192, 
owner: 322213, offset: 0) wanted: 6, have: 13
ERROR: extent[1263573057536, 1159168] referencer count mismatch (root: 2192, 
owner: 323009, offset: 0) wanted: 2, have: 9
ERROR: extent[1263615098880, 1220608] referencer count mismatch (root: 2192, 
owner: 323062, offset: 0) wanted: 1, have: 10
ERROR: extent[1263647588352, 2310144] referencer count mismatch (root: 2192, 
owner: 330026, offset: 0) wanted: 6, have: 18
ERROR: extent[1263665254400, 3538944] referencer count mismatch (root: 2192, 
owner: 330035, offset: 0) wanted: 16, have: 27
ERROR: extent[1263695650816, 1773568] referencer count mismatch (root: 2192, 
owner: 330054, offset: 0) wanted: 8, have: 14
ERROR: extent[1263894016000, 1536000] referencer count mismatch (root: 2192, 
owner: 319139, offset: 0) wanted: 9, have: 12
ERROR: extent[1263904493568, 1122304] referencer count mismatch (root: 2192, 
owner: 319151, offset: 0) wanted: 3, have: 9
ERROR: extent[1263915298816, 540672] referencer count mismatch (root: 2192, 
owner: 319176, offset: 0) wanted: 1, have: 5
ERROR: extent[1264166404096, 679936] referencer count mismatch (root: 2192, 
owner: 319672, offset: 0) wanted: 3, have: 6
ERROR: extent[1264205729792, 2166784] referencer count mismatch (root: 2192, 
owner: 319706, offset: 0) wanted: 13, have: 17
ERROR: extent[1264443080704, 327680] referencer count mismatch (root: 2192, 
owner: 319874, offset: 0) wanted: 2, have: 3
ERROR: extent[1264474132480, 1277952] referencer count mismatch (root: 2192, 
owner: 319926, offset: 0) wanted: 6, have: 10
ERROR: extent[1264494592000, 1622016] referencer count mismatch (root: 2192, 
owner: 319943, offset: 0) wanted: 12, have: 13
ERROR: extent[1264528855040, 2527232] referencer count mismatch (root: 2192, 
owner: 319965, offset: 0) wanted: 16, have: 20
ERROR: extent[1264556707840, 901120] referencer count mismatch (root: 2192, 
owner: 319974, offset: 0) wanted: 3, have: 7
ERROR: extent[1265866297344, 651264] referencer count mismatch (root: 2192, 
owner: 320062, offset: 0) wanted: 2, have: 5
ERROR: extent[1266110365696, 929792] referencer count mismatch (root: 2192, 
owner: 320074, 

Re: [PATCH] btrfs-progs: Check factor out root parsing from check_chunks_and_extents

2018-06-19 Thread Su Yue




On 06/19/2018 01:34 PM, Nikolay Borisov wrote:



On 19.06.2018 05:18, Su Yue wrote:



On 06/18/2018 07:10 PM, Nikolay Borisov wrote:

check_chunks_and_extents does quite a number of distinct things. The
first of those is going through all root items in the root tree and
classify every root depending on whether they have a dropping operation
in progress or not. Lets factor out this code and move the variables
specific to this in a separate function so clean up
check_chunks_and_extents
a bit. Accidentally, this patch fixes some reference leaks since
in error conditions in the loop the code does "goto out" but at that
label we don't really release the path. Having this code extracted in a
separate function which always releases the path avoids this problem
entirely.

Signed-off-by: Nikolay Borisov 


Code looks okay. One nitpick belows.


---
   check/main.c | 141
+++
   1 file changed, 85 insertions(+), 56 deletions(-)

diff --git a/check/main.c b/check/main.c
index a4d6855dccbf..fb5c86df21c9 100644
--- a/check/main.c
+++ b/check/main.c
@@ -8069,6 +8069,89 @@ static int deal_root_from_list(struct list_head
*list,
   return ret;
   }
   +/**
+ * parse_tree_roots - Go over all roots in the tree root and add each
one to
+ *  a list.
+ *
+ * @fs_info    - pointer to fs_info struct of the file system.
+ *
+ * @normal_trees   - list to contains all roots which don't have a drop
+ * operation in progress
+ *
+ * @dropping_trees - list containing all roots which have a drop
operation
+ * pending
+ *
+ * Returns 0 on success or a negative value indicating an error.
+ *
+ */
+static int parse_tree_roots(struct btrfs_fs_info *fs_info,
+   struct list_head *normal_trees,
+   struct list_head *dropping_trees)
+{
+    struct btrfs_path path;
+    struct btrfs_key key;
+    struct btrfs_key found_key;
+    struct btrfs_root_item ri;
+    struct extent_buffer *leaf;
+    int slot;
+    int ret = 0;
+
+    btrfs_init_path();
+    key.offset = 0;
+    key.objectid = 0;
+    key.type = BTRFS_ROOT_ITEM_KEY;
+    ret = btrfs_search_slot(NULL, fs_info->tree_root, , , 0,
0);
+    if (ret < 0)
+    goto out;
+    while (1) {
+    leaf = path.nodes[0];
+    slot = path.slots[0];
+    if (slot >= btrfs_header_nritems(path.nodes[0])) {
+    ret = btrfs_next_leaf(fs_info->tree_root, );
+    if (ret != 0)
+    break;
+    leaf = path.nodes[0];
+    slot = path.slots[0];
+    }


Nit:
I know the segment is just moved. The @slot is unused.


It's used in the if (slot >= btrfs_header_nritems(path.nodes[0])) check.
I guess the definition could be moved inside the while to reduce the
scope but it's a minor thing.



Since it's so small. (Annoying duplicate mails blame to my personal
gmail.)
So,

Reviewed-by: Su Yue 


+    btrfs_item_key_to_cpu(leaf, _key, path.slots[0]);
+    if (found_key.type == BTRFS_ROOT_ITEM_KEY) {
+    unsigned long offset;
+    u64 last_snapshot;
+    u8 level;
+
+    offset = btrfs_item_ptr_offset(leaf, path.slots[0]);
+    read_extent_buffer(leaf, , offset, sizeof(ri));
+    last_snapshot = btrfs_root_last_snapshot();
+    level = btrfs_root_level();
+    if (btrfs_disk_key_objectid(_progress) == 0) {
+    ret = add_root_item_to_list(normal_trees,
+    found_key.objectid,
+    btrfs_root_bytenr(),
+    last_snapshot, level,
+    0, NULL);
+    if (ret < 0)
+    break;
+    } else {
+    u64 objectid = found_key.objectid;
+    btrfs_disk_key_to_cpu(_key,
+  _progress);
+    ret = add_root_item_to_list(dropping_trees,
+    objectid,
+    btrfs_root_bytenr(),
+    last_snapshot, level,
+    ri.drop_level, _key);
+    if (ret < 0)
+    break;
+    }
+    }
+    path.slots[0]++;
+    }
+
+out:
+    btrfs_release_path();
+    return ret;
+}
+
   static int check_chunks_and_extents(struct btrfs_fs_info *fs_info)
   {
   struct rb_root dev_cache;
@@ -8082,20 +8165,13 @@ static int check_chunks_and_extents(struct
btrfs_fs_info *fs_info)
   struct cache_tree nodes;
   struct extent_io_tree excluded_extents;
   struct cache_tree corrupt_blocks;
-    struct btrfs_path path;
-    struct btrfs_key key;
-    struct btrfs_key found_key;
   int ret, err = 0;
   struct block_info *bits;
   int bits_nr;
-    struct extent_buffer *leaf;
-    int slot;
-    struct btrfs_root_item ri;
   struct list_head dropping_trees;
   struct list_head normal_trees;
   struct btrfs_root *root1;
   struct btrfs_root *root;
-    u64 

[PATCH v2 1/2] btrfs-progs: Fix wrong optind re-initialization to allow mixed option and non-option

2018-06-19 Thread Qu Wenruo
In function handle_global_options(), we reset @optind to 1.
However according to man page of getopt(3) NOTES section, if we need to
rescan options later, @optind should be reset to 0 to initialize the
internal variables correctly.

This explains the reason why in cmd_check(), getopt_long() doesn't
handle the following command correctly:
"btrfs check /dev/data/btrfs --check-data-csum"

While mkfs.btrfs handles mixed non-option and option correctly:
"mkfs.btrfs -f /dev/data/disk1 --data raid1 /dev/data/disk2"

Cc: Paul Jones 
Cc: Hugo Mills 
Fixes: 010ceab56e06 ("btrfs-progs: rework option parser to use getopt for 
global options")
Signed-off-by: Qu Wenruo 
---
changelog:
v2:
  Instead of resetting @optind at handle_global_options(), reset @optind
  before each later getopt() call. Since there are cases uses @optind and
  expects it to be starting from 1.
---
 cmds-balance.c| 2 ++
 cmds-device.c | 3 +++
 cmds-fi-du.c  | 1 +
 cmds-fi-usage.c   | 1 +
 cmds-filesystem.c | 2 ++
 cmds-inspect-dump-tree.c  | 1 +
 cmds-inspect-tree-stats.c | 1 +
 cmds-inspect.c| 3 +++
 cmds-qgroup.c | 3 +++
 cmds-quota.c  | 1 +
 cmds-receive.c| 1 +
 cmds-replace.c| 3 +++
 cmds-rescue.c | 2 ++
 cmds-restore.c| 1 +
 cmds-send.c   | 1 +
 cmds-subvolume.c  | 6 ++
 16 files changed, 32 insertions(+)

diff --git a/cmds-balance.c b/cmds-balance.c
index 0c91bdf13ada..6cc26c358f95 100644
--- a/cmds-balance.c
+++ b/cmds-balance.c
@@ -528,6 +528,7 @@ static int cmd_balance_start(int argc, char **argv)
 
memset(, 0, sizeof(args));
 
+   optind = 0;
while (1) {
enum { GETOPT_VAL_FULL_BALANCE = 256,
GETOPT_VAL_BACKGROUND = 257 };
@@ -831,6 +832,7 @@ static int cmd_balance_status(int argc, char **argv)
int verbose = 0;
int ret;
 
+   optind = 0;
while (1) {
int opt;
static const struct option longopts[] = {
diff --git a/cmds-device.c b/cmds-device.c
index 86459d1b9564..cd689b5017c8 100644
--- a/cmds-device.c
+++ b/cmds-device.c
@@ -57,6 +57,7 @@ static int cmd_device_add(int argc, char **argv)
int force = 0;
int last_dev;
 
+   optind = 0;
while (1) {
int c;
static const struct option long_options[] = {
@@ -267,6 +268,7 @@ static int cmd_device_scan(int argc, char **argv)
int all = 0;
int ret = 0;
 
+   optind = 0;
while (1) {
int c;
static const struct option long_options[] = {
@@ -403,6 +405,7 @@ static int cmd_device_stats(int argc, char **argv)
__u64 flags = 0;
DIR *dirstream = NULL;
 
+   optind = 0;
while (1) {
int c;
static const struct option long_options[] = {
diff --git a/cmds-fi-du.c b/cmds-fi-du.c
index 7e6bb7f6a570..4e639f6dc231 100644
--- a/cmds-fi-du.c
+++ b/cmds-fi-du.c
@@ -565,6 +565,7 @@ int cmd_filesystem_du(int argc, char **argv)
 
unit_mode = get_unit_mode_from_arg(, argv, 1);
 
+   optind = 0;
while (1) {
static const struct option long_options[] = {
{ "summarize", no_argument, NULL, 's'},
diff --git a/cmds-fi-usage.c b/cmds-fi-usage.c
index de7ad668ba2d..cbb3cd191b3d 100644
--- a/cmds-fi-usage.c
+++ b/cmds-fi-usage.c
@@ -975,6 +975,7 @@ int cmd_filesystem_usage(int argc, char **argv)
 
unit_mode = get_unit_mode_from_arg(, argv, 1);
 
+   optind = 0;
while (1) {
int c;
 
diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 30a50bf55e38..06c8311bdfe3 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -685,6 +685,7 @@ static int cmd_filesystem_show(int argc, char **argv)
 
unit_mode = get_unit_mode_from_arg(, argv, 0);
 
+   optind = 0;
while (1) {
int c;
static const struct option long_options[] = {
@@ -924,6 +925,7 @@ static int cmd_filesystem_defrag(int argc, char **argv)
defrag_global_errors = 0;
defrag_global_verbose = 0;
defrag_global_errors = 0;
+   optind = 0;
while(1) {
int c = getopt(argc, argv, "vrc::fs:l:t:");
if (c < 0)
diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c
index 92a2a45b267e..c8acd55a0c3a 100644
--- a/cmds-inspect-dump-tree.c
+++ b/cmds-inspect-dump-tree.c
@@ -235,6 +235,7 @@ int cmd_inspect_dump_tree(int argc, char **argv)
 * tree blocks as possible.
 */
open_ctree_flags = OPEN_CTREE_PARTIAL | OPEN_CTREE_NO_BLOCK_GROUPS;
+   optind = 0;
while (1) {
int c;
enum { GETOPT_VAL_FOLLOW = 256 };
diff --git a/cmds-inspect-tree-stats.c b/cmds-inspect-tree-stats.c
index eced0db9f840..dfa34c52ff5f 100644
--- a/cmds-inspect-tree-stats.c
+++ 

[PATCH 2/2] btrfs-progs: misc-tests: Fix 029 test cases for sudo test environment

2018-06-19 Thread Qu Wenruo
Test misc/029 only works if the test case is executed as root, while for
sudo usage, it doesn't work as initial mkdir and final cleanup doesn't
use $SUDO_HELPER.

Add "run_check $SUDO_HELPER" for such cases to allow it works under sudo
usage.

Signed-off-by: Qu Wenruo 
---
 tests/misc-tests/029-send-p-different-mountpoints/test.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/misc-tests/029-send-p-different-mountpoints/test.sh 
b/tests/misc-tests/029-send-p-different-mountpoints/test.sh
index 0b42b772e3d7..90465e1d08fd 100755
--- a/tests/misc-tests/029-send-p-different-mountpoints/test.sh
+++ b/tests/misc-tests/029-send-p-different-mountpoints/test.sh
@@ -14,7 +14,7 @@ prepare_test_dev
 SUBVOL_MNT="$TEST_MNT/subvol"
 TOPLEVEL_MNT="$TEST_MNT/toplevel"
 TEST_MNT="$TOPLEVEL_MNT"
-mkdir -p "$TOPLEVEL_MNT" "$SUBVOL_MNT"
+run_check $SUDO_HELPER mkdir -p "$TOPLEVEL_MNT" "$SUBVOL_MNT"
 
 run_check $SUDO_HELPER "$TOP/mkfs.btrfs" -f "$TEST_DEV"
 run_check_mount_test_dev
@@ -47,5 +47,5 @@ run_mustfail_stdout "send -p on 2 mount points" \
 run_check_umount_test_dev "$SUBVOL_MNT"
 run_check_umount_test_dev "$TOPLEVEL_MNT"
 
-rmdir "$SUBVOL_MNT"
-rmdir "$TOPLEVEL_MNT"
+run_check $SUDO_HELPER rmdir "$SUBVOL_MNT"
+run_check $SUDO_HELPER rmdir "$TOPLEVEL_MNT"
-- 
2.17.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: About more loose parameter sequence requirement

2018-06-19 Thread Hans van Kranenburg
On 06/18/2018 03:05 PM, Qu Wenruo wrote:
> 
> 
> On 2018年06月18日 20:02, David Sterba wrote:
>> On Mon, Jun 18, 2018 at 07:40:44PM +0800, Qu Wenruo wrote:
>>>
>>>
>>> On 2018年06月18日 19:34, David Sterba wrote:
 On Thu, Jun 14, 2018 at 03:17:45PM +0800, Qu Wenruo wrote:
> I understand that btrfs-progs introduced restrict parameter/option order
> to distinguish global and sub-command parameter/option.
>
> However it's really annoying if one just want to append some new options
> to previous command:
>
> E.g.
> # btrfs check /dev/data/btrfs
> # !! --check-data-csum
>
> The last command will fail as current btrfs-progs doesn't allow any
> option after parameter.
>
>
> Despite the requirement to distinguish global and subcommand
> option/parameter, is there any other requirement for such restrict
> option-first-parameter-last policy?

 I'd say that it's a common and recommended pattern. Getopt is able to
 reorder the parameters so mixed options and non-options are accepted,
 unless POSIXLY_CORRECT (see man getopt(3)) is not set. With the more
 strict requirement, 'btrfs' option parser works the same regardless of
 that.
>>>
>>> But currently it doesn't work.
>>> Just as the example, btrfs check /dev/data/btrfs --check-data-sum won't
>>> work.
>>> It's different from a lot of other common programs.
>>
>> With POSIXLY_CORRECT set, it expectedly won't work. With POSIXLY_CORRECT
>> unset, it would work in general, but not for 'btrfs'.
>>
>> As this is per-application decision I find it ok, besides that I also
>> find the 'options anywhere' pattern bad. Does it really save you that
>> much time while typing the commands with new arguments?
> 
> Personally speaking, yes.
> 
>> There are
>> movement shortcuts to jump by words, you get the previous command by
>> up-arrow. About the same number of keystrokes.
> 
> Not really, normally just "!! ", where I don't even need to
> move my fingers to arrow keys.
> (I'm all ears about extra bash shortcuts on this)

It's called `set -o vi` ;-]

If you remap capslock to be escape, there's not much moving around of
fingers going on. CAPS-k-b-b-a and type your new option. No reaching to
shift and 1 necessary at all.

:]

>>
>> New code needs to be tested, documented and maintained, that's the cost
>> I find too high for something that's convenience for people who are used
>> to some shell builtins.
> 
> The biggest problem is, the behavior isn't even consistent across
> btrfs-progs.
> mkfs.btrfs accept such out-of-order parameters while btrfs not.
> 
> And most common tools, like commands provided by coretutil, they don't
> care about the order.
> The only daily exception is 'scp', which I found it pretty unhandy.
> 
> And just as Paul and Hugo, I think there are quite some users preferring
> out-of-order parameter/options.
> 
> 
> I also understand the maintenance burden, but at least let's try if
> there are better solution for this.
> 
> Thanks,
> Qu
> 


-- 
Hans van Kranenburg
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: About more loose parameter sequence requirement

2018-06-19 Thread Qu Wenruo


On 2018年06月20日 07:18, Qu Wenruo wrote:
> 
> 
> On 2018年06月19日 22:47, David Sterba wrote:
>> On Mon, Jun 18, 2018 at 09:05:59PM +0800, Qu Wenruo wrote:
 New code needs to be tested, documented and maintained, that's the cost
 I find too high for something that's convenience for people who are used
 to some shell builtins.
>>>
>>> The biggest problem is, the behavior isn't even consistent across
>>> btrfs-progs.
>>> mkfs.btrfs accept such out-of-order parameters while btrfs not.
>>>
>>> And most common tools, like commands provided by coretutil, they don't
>>> care about the order.
>>> The only daily exception is 'scp', which I found it pretty unhandy.
>>>
>>> And just as Paul and Hugo, I think there are quite some users preferring
>>> out-of-order parameter/options.
>>
>> Because of the feedback and interest of the relaxed mixed
>> option/argument syntax, I don't object anymore.
> 
> And in fact, the behavior of btrfs is caused by @optind wrongly initialized.
> The fix is just one line (along some comments).
> 
> https://patchwork.kernel.org/patch/10473173/
> 
> So no or little pressure on maintenance.

I'm totally wrong.
Since there are cases we call check_argc_*() directly using @optind,
reinitialize it to 0 would cause them to fail.

I'll update the patch to manually reset optind to 0 only for cases we
call getopt().
And indeed, it adds extra maintenance effort.

Thanks,
Qu

> 
> Thanks,
> Qu
> 



signature.asc
Description: OpenPGP digital signature


Re: RAID56

2018-06-19 Thread waxhead

Gandalf Corvotempesta wrote:

Another kernel release was made.
Any improvements in RAID56?

I didn't see any changes in that sector, is something still being
worked on or it's stuck waiting for something ?

Based on official BTRFS status page, RAID56 is the only "unstable"
item marked in red.
No interested from Suse in fixing that?

I think it's the real missing part for a feature-complete filesystem.
Nowadays parity raid is mandatory, we can't only rely on mirroring.


First of all: I am not a BTRFS developer, but I follow the mailing list 
closely and I too have a particular interest in the "RAID"5/6 feature 
which realistically is probably about 3-4 years (if not more) in the future.


From what I am able to understand the pesky write hole is one of the 
major obstacles for having BTRFS "RAID"5/6 work reliably. There was 
patches to fix this a while ago, but if these patches are to be 
classified as a workaround or actually as "the darn thing done right" is 
perhaps up for discussion.


In general there seems to be a lot more momentum on the "RAID"5/6 
feature now compared to earlier. There also seem to be a lot of focus on 
fixing bugs and running tests as well. This is why I am guessing that 
3-4 years ahead is a absolute minimum until "RAID"5/6 might be somewhat 
reliable and usable.


There are a few other basics missing that may be acceptable for you as 
long as you know about it. For example as far as I know BTRFS does still 
not use the "device-id" or "BTRFS internal number" for storage devices 
to keep track of the storage device.


This means that if you have a multi storage device filesystem with for 
example /dev/sda /dev/sdb /dev/sdc etc... and /dev/sdc disappears and 
show up again as /dev/sdx then BTRFS would not recoginize this and 
happily try to continue to write on /dev/sdc even if it does not exist.


...and perhaps even worse - I can imagine that if you swap device 
ordering and a different device takes /dev/sdc's place then BTRFS 
*could* overwrite data on this device - possibly making a real mess of 
things. I am not sure if this holds true, but if it does it's for sure a 
real nugget of basic functionality missing right there.


BTRFS also so far have no automatic "drop device" function e.g. it will 
not automatically kick out a storage device that is throwing lots of 
errors and causing delays etc. There may be benefits to keeping this 
design of course, but for some dropping the device might be desirable.


And no hot-spare "or hot-(reserved-)space" (which would be more accurate 
in BTRFS terms) is implemented either, and that is one good reason to 
keep an eye on your storage pool.


What you *might* consider is to have your metadata in "RAID"1 or 
"RAID"10 and your data in "RAID5" or even "RAID6" so that if you run 
into problems then you might in worst case loose some data, but since 
"RAID"1/10 is beginning to be rather mature then it is likely that your 
filesystem might survive a disk failure.


So if you are prepared to perhaps loose a file or two, but want to feel 
confident that your filesystem is surviving and will give you a report 
about what file(s) are toast then this may be acceptable for you as you 
can always restore from backups (because you do have backups right? If 
not, read 'any' of Duncan's posts - he explains better than most people 
why you need and should have backups!)


Now keep in mind that this is just a humble users analysis of the 
situation based on whatever I have picked up from the mailing list which 
may or may not be entirely accurate so take it for what it is!

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: About more loose parameter sequence requirement

2018-06-19 Thread Qu Wenruo


On 2018年06月19日 22:47, David Sterba wrote:
> On Mon, Jun 18, 2018 at 09:05:59PM +0800, Qu Wenruo wrote:
>>> New code needs to be tested, documented and maintained, that's the cost
>>> I find too high for something that's convenience for people who are used
>>> to some shell builtins.
>>
>> The biggest problem is, the behavior isn't even consistent across
>> btrfs-progs.
>> mkfs.btrfs accept such out-of-order parameters while btrfs not.
>>
>> And most common tools, like commands provided by coretutil, they don't
>> care about the order.
>> The only daily exception is 'scp', which I found it pretty unhandy.
>>
>> And just as Paul and Hugo, I think there are quite some users preferring
>> out-of-order parameter/options.
> 
> Because of the feedback and interest of the relaxed mixed
> option/argument syntax, I don't object anymore.

And in fact, the behavior of btrfs is caused by @optind wrongly initialized.
The fix is just one line (along some comments).

https://patchwork.kernel.org/patch/10473173/

So no or little pressure on maintenance.

Thanks,
Qu



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 1/3] btrfs: Remove fs_info argument from __btrfs_inc_extent_ref

2018-06-19 Thread Nikolay Borisov



On 19.06.2018 22:31, Jeff Mahoney wrote:
> I like the idea here.  I wasn't sold at first, but I think if we can
> standardize on taking only a trans handle when one is required and both
> a trans and fs_info when it's optional, it'll make the code clearer.
> This cleanup can percolate up the stack to cover pretty much all of
> delayed refs.

I have a 25-something patches which do exactly this. Still WIP, will
likely send it tomorrow.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] btrfs: Remove fs_info argument from __btrfs_inc_extent_ref

2018-06-19 Thread Jeff Mahoney
On 6/18/18 7:59 AM, Nikolay Borisov wrote:
> This function already takes a transaction which holds a reference to
> the fs_info struct. Use that reference and remove the extra arg. No
> functional changes.

I like the idea here.  I wasn't sold at first, but I think if we can
standardize on taking only a trans handle when one is required and both
a trans and fs_info when it's optional, it'll make the code clearer.
This cleanup can percolate up the stack to cover pretty much all of
delayed refs.

Reviewed-by: Jeff Mahoney 

> Signed-off-by: Nikolay Borisov 
> ---
>  fs/btrfs/extent-tree.c | 15 ++-
>  1 file changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 4850e538ab10..59645ced6fbc 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -2208,12 +2208,12 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle 
> *trans,
>  }
>  
>  static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
> -   struct btrfs_fs_info *fs_info,
> struct btrfs_delayed_ref_node *node,
> u64 parent, u64 root_objectid,
> u64 owner, u64 offset, int refs_to_add,
> struct btrfs_delayed_extent_op *extent_op)
>  {
> + struct btrfs_fs_info *fs_info = trans->fs_info;
>   struct btrfs_path *path;
>   struct extent_buffer *leaf;
>   struct btrfs_extent_item *item;
> @@ -2297,10 +2297,9 @@ static int run_delayed_data_ref(struct 
> btrfs_trans_handle *trans,
>ref->objectid, ref->offset,
>, node->ref_mod);
>   } else if (node->action == BTRFS_ADD_DELAYED_REF) {
> - ret = __btrfs_inc_extent_ref(trans, fs_info, node, parent,
> -  ref_root, ref->objectid,
> -  ref->offset, node->ref_mod,
> -  extent_op);
> + ret = __btrfs_inc_extent_ref(trans, node, parent, ref_root,
> +  ref->objectid, ref->offset,
> +  node->ref_mod, extent_op);
>   } else if (node->action == BTRFS_DROP_DELAYED_REF) {
>   ret = __btrfs_free_extent(trans, fs_info, node, parent,
> ref_root, ref->objectid,
> @@ -2450,10 +2449,8 @@ static int run_delayed_tree_ref(struct 
> btrfs_trans_handle *trans,
>   BUG_ON(!extent_op || !extent_op->update_flags);
>   ret = alloc_reserved_tree_block(trans, node, extent_op);
>   } else if (node->action == BTRFS_ADD_DELAYED_REF) {
> - ret = __btrfs_inc_extent_ref(trans, fs_info, node,
> -  parent, ref_root,
> -  ref->level, 0, 1,
> -  extent_op);
> + ret = __btrfs_inc_extent_ref(trans, node, parent, ref_root,
> +  ref->level, 0, 1, extent_op);
>   } else if (node->action == BTRFS_DROP_DELAYED_REF) {
>   ret = __btrfs_free_extent(trans, fs_info, node,
> parent, ref_root,
> 


-- 
Jeff Mahoney
SUSE Labs



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] Btrfs: fix physical offset reported by fiemap for inline extents

2018-06-19 Thread David Sterba
On Tue, Jun 19, 2018 at 12:31:42PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
> So fix this by ensuring the physical offset is always set to 0 when we
> are processing an inline extent.
> 
> Fixes: 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when fm_extent_count 
> is zero")
> Signed-off-by: Filipe Manana 

Added to 4.18 queue, thanks.

This is a fix for a patch that was in for-next for a few weeks but the
bug was discovered only after the patch got merged to master. I wonder
if there's something to be improved in the patch flow.

I think it should take less time to catch bugs while the patches are
unmerged, either in the mailinglist or in the development branches.
Post-merge fixes will happen of course, but in this particular case it
looks like something that slipped too easily. I did the "fallback"
review and checked that tests regarding fiemap pass, the occasional
failure you mention has not happened on any of my testing setups.

There's another patch for fiemap that seems to have bigger impact and
for that reason I have postponed merging it to 4.18 unlike the first
one, but now I think this requires a testcase.

https://patchwork.kernel.org/patch/10383491/

The patch will be in for-next topic branch until then.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs balance did not progress after 12H

2018-06-19 Thread Austin S. Hemmelgarn

On 2018-06-19 12:30, james harvey wrote:

On Tue, Jun 19, 2018 at 11:47 AM, Marc MERLIN  wrote:

On Mon, Jun 18, 2018 at 06:00:55AM -0700, Marc MERLIN wrote:

So, I ran this:
gargamel:/mnt/btrfs_pool2# btrfs balance start -dusage=60 -v .  &
[1] 24450
Dumping filters: flags 0x1, state 0x0, force is off
   DATA (flags 0x2): balancing, usage=60
gargamel:/mnt/btrfs_pool2# while :; do btrfs balance status .; sleep 60; done
0 out of about 0 chunks balanced (0 considered), -nan% left


This (0/0/0, -nan%) seems alarming.  I had this output once when the
system spontaneously rebooted during a balance.  I didn't have any bad
effects afterward.


Balance on '.' is running
0 out of about 73 chunks balanced (2 considered), 100% left
Balance on '.' is running

After about 20mn, it changed to this:
1 out of about 73 chunks balanced (6724 considered),  99% left


This seems alarming.  I wouldn't think # considered should ever exceed
# chunks.  Although, it does say "about", so maybe it can a little
bit, but I wouldn't expect it to exceed it by this much.
Actually, output like this is not unusual.  In the above line, the 1 is 
how many chunks have been actually processed, the 73 is how many the 
command expects to process (that is, the count of chunks that fit the 
filtering requirements, in this case, ones which are 60% or less full), 
and the 6724 is how many it has checked against the filtering 
requirements.  So, if you've got a very large number of chunks, and are 
selecting a small number with filters, then the considered value is 
likely to be significantly higher than the first two.



Balance on '.' is running

Now, 12H later, it's still there, only 1 out of 73.

gargamel:/mnt/btrfs_pool2# btrfs fi show .
Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
 Total devices 1 FS bytes used 12.72TiB
 devid1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2

gargamel:/mnt/btrfs_pool2# btrfs fi df .
Data, single: total=13.57TiB, used=12.60TiB
System, DUP: total=32.00MiB, used=1.55MiB
Metadata, DUP: total=121.50GiB, used=116.53GiB
GlobalReserve, single: total=512.00MiB, used=848.00KiB

kernel: 4.16.8

Is that expected? Should I be ready to wait days possibly for this
balance to finish?


It's now beeen 2 days, and it's still stuck at 1%
1 out of about 73 chunks balanced (6724 considered),  99% left


First, my disclaimer.  I'm not a btrfs developer, and although I've
ran balance many times, I haven't really studied its output beyond the
% left.  I don't know why it says "about", and I don't know if it
should ever be that far off.

In your situation, I would run "btrfs pause ", wait to hear from
a btrfs developer, and not use the volume whatsoever in the meantime.
I would say this is probably good advice.  I don't really know what's 
going on here myself actually, though it looks like the balance got 
stuck (the output hasn't changed for over 36 hours, unless you've got an 
insanely slow storage array, that's extremely unusual (it should only be 
moving at most 3GB of data per chunk)).


That said, I would question the value of repacking chunks that are 
already more than half full.  Anything above a 50% usage filter 
generally takes a long time, and has limited value in most cases (higher 
values are less likely to reduce the total number of allocated chunks). 
With `-duszge=50` or less, you're guaranteed to reduce the number of 
chunk if at least two match, and it isn't very time consuming for the 
allocator, all because you can pack at least two matching chunks into 
one 'new' chunk (new in quotes because it may re-pack them into existing 
slack space on the FS).  Additionally, `-dusage=50` is usually 
sufficient to mitigate the typical ENOSPC issues that regular balancing 
is supposed to help with.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs balance did not progress after 12H

2018-06-19 Thread james harvey
On Tue, Jun 19, 2018 at 11:47 AM, Marc MERLIN  wrote:
> On Mon, Jun 18, 2018 at 06:00:55AM -0700, Marc MERLIN wrote:
>> So, I ran this:
>> gargamel:/mnt/btrfs_pool2# btrfs balance start -dusage=60 -v .  &
>> [1] 24450
>> Dumping filters: flags 0x1, state 0x0, force is off
>>   DATA (flags 0x2): balancing, usage=60
>> gargamel:/mnt/btrfs_pool2# while :; do btrfs balance status .; sleep 60; done
>> 0 out of about 0 chunks balanced (0 considered), -nan% left

This (0/0/0, -nan%) seems alarming.  I had this output once when the
system spontaneously rebooted during a balance.  I didn't have any bad
effects afterward.

>> Balance on '.' is running
>> 0 out of about 73 chunks balanced (2 considered), 100% left
>> Balance on '.' is running
>>
>> After about 20mn, it changed to this:
>> 1 out of about 73 chunks balanced (6724 considered),  99% left

This seems alarming.  I wouldn't think # considered should ever exceed
# chunks.  Although, it does say "about", so maybe it can a little
bit, but I wouldn't expect it to exceed it by this much.

>> Balance on '.' is running
>>
>> Now, 12H later, it's still there, only 1 out of 73.
>>
>> gargamel:/mnt/btrfs_pool2# btrfs fi show .
>> Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
>> Total devices 1 FS bytes used 12.72TiB
>> devid1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2
>>
>> gargamel:/mnt/btrfs_pool2# btrfs fi df .
>> Data, single: total=13.57TiB, used=12.60TiB
>> System, DUP: total=32.00MiB, used=1.55MiB
>> Metadata, DUP: total=121.50GiB, used=116.53GiB
>> GlobalReserve, single: total=512.00MiB, used=848.00KiB
>>
>> kernel: 4.16.8
>>
>> Is that expected? Should I be ready to wait days possibly for this
>> balance to finish?
>
> It's now beeen 2 days, and it's still stuck at 1%
> 1 out of about 73 chunks balanced (6724 considered),  99% left

First, my disclaimer.  I'm not a btrfs developer, and although I've
ran balance many times, I haven't really studied its output beyond the
% left.  I don't know why it says "about", and I don't know if it
should ever be that far off.

In your situation, I would run "btrfs pause ", wait to hear from
a btrfs developer, and not use the volume whatsoever in the meantime.
I can make some guesses where to go from here, but won't, as I don't
want to screw things up for you.

What version of btrfs-progs do you have?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs balance did not progress after 12H

2018-06-19 Thread Marc MERLIN
On Mon, Jun 18, 2018 at 06:00:55AM -0700, Marc MERLIN wrote:
> So, I ran this:
> gargamel:/mnt/btrfs_pool2# btrfs balance start -dusage=60 -v .  &
> [1] 24450
> Dumping filters: flags 0x1, state 0x0, force is off
>   DATA (flags 0x2): balancing, usage=60
> gargamel:/mnt/btrfs_pool2# while :; do btrfs balance status .; sleep 60; done
> 0 out of about 0 chunks balanced (0 considered), -nan% left
> Balance on '.' is running
> 0 out of about 73 chunks balanced (2 considered), 100% left
> Balance on '.' is running
> 
> After about 20mn, it changed to this:
> 1 out of about 73 chunks balanced (6724 considered),  99% left
> Balance on '.' is running
> 
> Now, 12H later, it's still there, only 1 out of 73.
> 
> gargamel:/mnt/btrfs_pool2# btrfs fi show .
> Label: 'dshelf2'  uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
> Total devices 1 FS bytes used 12.72TiB
> devid1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2
> 
> gargamel:/mnt/btrfs_pool2# btrfs fi df .
> Data, single: total=13.57TiB, used=12.60TiB
> System, DUP: total=32.00MiB, used=1.55MiB
> Metadata, DUP: total=121.50GiB, used=116.53GiB
> GlobalReserve, single: total=512.00MiB, used=848.00KiB
> 
> kernel: 4.16.8
> 
> Is that expected? Should I be ready to wait days possibly for this
> balance to finish?
 
It's now beeen 2 days, and it's still stuck at 1%
1 out of about 73 chunks balanced (6724 considered),  99% left

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RAID56

2018-06-19 Thread Gandalf Corvotempesta
Another kernel release was made.
Any improvements in RAID56?

I didn't see any changes in that sector, is something still being
worked on or it's stuck waiting for something ?

Based on official BTRFS status page, RAID56 is the only "unstable"
item marked in red.
No interested from Suse in fixing that?

I think it's the real missing part for a feature-complete filesystem.
Nowadays parity raid is mandatory, we can't only rely on mirroring.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: About more loose parameter sequence requirement

2018-06-19 Thread David Sterba
On Mon, Jun 18, 2018 at 09:05:59PM +0800, Qu Wenruo wrote:
> > New code needs to be tested, documented and maintained, that's the cost
> > I find too high for something that's convenience for people who are used
> > to some shell builtins.
> 
> The biggest problem is, the behavior isn't even consistent across
> btrfs-progs.
> mkfs.btrfs accept such out-of-order parameters while btrfs not.
> 
> And most common tools, like commands provided by coretutil, they don't
> care about the order.
> The only daily exception is 'scp', which I found it pretty unhandy.
> 
> And just as Paul and Hugo, I think there are quite some users preferring
> out-of-order parameter/options.

Because of the feedback and interest of the relaxed mixed
option/argument syntax, I don't object anymore.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] btrfs: fix race between mkfs and mount

2018-06-19 Thread David Sterba
On Thu, Jun 07, 2018 at 06:39:32PM +0200, David Sterba wrote:
> On Mon, Jun 04, 2018 at 11:00:30PM +0800, Anand Jain wrote:
> > In an instrumented testing it is possible that the mount and
> > a newer mkfs.btrfs thread on the same device can race and if the new
> > mkfs.btrfs wins it will free the older fs_devices, then the mount thread
> > will lead to oops.
> > 
> > Thread1 Thread2
> > --- ---
> > mkfs.btrfs -fq /dev/sdb
> > mount /dev/sdb /btrfs
> > |_btrfs_mount_root()
> >   |_btrfs_scan_one_device(... _devices)
> > 
> > mkfs.btrfs -fq /dev/sdb
> > |_btrfs_contol_ioctl()
> >   |_btrfs_scan_one_device(... 
> > _devices)
> > |_::
> >   
> > |_btrfs_free_stale_devices()
> > 
> >   |_btrfs_open_devices(fs_devices ..) <-- stale fs_devices.
> > 
> > Fix this with a mutually exclusive flag BTRFS_VOL_FLAG_EXCL_OPS.
> 
> I'm not sure this is the right way to fix it, adding another bit to the
> uuid_mutex and device_list_mutex combo.
> 
> To fix the race between mount and mkfs we can add a bit of logic to the
> device scanning so that mount calling scan will track the purpose and
> mkfs scan will not be allowed to free that device.

Last version of the proposed fix is to extend the uuid_mutex over the
whole mount callback and use it around calls to btrfs_scan_one_device.
That way we'll be sure the mount will always get to the device it
scanned and that will not be freed by the parallel scan.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Btrfs: fix return value on rename exchange failure

2018-06-19 Thread David Sterba
On Mon, Jun 11, 2018 at 07:24:16PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> If we failed during a rename exchange operation after starting/joining a
> transaction, we would end up replacing the return value, stored in the
> local 'ret' variable, with the return value from btrfs_end_transaction().
> So this could end up returning 0 (success) to user space despite the
> operation having failed and aborted the transaction, because if there are
> multiple tasks having a reference on the transaction at the time
> btrfs_end_transaction() is called by the rename exchange, that function
> returns 0 (otherwise it returns -EIO and not the original error value).
> So fix this by not overwriting the return value on error after getting
> a transaction handle.
> 
> Signed-off-by: Filipe Manana 

1 and 2 queued for 4.18, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] Cleanups around extent increment

2018-06-19 Thread David Sterba
On Mon, Jun 18, 2018 at 02:59:23PM +0300, Nikolay Borisov wrote:
> This series improves the functions involved around extent reference 
> increment. 
> The first patch just removes a redundant argument, the second one documents 
> the 
> parameters of __btrfs_inc_extent_ref. It can be considered v2 of the 
> standalone
> version which Jeff had some input to. The final patch fixes a comment in 
> lookup_inline_extent_backref which transpired while Jeff was revieweing the 
> documentation patch. 
> 
> Nikolay Borisov (3):
>   btrfs: Remove fs_info argument from __btrfs_inc_extent_ref

>   btrfs: Document __btrfs_inc_extent_ref
>   btrfs: Fix comment in lookup_inline_extent_backref

2 and 3 added to misc-next, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: fix invalid-free in btrfs_extent_same

2018-06-19 Thread David Sterba
On Tue, Jun 19, 2018 at 02:54:38PM +0800, Lu Fengqi wrote:
> If this condition ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
>  (BTRFS_I(dst)->flags & BTRFS_INODE_NODATASUM))
> is hit, we will go to free the uninitialized cmp.src_pages and
> cmp.dst_pages.
> 
> Fixes: 67b07bd4bec5 ("Btrfs: reuse cmp workspace in EXTENT_SAME ioctl")
> Signed-off-by: Lu Fengqi 
> ---
>  fs/btrfs/ioctl.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index c2837a32d689..43ecbe620dea 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -3577,7 +3577,7 @@ static int btrfs_extent_same(struct inode *src, u64 
> loff, u64 olen,
>   ret = btrfs_extent_same_range(src, loff, BTRFS_MAX_DEDUPE_LEN,
> dst, dst_loff, );
>   if (ret)
> - goto out_unlock;
> + goto out_free;
>  
>   loff += BTRFS_MAX_DEDUPE_LEN;
>   dst_loff += BTRFS_MAX_DEDUPE_LEN;
> @@ -3587,16 +3587,16 @@ static int btrfs_extent_same(struct inode *src, u64 
> loff, u64 olen,
>   ret = btrfs_extent_same_range(src, loff, tail_len, dst,
> dst_loff, );

The labels now switch order and there's one more 'goto out_free' that
actually also wants to unlock the pages, after error of
btrfs_extent_same_range in the for loop. So this needs to be update too.

>  
> +out_free:
> + kvfree(cmp.src_pages);
> + kvfree(cmp.dst_pages);
> +
>  out_unlock:
>   if (same_inode)
>   inode_unlock(src);
>   else
>   btrfs_double_inode_unlock(src, dst);
>  
> -out_free:
> - kvfree(cmp.src_pages);
> - kvfree(cmp.dst_pages);
> -
>   return ret;
>  }
>  
> -- 
> 2.17.1
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] btrfs: Remove fs_info argument from __btrfs_inc_extent_ref

2018-06-19 Thread Nikolay Borisov



On 18.06.2018 14:59, Nikolay Borisov wrote:
> This function already takes a transaction which holds a reference to
> the fs_info struct. Use that reference and remove the extra arg. No
> functional changes.
> 
> Signed-off-by: Nikolay Borisov 
> --- 

Please ignore this patch, since I'm going to re-send it as apart of a
larger series dealing specifically with fs_info cleanup. The other 2 are
good.


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix physical offset reported by fiemap for inline extents

2018-06-19 Thread fdmanana
From: Filipe Manana 

Commit 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when
fm_extent_count is zero") introduced a regression where we no longer
report 0 as the physical offset for inline extents. This is because it
always sets the variable used to report the physical offset ("disko")
as em->block_start plus some offset, and em->block_start has the value
18446744073709551614 ((u64) -2) for inline extents.

This made the btrfs test 004 (from fstests) often fail, for example, for
a file with an inline extent we have the following items in the subvolume
tree:

item 101 key (418 INODE_ITEM 0) itemoff 11029 itemsize 160
   generation 25 transid 38 size 1525 nbytes 1525
   block group 0 mode 100666 links 1 uid 0 gid 0 rdev 0
   sequence 0 flags 0x2(none)
   atime 1529342058.461891730 (2018-06-18 18:14:18)
   ctime 1529342058.461891730 (2018-06-18 18:14:18)
   mtime 1529342058.461891730 (2018-06-18 18:14:18)
   otime 1529342055.869892885 (2018-06-18 18:14:15)
item 102 key (418 INODE_REF 264) itemoff 11016 itemsize 13
   index 25 namelen 3 name: fc7
item 103 key (418 EXTENT_DATA 0) itemoff 9470 itemsize 1546
   generation 38 type 0 (inline)
   inline extent data size 1525 ram_bytes 1525 compression 0 (none)

Then when test 004 invoked fiemap against the file it got a non-zero
physical offset:

 $ filefrag -v /mnt/p0/d4/d7/fc7
 Filesystem type is: 9123683e
 File size of /mnt/p0/d4/d7/fc7 is 1525 (1 block of 4096 bytes)
  ext: logical_offset:physical_offset: length:   expected: flags:
0:0..4095: 18446744073709551614..  4093:   4096:
 last,not_aligned,inline,eof
 /mnt/p0/d4/d7/fc7: 1 extent found

This resulted in the test failing like this:

btrfs/004 49s ... [failed, exit status 1]- output mismatch (see 
/home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad)
--- tests/btrfs/004.out 2016-08-23 10:17:35.027012095 +0100
+++ /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad  
2018-06-18 18:15:02.385872155 +0100
@@ -1,3 +1,10 @@
 QA output created by 004
 *** test backref walking
-*** done
+./tests/btrfs/004: line 227: [: 7.55578637259143e+22: integer expression 
expected
+ERROR: 7.55578637259143e+22 is not a valid numeric value.
+unexpected output from
+   /home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal 
logical-resolve -s 65536 -P 7.55578637259143e+22 
/home/fdmanana/btrfs-tests/scratch_1
...
(Run 'diff -u tests/btrfs/004.out 
/home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad'  to see the entire 
diff)
Ran: btrfs/004

The large number in scientific notation reported as an invalid numeric
value is the result from the filter passed to perl which multiplies the
physical offset by the block size reported by fiemap.

So fix this by ensuring the physical offset is always set to 0 when we
are processing an inline extent.

Fixes: 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when fm_extent_count 
is zero")
Signed-off-by: Filipe Manana 
---
 fs/btrfs/extent_io.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8e4a7cdbc9f5..978327d98fc5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4559,6 +4559,7 @@ int extent_fiemap(struct inode *inode, struct 
fiemap_extent_info *fieinfo,
end = 1;
flags |= FIEMAP_EXTENT_LAST;
} else if (em->block_start == EXTENT_MAP_INLINE) {
+   disko = 0;
flags |= (FIEMAP_EXTENT_DATA_INLINE |
  FIEMAP_EXTENT_NOT_ALIGNED);
} else if (em->block_start == EXTENT_MAP_DELALLOC) {
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: fix invalid-free in btrfs_extent_same

2018-06-19 Thread Lu Fengqi
If this condition ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
   (BTRFS_I(dst)->flags & BTRFS_INODE_NODATASUM))
is hit, we will go to free the uninitialized cmp.src_pages and
cmp.dst_pages.

Fixes: 67b07bd4bec5 ("Btrfs: reuse cmp workspace in EXTENT_SAME ioctl")
Signed-off-by: Lu Fengqi 
---
 fs/btrfs/ioctl.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index c2837a32d689..43ecbe620dea 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3577,7 +3577,7 @@ static int btrfs_extent_same(struct inode *src, u64 loff, 
u64 olen,
ret = btrfs_extent_same_range(src, loff, BTRFS_MAX_DEDUPE_LEN,
  dst, dst_loff, );
if (ret)
-   goto out_unlock;
+   goto out_free;
 
loff += BTRFS_MAX_DEDUPE_LEN;
dst_loff += BTRFS_MAX_DEDUPE_LEN;
@@ -3587,16 +3587,16 @@ static int btrfs_extent_same(struct inode *src, u64 
loff, u64 olen,
ret = btrfs_extent_same_range(src, loff, tail_len, dst,
  dst_loff, );
 
+out_free:
+   kvfree(cmp.src_pages);
+   kvfree(cmp.dst_pages);
+
 out_unlock:
if (same_inode)
inode_unlock(src);
else
btrfs_double_inode_unlock(src, dst);
 
-out_free:
-   kvfree(cmp.src_pages);
-   kvfree(cmp.dst_pages);
-
return ret;
 }
 
-- 
2.17.1



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html