Re: [PATCH for 4.7-rc 1/2] btrfs: Remove first key verification since it's causing false alerts

2018-04-12 Thread Qu Wenruo


On 2018年04月12日 21:35, David Sterba wrote:
> On Thu, Apr 12, 2018 at 06:00:02PM +0800, Qu Wenruo wrote:
>> When looping btrfs/074 with many cpus (>= 8), it's possible to trigger
>> kernel warning due to first key verification:
>>
>> [ 4239.523446] WARNING: CPU: 5 PID: 2381 at fs/btrfs/disk-io.c:460 
>> btree_read_extent_buffer_pages+0x1ad/0x210
>> [ 4239.523830] Modules linked in:
>> [ 4239.524630] RIP: 0010:btree_read_extent_buffer_pages+0x1ad/0x210
>> [ 4239.527101] Call Trace:
>> [ 4239.527251]  read_tree_block+0x42/0x70
>> [ 4239.527434]  read_node_slot+0xd2/0x110
>> [ 4239.527632]  push_leaf_right+0xad/0x1b0
>> [ 4239.527809]  split_leaf+0x4ea/0x700
>> [ 4239.527988]  ? leaf_space_used+0xbc/0xe0
>> [ 4239.528192]  ? btrfs_set_lock_blocking_rw+0x99/0xb0
>> [ 4239.528416]  btrfs_search_slot+0x8cc/0xa40
>> [ 4239.528605]  btrfs_insert_empty_items+0x71/0xc0
>> [ 4239.528798]  __btrfs_run_delayed_refs+0xa98/0x1680
>> [ 4239.529013]  btrfs_run_delayed_refs+0x10b/0x1b0
>> [ 4239.529205]  btrfs_commit_transaction+0x33/0xaf0
>> [ 4239.529445]  ? start_transaction+0xa8/0x4f0
>> [ 4239.529630]  btrfs_alloc_data_chunk_ondemand+0x1b0/0x4e0
>> [ 4239.529833]  btrfs_check_data_free_space+0x54/0xa0
>> [ 4239.530045]  btrfs_delalloc_reserve_space+0x25/0x70
>> [ 4239.531907]  btrfs_direct_IO+0x233/0x3d0
>> [ 4239.532098]  generic_file_direct_write+0xcb/0x170
>> [ 4239.532296]  btrfs_file_write_iter+0x2bb/0x5f4
>> [ 4239.532491]  aio_write+0xe2/0x180
>> [ 4239.532669]  ? lock_acquire+0xac/0x1e0
>> [ 4239.532839]  ? __might_fault+0x3e/0x90
>> [ 4239.533032]  do_io_submit+0x594/0x860
>> [ 4239.533223]  ? do_io_submit+0x594/0x860
>> [ 4239.533398]  SyS_io_submit+0x10/0x20
>> [ 4239.533560]  ? SyS_io_submit+0x10/0x20
>> [ 4239.533729]  do_syscall_64+0x75/0x1d0
>> [ 4239.533979]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
>> [ 4239.534182] RIP: 0033:0x7f8519741697
>>
>> The possibility is low, around 4~7/128 runs with 8 cores.
>> The problem here is, at btree_read_extent_buffer_pages() we don't have
>> acquired read/write lock on that extent buffer, only basic info like
>> level/bytenr is reliable.
>>
>> To get correct first key, we must require at least read lock for that
>> extent buffer, which can't be done in btree_read_extent_buffer_pages(),
>> but deep into core btree operation code.
>>
>> This patch will remove the unreliable first key check to avoid false
>> alerts, and allow later patch to re-implement first key check correctly.
> 
> We'll have to disable the first key check in a less intrusive way, eg.
> taking a shortcut in verify_level_key.

OK, I'll take the alternative solution (method 3) in v4.17 release cycle
to bring a small but with less coverage.

Originally I have 3 ways to fix the false alerts

1) Proper lock handler
   The one I used in 2nd patch, as you can see it needs proper lock
   context and a lot of new error handlers.
   But with best coverage, not only covers new tree blocks read from
   disk, but also any live tree blocks.
   (But helps me to understand the whole tree locking mechanism)

2) Move check to btree_readpage_end_io_hook()
   So no race, and only covers tree block read from disk. (and only
   done once)
   The problem is, we need pass key and level all the way to
   submit_bio_hook(), I don't think it would be an easy work.

3) Skip check for tree blocks of current trans
   Similar coverage of 2), but may still do extra (duplicated) check
   each time we call read_tree_block().
   The idea is to only check first_key if the transid <=
   fs_info->last_committed_transid.
   So we shouldn't meet any race.

For 1) or 2) I'll double consider their possibility and benefit in next
release cycle.

> The patch has was added in
> recent pull, I'm not going to do full a revert now.

Well, this patch is not completely reverting, but still pretty much
reverts half of original patch.
I'll use method 3 to bring the modification to minimal.

Thanks,
Qu

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for 4.7-rc 1/2] btrfs: Remove first key verification since it's causing false alerts

2018-04-12 Thread David Sterba
On Thu, Apr 12, 2018 at 06:00:02PM +0800, Qu Wenruo wrote:
> When looping btrfs/074 with many cpus (>= 8), it's possible to trigger
> kernel warning due to first key verification:
> 
> [ 4239.523446] WARNING: CPU: 5 PID: 2381 at fs/btrfs/disk-io.c:460 
> btree_read_extent_buffer_pages+0x1ad/0x210
> [ 4239.523830] Modules linked in:
> [ 4239.524630] RIP: 0010:btree_read_extent_buffer_pages+0x1ad/0x210
> [ 4239.527101] Call Trace:
> [ 4239.527251]  read_tree_block+0x42/0x70
> [ 4239.527434]  read_node_slot+0xd2/0x110
> [ 4239.527632]  push_leaf_right+0xad/0x1b0
> [ 4239.527809]  split_leaf+0x4ea/0x700
> [ 4239.527988]  ? leaf_space_used+0xbc/0xe0
> [ 4239.528192]  ? btrfs_set_lock_blocking_rw+0x99/0xb0
> [ 4239.528416]  btrfs_search_slot+0x8cc/0xa40
> [ 4239.528605]  btrfs_insert_empty_items+0x71/0xc0
> [ 4239.528798]  __btrfs_run_delayed_refs+0xa98/0x1680
> [ 4239.529013]  btrfs_run_delayed_refs+0x10b/0x1b0
> [ 4239.529205]  btrfs_commit_transaction+0x33/0xaf0
> [ 4239.529445]  ? start_transaction+0xa8/0x4f0
> [ 4239.529630]  btrfs_alloc_data_chunk_ondemand+0x1b0/0x4e0
> [ 4239.529833]  btrfs_check_data_free_space+0x54/0xa0
> [ 4239.530045]  btrfs_delalloc_reserve_space+0x25/0x70
> [ 4239.531907]  btrfs_direct_IO+0x233/0x3d0
> [ 4239.532098]  generic_file_direct_write+0xcb/0x170
> [ 4239.532296]  btrfs_file_write_iter+0x2bb/0x5f4
> [ 4239.532491]  aio_write+0xe2/0x180
> [ 4239.532669]  ? lock_acquire+0xac/0x1e0
> [ 4239.532839]  ? __might_fault+0x3e/0x90
> [ 4239.533032]  do_io_submit+0x594/0x860
> [ 4239.533223]  ? do_io_submit+0x594/0x860
> [ 4239.533398]  SyS_io_submit+0x10/0x20
> [ 4239.533560]  ? SyS_io_submit+0x10/0x20
> [ 4239.533729]  do_syscall_64+0x75/0x1d0
> [ 4239.533979]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> [ 4239.534182] RIP: 0033:0x7f8519741697
> 
> The possibility is low, around 4~7/128 runs with 8 cores.
> The problem here is, at btree_read_extent_buffer_pages() we don't have
> acquired read/write lock on that extent buffer, only basic info like
> level/bytenr is reliable.
> 
> To get correct first key, we must require at least read lock for that
> extent buffer, which can't be done in btree_read_extent_buffer_pages(),
> but deep into core btree operation code.
> 
> This patch will remove the unreliable first key check to avoid false
> alerts, and allow later patch to re-implement first key check correctly.

We'll have to disable the first key check in a less intrusive way, eg.
taking a shortcut in verify_level_key. The patch has was added in
recent pull, I'm not going to do full a revert now.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH for 4.7-rc 1/2] btrfs: Remove first key verification since it's causing false alerts

2018-04-12 Thread Qu Wenruo
When looping btrfs/074 with many cpus (>= 8), it's possible to trigger
kernel warning due to first key verification:

[ 4239.523446] WARNING: CPU: 5 PID: 2381 at fs/btrfs/disk-io.c:460 
btree_read_extent_buffer_pages+0x1ad/0x210
[ 4239.523830] Modules linked in:
[ 4239.524630] RIP: 0010:btree_read_extent_buffer_pages+0x1ad/0x210
[ 4239.527101] Call Trace:
[ 4239.527251]  read_tree_block+0x42/0x70
[ 4239.527434]  read_node_slot+0xd2/0x110
[ 4239.527632]  push_leaf_right+0xad/0x1b0
[ 4239.527809]  split_leaf+0x4ea/0x700
[ 4239.527988]  ? leaf_space_used+0xbc/0xe0
[ 4239.528192]  ? btrfs_set_lock_blocking_rw+0x99/0xb0
[ 4239.528416]  btrfs_search_slot+0x8cc/0xa40
[ 4239.528605]  btrfs_insert_empty_items+0x71/0xc0
[ 4239.528798]  __btrfs_run_delayed_refs+0xa98/0x1680
[ 4239.529013]  btrfs_run_delayed_refs+0x10b/0x1b0
[ 4239.529205]  btrfs_commit_transaction+0x33/0xaf0
[ 4239.529445]  ? start_transaction+0xa8/0x4f0
[ 4239.529630]  btrfs_alloc_data_chunk_ondemand+0x1b0/0x4e0
[ 4239.529833]  btrfs_check_data_free_space+0x54/0xa0
[ 4239.530045]  btrfs_delalloc_reserve_space+0x25/0x70
[ 4239.531907]  btrfs_direct_IO+0x233/0x3d0
[ 4239.532098]  generic_file_direct_write+0xcb/0x170
[ 4239.532296]  btrfs_file_write_iter+0x2bb/0x5f4
[ 4239.532491]  aio_write+0xe2/0x180
[ 4239.532669]  ? lock_acquire+0xac/0x1e0
[ 4239.532839]  ? __might_fault+0x3e/0x90
[ 4239.533032]  do_io_submit+0x594/0x860
[ 4239.533223]  ? do_io_submit+0x594/0x860
[ 4239.533398]  SyS_io_submit+0x10/0x20
[ 4239.533560]  ? SyS_io_submit+0x10/0x20
[ 4239.533729]  do_syscall_64+0x75/0x1d0
[ 4239.533979]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 4239.534182] RIP: 0033:0x7f8519741697

The possibility is low, around 4~7/128 runs with 8 cores.
The problem here is, at btree_read_extent_buffer_pages() we don't have
acquired read/write lock on that extent buffer, only basic info like
level/bytenr is reliable.

To get correct first key, we must require at least read lock for that
extent buffer, which can't be done in btree_read_extent_buffer_pages(),
but deep into core btree operation code.

This patch will remove the unreliable first key check to avoid false
alerts, and allow later patch to re-implement first key check correctly.

Reported-by: Nikolay Borisov 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/backref.c |  4 +--
 fs/btrfs/ctree.c   | 20 +--
 fs/btrfs/disk-io.c | 56 ++
 fs/btrfs/disk-io.h |  6 ++---
 fs/btrfs/extent-tree.c |  6 +
 fs/btrfs/print-tree.c  |  4 +--
 fs/btrfs/qgroup.c  |  6 ++---
 fs/btrfs/ref-verify.c  |  6 +
 fs/btrfs/relocation.c  | 17 +++--
 fs/btrfs/tree-log.c| 11 +++--
 10 files changed, 36 insertions(+), 100 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 571024bc632e..a0a3ccf922e8 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -739,7 +739,7 @@ static int add_missing_keys(struct btrfs_fs_info *fs_info,
BUG_ON(!ref->wanted_disk_byte);
 
eb = read_tree_block(fs_info, ref->wanted_disk_byte, 0,
-ref->level - 1, NULL);
+ref->level - 1);
if (IS_ERR(eb)) {
free_pref(ref);
return PTR_ERR(eb);
@@ -1290,7 +1290,7 @@ static int find_parent_nodes(struct btrfs_trans_handle 
*trans,
struct extent_buffer *eb;
 
eb = read_tree_block(fs_info, ref->parent, 0,
-ref->level, NULL);
+ref->level);
if (IS_ERR(eb)) {
ret = PTR_ERR(eb);
goto out;
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index a2c9d21176e2..e044b51a2789 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1375,7 +1375,7 @@ get_old_root(struct btrfs_root *root, u64 time_seq)
if (old_root && tm && tm->op != MOD_LOG_KEY_REMOVE_WHILE_FREEING) {
btrfs_tree_read_unlock(eb_root);
free_extent_buffer(eb_root);
-   old = read_tree_block(fs_info, logical, 0, level, NULL);
+   old = read_tree_block(fs_info, logical, 0, level);
if (WARN_ON(IS_ERR(old) || !extent_buffer_uptodate(old))) {
if (!IS_ERR(old))
free_extent_buffer(old);
@@ -1595,7 +1595,6 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans,
btrfs_set_lock_blocking(parent);
 
for (i = start_slot; i <= end_slot; i++) {
-   struct btrfs_key first_key;
int close = 1;
 
btrfs_node_key(parent, _key, i);
@@ -1605,7 +1604,6 @@ int btrfs_realloc_node(struct btrfs_trans_handle *trans,
progress_passed = 1;