Re: [PATCH 3/3] btrfs-progs: convert-test: trigger chunk allocation after convert

2016-12-13 Thread Qu Wenruo



At 12/14/2016 07:51 AM, Tsutomu Itoh wrote:

On 2016/12/13 18:44, Qu Wenruo wrote:

Populate fs after convert so we can trigger data chunk allocation.
This can expose the overly strict old rollback condition.

Reported-by: Chandan Rajendra 
Signed-off-by: Qu Wenruo 
---
 tests/common | 4 ++++
 tests/common.convert | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/tests/common b/tests/common
index 571118a..4a1330f 100644
--- a/tests/common
+++ b/tests/common
@@ -486,6 +486,10 @@ generate_dataset() {
run_check $SUDO_HELPER ln -s "$dirpath/$long_filename" 
"$dirpath/slow_slink.$num"
done
;;
+   large)
+   run_check $SUDO_HELPER dd if=/dev/urandom bs=32M 
count=1 \
+   of="$dirpath/$dataset_type" bs=32M >/dev/null 2>&1
+   ;;


Too many bs=.

Thanks,
Tsutomu


Oh, right.

Thanks for pointing it out, I'll update it soon.

Thanks,
Qu



esac
 }

diff --git a/tests/common.convert b/tests/common.convert
index a2d3152..8c9242e 100644
--- a/tests/common.convert
+++ b/tests/common.convert
@@ -160,6 +160,9 @@ convert_test_post_checks_all() {
convert_test_post_check_checksums "$1"
convert_test_post_check_permissions "$2"
convert_test_post_check_acl "$3"
+
+   # Create a large file to trigger data chunk allocation
+   generate_dataset "large"
run_check_umount_test_dev
 }









--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] btrfs-progs: convert-test: trigger chunk allocation after convert

2016-12-13 Thread Tsutomu Itoh
On 2016/12/13 18:44, Qu Wenruo wrote:
> Populate fs after convert so we can trigger data chunk allocation.
> This can expose the overly strict old rollback condition.
> 
> Reported-by: Chandan Rajendra 
> Signed-off-by: Qu Wenruo 
> ---
>  tests/common | 4 ++++
>  tests/common.convert | 3 +++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/tests/common b/tests/common
> index 571118a..4a1330f 100644
> --- a/tests/common
> +++ b/tests/common
> @@ -486,6 +486,10 @@ generate_dataset() {
>   run_check $SUDO_HELPER ln -s 
> "$dirpath/$long_filename" "$dirpath/slow_slink.$num"
>   done
>   ;;
> + large)
> + run_check $SUDO_HELPER dd if=/dev/urandom bs=32M 
> count=1 \
> + of="$dirpath/$dataset_type" bs=32M >/dev/null 2>&1
> + ;;

Too many bs=.

Thanks,
Tsutomu

>   esac
>  }
>  
> diff --git a/tests/common.convert b/tests/common.convert
> index a2d3152..8c9242e 100644
> --- a/tests/common.convert
> +++ b/tests/common.convert
> @@ -160,6 +160,9 @@ convert_test_post_checks_all() {
>   convert_test_post_check_checksums "$1"
>   convert_test_post_check_permissions "$2"
>   convert_test_post_check_acl "$3"
> +
> + # Create a large file to trigger data chunk allocation
> + generate_dataset "large"
>   run_check_umount_test_dev
>  }
>  
> 



[PATCH v2] Btrfs: fix lockdep warning about log_mutex

2016-12-13 Thread Liu Bo
While checking INODE_REF/INODE_EXTREF for a corner case, we may acquire a
different inode's log_mutex while holding the current inode's log_mutex, and
lockdep has complained about this with a possible deadlock warning.

Fix this by using mutex_lock_nested() when processing the other inode's
log_mutex.

Reviewed-by: Filipe Manana 
Signed-off-by: Liu Bo 
---
v2: Use SINGLE_DEPTH_NESTING to avoid magic number.

 fs/btrfs/tree-log.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 3d33c4e..298ab3b 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -37,6 +37,7 @@
  */
 #define LOG_INODE_ALL 0
 #define LOG_INODE_EXISTS 1
+#define LOG_OTHER_INODE 2
 
 /*
  * directory trouble cases
@@ -4624,7 +4625,7 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
if (S_ISDIR(inode->i_mode) ||
(!test_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
   &BTRFS_I(inode)->runtime_flags) &&
-inode_only == LOG_INODE_EXISTS))
+inode_only >= LOG_INODE_EXISTS))
max_key.type = BTRFS_XATTR_ITEM_KEY;
else
max_key.type = (u8)-1;
@@ -4648,7 +4649,13 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
return ret;
}
 
-   mutex_lock(&BTRFS_I(inode)->log_mutex);
+   if (inode_only == LOG_OTHER_INODE) {
+   inode_only = LOG_INODE_EXISTS;
+   mutex_lock_nested(&BTRFS_I(inode)->log_mutex,
+ SINGLE_DEPTH_NESTING);
+   } else {
+   mutex_lock(&BTRFS_I(inode)->log_mutex);
+   }
 
/*
 * a brute force approach to making sure we get the most uptodate
@@ -4800,7 +4807,7 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
 * unpin it.
 */
err = btrfs_log_inode(trans, root, other_inode,
- LOG_INODE_EXISTS,
+ LOG_OTHER_INODE,
  0, LLONG_MAX, ctx);
iput(other_inode);
if (err)
-- 
2.5.5



[PATCH v2] Btrfs: use down_read_nested to make lockdep silent

2016-12-13 Thread Liu Bo
If @block_group is not @used_bg, it'll try to get @used_bg's lock without
dropping @block_group's lock, and lockdep has thrown a scary deadlock warning
about it.
Fix it by using down_read_nested.

Signed-off-by: Liu Bo 
---
v2: Use 'SINGLE_DEPTH_NESTING' to avoid magic number.

 fs/btrfs/extent-tree.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4607af3..68e9d25 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7397,7 +7397,8 @@ btrfs_lock_cluster(struct btrfs_block_group_cache *block_group,
 
spin_unlock(&cluster->refill_lock);
 
-   down_read(&used_bg->data_rwsem);
+   /* We should only have one-level nested. */
+   down_read_nested(&used_bg->data_rwsem, SINGLE_DEPTH_NESTING);
 
spin_lock(&cluster->refill_lock);
if (used_bg == cluster->block_group)
-- 
2.5.5



[PATCH v2] Btrfs: add 'inode' for extent map tracepoint

2016-12-13 Thread Liu Bo
'inode' is an important field for btrfs_get_extent, so let's trace it.

Signed-off-by: Liu Bo 
---
v2: add 'unsigned long long' for consistency.

 fs/btrfs/inode.c |  2 +-
 include/trace/events/btrfs.h | 12 ++++++++----
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8e3a5a2..1cdd23c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7081,7 +7081,7 @@ struct extent_map *btrfs_get_extent(struct inode *inode, struct page *page,
write_unlock(&em_tree->lock);
 out:
 
-   trace_btrfs_get_extent(root, em);
+   trace_btrfs_get_extent(root, inode, em);
 
btrfs_free_path(path);
if (trans) {
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index e030d6f..ef80740 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -184,14 +184,16 @@ DEFINE_EVENT(btrfs__inode, btrfs_inode_evict,
 
 TRACE_EVENT_CONDITION(btrfs_get_extent,
 
-   TP_PROTO(struct btrfs_root *root, struct extent_map *map),
+   TP_PROTO(struct btrfs_root *root, struct inode *inode,
+struct extent_map *map),
 
-   TP_ARGS(root, map),
+   TP_ARGS(root, inode, map),
 
TP_CONDITION(map),
 
TP_STRUCT__entry_btrfs(
__field(u64,  root_objectid )
+   __field(u64,  ino   )
__field(u64,  start )
__field(u64,  len   )
__field(u64,  orig_start)
@@ -204,7 +206,8 @@ TRACE_EVENT_CONDITION(btrfs_get_extent,
 
TP_fast_assign_btrfs(root->fs_info,
__entry->root_objectid  = root->root_key.objectid;
-   __entry->start  = map->start;
+   __entry->ino= btrfs_ino(inode);
+   __entry->start  = map->start;
__entry->len= map->len;
__entry->orig_start = map->orig_start;
__entry->block_start= map->block_start;
@@ -214,11 +217,12 @@ TRACE_EVENT_CONDITION(btrfs_get_extent,
__entry->compress_type  = map->compress_type;
),
 
-   TP_printk_btrfs("root = %llu(%s), start = %llu, len = %llu, "
+   TP_printk_btrfs("root = %llu(%s), ino = %llu start = %llu, len = %llu, "
  "orig_start = %llu, block_start = %llu(%s), "
  "block_len = %llu, flags = %s, refs = %u, "
  "compress_type = %u",
  show_root_type(__entry->root_objectid),
+ (unsigned long long)__entry->ino,
  (unsigned long long)__entry->start,
  (unsigned long long)__entry->len,
  (unsigned long long)__entry->orig_start,
-- 
2.5.5



Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another

2016-12-13 Thread David Arendt
Hi,

unfortunately I did not dump meminfo before the crash.

Here is the actual meminfo as of now with the copy running for about 3
hours.

MemTotal:   32806572 kB
MemFree:  197336 kB
MemAvailable:   31226888 kB
Buffers:  52 kB
Cached: 30603160 kB
SwapCached:11880 kB
Active: 29015008 kB
Inactive:2017292 kB
Active(anon): 162124 kB
Inactive(anon):   285104 kB
Active(file):   28852884 kB
Inactive(file):  1732188 kB
Unevictable:7092 kB
Mlocked:7092 kB
SwapTotal:  62522692 kB
SwapFree:   62460464 kB
Dirty:231944 kB
Writeback: 0 kB
AnonPages:425160 kB
Mapped:   227656 kB
Shmem: 12160 kB
Slab:1380280 kB
SReclaimable: 774584 kB
SUnreclaim:   605696 kB
KernelStack:7840 kB
PageTables:12800 kB
NFS_Unstable:  0 kB
Bounce:0 kB
WritebackTmp:  0 kB
CommitLimit:78925976 kB
Committed_AS:1883256 kB
VmallocTotal:   34359738367 kB
VmallocUsed:   0 kB
VmallocChunk:  0 kB
HugePages_Total:   0
HugePages_Free:0
HugePages_Rsvd:0
HugePages_Surp:0
Hugepagesize:   2048 kB
DirectMap4k:20220592 kB
DirectMap2M:13238272 kB
DirectMap1G: 1048576 kB

I will write a cronjob that dumps meminfo every 5 minutes to a file, so
I will have more info on the next crash.

The crash is not an isolated one as I already had this crash multiple
times with -rc7 and -rc8. It seems only to occur when copying from
7200rpm harddisks to 5600rpm ones, and never when copying between two
7200rpm or two 5400rpm.

Thanks,
David Arendt

On 12/13/2016 08:55 PM, Xin Zhou wrote:
> Hi David,
>
> It has GFP_NOFS flags, according to definition,
> the issue might have happened during initial DISK/IO.
>
> By the way, did you get a chance to dump the meminfo and run "top" before the 
> system hang?
> It seems more info about the system's running state is needed to
> understand the issue. Thanks.
>
> Xin
>
>  
>
> Sent: Tuesday, December 13, 2016 at 9:11 AM
> From: "David Arendt" 
> To: linux-btrfs@vger.kernel.org, linux-ker...@vger.kernel.org
> Subject: page allocation stall in kernel 4.9 when copying files from one 
> btrfs hdd to another
> Hi,
>
> I receive the following page allocation stall while copying lots of
> large files from one btrfs hdd to another.
>
> Dec 13 13:04:29 server kernel: kworker/u16:8: page allocation stalls for
> 12260ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
> Dec 13 13:04:29 server kernel: CPU: 0 PID: 24959 Comm: kworker/u16:8
> Tainted: P O 4.9.0 #1
> Dec 13 13:04:29 server kernel: Hardware name: ASUS All Series/H87M-PRO,
> BIOS 2102 10/28/2014
> Dec 13 13:04:29 server kernel: Workqueue: btrfs-extent-refs
> btrfs_extent_refs_helper
> Dec 13 13:04:29 server kernel:  813f3a59
> 81976b28 c90011093750
> Dec 13 13:04:29 server kernel: 81114fc1 02400840f39b6bc0
> 81976b28 c900110936f8
> Dec 13 13:04:29 server kernel: 88070010 c90011093760
> c90011093710 02400840
> Dec 13 13:04:29 server kernel: Call Trace:
> Dec 13 13:04:29 server kernel: [] ? dump_stack+0x46/0x5d
> Dec 13 13:04:29 server kernel: [] ?
> warn_alloc+0x111/0x130
> Dec 13 13:04:33 server kernel: [] ?
> __alloc_pages_nodemask+0xbe8/0xd30
> Dec 13 13:04:33 server kernel: [] ?
> pagecache_get_page+0xe4/0x230
> Dec 13 13:04:33 server kernel: [] ?
> alloc_extent_buffer+0x10b/0x400
> Dec 13 13:04:33 server kernel: [] ?
> btrfs_alloc_tree_block+0x125/0x560
> Dec 13 13:04:33 server kernel: [] ?
> read_extent_buffer_pages+0x21f/0x280
> Dec 13 13:04:33 server kernel: [] ?
> __btrfs_cow_block+0x141/0x580
> Dec 13 13:04:33 server kernel: [] ?
> btrfs_cow_block+0x100/0x150
> Dec 13 13:04:33 server kernel: [] ?
> btrfs_search_slot+0x1e9/0x9c0
> Dec 13 13:04:33 server kernel: [] ?
> __set_extent_bit+0x512/0x550
> Dec 13 13:04:33 server kernel: [] ?
> lookup_inline_extent_backref+0xf5/0x5e0
> Dec 13 13:04:34 server kernel: [] ?
> set_extent_bit+0x24/0x30
> Dec 13 13:04:34 server kernel: [] ?
> update_block_group.isra.34+0x114/0x380
> Dec 13 13:04:34 server kernel: [] ?
> __btrfs_free_extent.isra.35+0xf4/0xd20
> Dec 13 13:04:34 server kernel: [] ?
> btrfs_merge_delayed_refs+0x61/0x5d0
> Dec 13 13:04:34 server kernel: [] ?
> __btrfs_run_delayed_refs+0x902/0x10a0
> Dec 13 13:04:34 server kernel: [] ?
> btrfs_run_delayed_refs+0x90/0x2a0
> Dec 13 13:04:34 server kernel: [] ?
> delayed_ref_async_start+0x84/0xa0
> Dec 13 13:04:34 server kernel: [] ?
> process_one_work+0x11d/0x3b0
> Dec 13 13:04:34 server kernel: [] ?
> worker_thread+0x42/0x4b0
> Dec 13 13:04:34 server kernel: [] ?
> process_one_work+0x3b0/0x3b0
> Dec 13 13:04:34 server kernel: [] ?
> process_one_work+0x3b0/0x3b0
> Dec 13 13:04:34 server kernel: [] ?
> do_group_exit+0x2e/0xa0
> Dec 13 13:04:34 server kernel: [] ? kthread+0xb9/0xd0
> Dec 13 13:04:34 server kernel: [] ?
> 

[PATCH] Btrfs: fix comment in btrfs_page_mkwrite

2016-12-13 Thread Liu Bo
The comment in btrfs_page_mkwrite saying that "page_mkwrite gets called every
time the page is dirtied" is not correct: it only gets called the first time
the page gets dirtied after the page faults in.

However, we don't need to touch the code because it works well, although the
proper logic would be to check whether the delalloc bits have been set and, if
so, free the reserved space; if not, set the delalloc bits for the dirty page
range.

Signed-off-by: Liu Bo 
---
 fs/btrfs/inode.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8e3a5a2..0bec9cc 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9063,11 +9063,11 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
}
 
/*
-* XXX - page_mkwrite gets called every time the page is dirtied, even
-* if it was already dirty, so for space accounting reasons we need to
-* clear any delalloc bits for the range we are fixing to save.  There
-* is probably a better way to do this, but for now keep consistent with
-* prepare_pages in the normal write path.
+* page_mkwrite gets called when the page is firstly dirtied after it's
+* faulted in, but write(2) could also dirty a page and set delalloc
+* bits, thus in this case for space account reason, we still need to
+* clear any delalloc bits within this page range since we have to
+* reserve data space before lock_page() (see above comments).
 */
clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, end,
  EXTENT_DIRTY | EXTENT_DELALLOC |
-- 
2.5.5



Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another

2016-12-13 Thread Xin Zhou
Hi David,

It has GFP_NOFS flags, according to definition,
the issue might have happened during initial DISK/IO.

By the way, did you get a chance to dump the meminfo and run "top" before the 
system hang?
It seems more info about the system's running state is needed to
understand the issue. Thanks.

Xin

 

Sent: Tuesday, December 13, 2016 at 9:11 AM
From: "David Arendt" 
To: linux-btrfs@vger.kernel.org, linux-ker...@vger.kernel.org
Subject: page allocation stall in kernel 4.9 when copying files from one btrfs 
hdd to another
Hi,

I receive the following page allocation stall while copying lots of
large files from one btrfs hdd to another.

Dec 13 13:04:29 server kernel: kworker/u16:8: page allocation stalls for
12260ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
Dec 13 13:04:29 server kernel: CPU: 0 PID: 24959 Comm: kworker/u16:8
Tainted: P O 4.9.0 #1
Dec 13 13:04:29 server kernel: Hardware name: ASUS All Series/H87M-PRO,
BIOS 2102 10/28/2014
Dec 13 13:04:29 server kernel: Workqueue: btrfs-extent-refs
btrfs_extent_refs_helper
Dec 13 13:04:29 server kernel:  813f3a59
81976b28 c90011093750
Dec 13 13:04:29 server kernel: 81114fc1 02400840f39b6bc0
81976b28 c900110936f8
Dec 13 13:04:29 server kernel: 88070010 c90011093760
c90011093710 02400840
Dec 13 13:04:29 server kernel: Call Trace:
Dec 13 13:04:29 server kernel: [] ? dump_stack+0x46/0x5d
Dec 13 13:04:29 server kernel: [] ?
warn_alloc+0x111/0x130
Dec 13 13:04:33 server kernel: [] ?
__alloc_pages_nodemask+0xbe8/0xd30
Dec 13 13:04:33 server kernel: [] ?
pagecache_get_page+0xe4/0x230
Dec 13 13:04:33 server kernel: [] ?
alloc_extent_buffer+0x10b/0x400
Dec 13 13:04:33 server kernel: [] ?
btrfs_alloc_tree_block+0x125/0x560
Dec 13 13:04:33 server kernel: [] ?
read_extent_buffer_pages+0x21f/0x280
Dec 13 13:04:33 server kernel: [] ?
__btrfs_cow_block+0x141/0x580
Dec 13 13:04:33 server kernel: [] ?
btrfs_cow_block+0x100/0x150
Dec 13 13:04:33 server kernel: [] ?
btrfs_search_slot+0x1e9/0x9c0
Dec 13 13:04:33 server kernel: [] ?
__set_extent_bit+0x512/0x550
Dec 13 13:04:33 server kernel: [] ?
lookup_inline_extent_backref+0xf5/0x5e0
Dec 13 13:04:34 server kernel: [] ?
set_extent_bit+0x24/0x30
Dec 13 13:04:34 server kernel: [] ?
update_block_group.isra.34+0x114/0x380
Dec 13 13:04:34 server kernel: [] ?
__btrfs_free_extent.isra.35+0xf4/0xd20
Dec 13 13:04:34 server kernel: [] ?
btrfs_merge_delayed_refs+0x61/0x5d0
Dec 13 13:04:34 server kernel: [] ?
__btrfs_run_delayed_refs+0x902/0x10a0
Dec 13 13:04:34 server kernel: [] ?
btrfs_run_delayed_refs+0x90/0x2a0
Dec 13 13:04:34 server kernel: [] ?
delayed_ref_async_start+0x84/0xa0
Dec 13 13:04:34 server kernel: [] ?
process_one_work+0x11d/0x3b0
Dec 13 13:04:34 server kernel: [] ?
worker_thread+0x42/0x4b0
Dec 13 13:04:34 server kernel: [] ?
process_one_work+0x3b0/0x3b0
Dec 13 13:04:34 server kernel: [] ?
process_one_work+0x3b0/0x3b0
Dec 13 13:04:34 server kernel: [] ?
do_group_exit+0x2e/0xa0
Dec 13 13:04:34 server kernel: [] ? kthread+0xb9/0xd0
Dec 13 13:04:34 server kernel: [] ?
kthread_park+0x50/0x50
Dec 13 13:04:34 server kernel: [] ?
ret_from_fork+0x22/0x30
Dec 13 13:04:34 server kernel: Mem-Info:
Dec 13 13:04:34 server kernel: active_anon:20 inactive_anon:34
isolated_anon:0\x0a active_file:7370032 inactive_file:450105
isolated_file:320\x0a unevictable:0 dirty:522748 writeback:189
unstable:0\x0a slab_reclaimable:178255 slab_unreclaimable:124617\x0a
mapped:4236 shmem:0 pagetables:1163 bounce:0\x0a free:38224 free_pcp:241
free_cma:0
Dec 13 13:04:34 server kernel: Node 0 active_anon:80kB
inactive_anon:136kB active_file:29480128kB inactive_file:1800420kB
unevictable:0kB isolated(anon):0kB isolated(file):1280kB mapped:16944kB
dirty:2090992kB writeback:756kB shmem:0kB writeback_tmp:0kB unstable:0kB
pages_scanned:258821 all_unreclaimable? no
Dec 13 13:04:34 server kernel: DMA free:15868kB min:8kB low:20kB
high:32kB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB present:15976kB
managed:15892kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:24kB
kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
free_cma:0kB
Dec 13 13:04:34 server kernel: lowmem_reserve[]: 0 3428 32019 32019
Dec 13 13:04:34 server kernel: DMA32 free:116800kB min:2448kB low:5956kB
high:9464kB active_anon:0kB inactive_anon:0kB active_file:3087928kB
inactive_file:191336kB unevictable:0kB writepending:221828kB
present:3590832kB managed:3513936kB mlocked:0kB slab_reclaimable:93252kB
slab_unreclaimable:20520kB kernel_stack:48kB pagetables:212kB bounce:0kB
free_pcp:4kB local_pcp:0kB free_cma:0kB
Dec 13 13:04:34 server kernel: lowmem_reserve[]: 0 0 0 0
Dec 13 13:04:34 server kernel: DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U)
1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U)
1*2048kB (M) 3*4096kB (M) = 15868kB
Dec 13 13:04:34 server kernel: DMA32: 940*4kB (UME) 4006*8kB (UME)
3308*16kB (UME) 791*32kB 

[PATCH] btrfs: drop unused extent_op arg from btrfs_add_delayed_data_ref

2016-12-13 Thread Jeff Mahoney
btrfs_add_delayed_data_ref is always called with a NULL extent_op,
so let's drop the argument.

Signed-off-by: Jeff Mahoney 
---
 fs/btrfs/delayed-ref.c | 6 ++----
 fs/btrfs/delayed-ref.h | 3 +--
 fs/btrfs/extent-tree.c | 7 +++----
 3 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 8d93854..299fb2e 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -811,15 +811,13 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
   struct btrfs_trans_handle *trans,
   u64 bytenr, u64 num_bytes,
   u64 parent, u64 ref_root,
-  u64 owner, u64 offset, u64 reserved, int action,
-  struct btrfs_delayed_extent_op *extent_op)
+  u64 owner, u64 offset, u64 reserved, int action)
 {
struct btrfs_delayed_data_ref *ref;
struct btrfs_delayed_ref_head *head_ref;
struct btrfs_delayed_ref_root *delayed_refs;
struct btrfs_qgroup_extent_record *record = NULL;
 
-   BUG_ON(extent_op && !extent_op->is_data);
ref = kmem_cache_alloc(btrfs_delayed_data_ref_cachep, GFP_NOFS);
if (!ref)
return -ENOMEM;
@@ -841,7 +839,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
}
}
 
-   head_ref->extent_op = extent_op;
+   head_ref->extent_op = NULL;
 
delayed_refs = &trans->transaction->delayed_refs;
spin_lock(&delayed_refs->lock);
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index 43f3629..a4be1cc 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -248,8 +248,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
   struct btrfs_trans_handle *trans,
   u64 bytenr, u64 num_bytes,
   u64 parent, u64 ref_root,
-  u64 owner, u64 offset, u64 reserved, int action,
-  struct btrfs_delayed_extent_op *extent_op);
+  u64 owner, u64 offset, u64 reserved, int action);
 int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
struct btrfs_trans_handle *trans,
u64 bytenr, u64 num_bytes,
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 73a8d31..4471632 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2099,7 +2099,7 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
ret = btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
num_bytes, parent, root_objectid,
owner, offset, 0,
-   BTRFS_ADD_DELAYED_REF, NULL);
+   BTRFS_ADD_DELAYED_REF);
}
return ret;
 }
@@ -7255,7 +7255,7 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root,
num_bytes,
parent, root_objectid, owner,
offset, 0,
-   BTRFS_DROP_DELAYED_REF, NULL);
+   BTRFS_DROP_DELAYED_REF);
}
return ret;
 }
@@ -8205,8 +8205,7 @@ int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
ret = btrfs_add_delayed_data_ref(root->fs_info, trans, ins->objectid,
 ins->offset, 0,
 root_objectid, owner, offset,
-ram_bytes, BTRFS_ADD_DELAYED_EXTENT,
-NULL);
+ram_bytes, BTRFS_ADD_DELAYED_EXTENT); 
return ret;
 }
 


-- 
Jeff Mahoney
SUSE Labs


[PATCH] btrfs: fix error handling when run_delayed_extent_op fails

2016-12-13 Thread Jeff Mahoney
In __btrfs_run_delayed_refs, the error path when run_delayed_extent_op
fails sets locked_ref->processing = 0 but doesn't re-increment
delayed_refs->num_heads_ready.  As a result, we can end up triggering
the WARN_ON in btrfs_select_ref_head since the head remains in the tree
with ->processing = 0 and the ready count is off.

Fixes: d7df2c796d7 (Btrfs: attach delayed ref updates to delayed ref heads)
Reported-by: Jon Nelson 
Signed-off-by: Jeff Mahoney 
---
 fs/btrfs/extent-tree.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4607af3..73a8d31 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2588,6 +2588,7 @@ static noinline int __btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
if (must_insert_reserved)
locked_ref->must_insert_reserved = 1;
locked_ref->processing = 0;
+   delayed_refs->num_heads_ready++;
btrfs_debug(fs_info,
"run_delayed_extent_op returned %d",
ret);


-- 
Jeff Mahoney
SUSE Labs


Re: [PATCH] btrfs: limit async_work allocation and worker func duration

2016-12-13 Thread Chris Mason

On 12/12/2016 03:35 PM, Maxim Patlasov wrote:

On 12/12/2016 06:54 AM, David Sterba wrote:

As far as we don't have any NO_THRESHOLD users of
btrfs_workqueue_normal_congested for now, I tend to think it's better to
add a descriptive comment and simply return "false" from
btrfs_workqueue_normal_congested rather than trying to address some
future needs now. See please v2 of the patch.



Thanks, I've got v2 and added a cc for stable to v3.15+, which isn't 
exactly right, but it's when the new workqueue system was put in place.


-chris


page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another

2016-12-13 Thread David Arendt
Hi,

I receive the following page allocation stall while copying lots of
large files from one btrfs hdd to another.

Dec 13 13:04:29 server kernel: kworker/u16:8: page allocation stalls for
12260ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
Dec 13 13:04:29 server kernel: CPU: 0 PID: 24959 Comm: kworker/u16:8
Tainted: P O 4.9.0 #1
Dec 13 13:04:29 server kernel: Hardware name: ASUS All Series/H87M-PRO,
BIOS 2102 10/28/2014
Dec 13 13:04:29 server kernel: Workqueue: btrfs-extent-refs
btrfs_extent_refs_helper
Dec 13 13:04:29 server kernel:   813f3a59
81976b28 c90011093750
Dec 13 13:04:29 server kernel:  81114fc1 02400840f39b6bc0
81976b28 c900110936f8
Dec 13 13:04:29 server kernel:  88070010 c90011093760
c90011093710 02400840
Dec 13 13:04:29 server kernel: Call Trace:
Dec 13 13:04:29 server kernel:  [] ? dump_stack+0x46/0x5d
Dec 13 13:04:29 server kernel:  [] ?
warn_alloc+0x111/0x130
Dec 13 13:04:33 server kernel:  [] ?
__alloc_pages_nodemask+0xbe8/0xd30
Dec 13 13:04:33 server kernel:  [] ?
pagecache_get_page+0xe4/0x230
Dec 13 13:04:33 server kernel:  [] ?
alloc_extent_buffer+0x10b/0x400
Dec 13 13:04:33 server kernel:  [] ?
btrfs_alloc_tree_block+0x125/0x560
Dec 13 13:04:33 server kernel:  [] ?
read_extent_buffer_pages+0x21f/0x280
Dec 13 13:04:33 server kernel:  [] ?
__btrfs_cow_block+0x141/0x580
Dec 13 13:04:33 server kernel:  [] ?
btrfs_cow_block+0x100/0x150
Dec 13 13:04:33 server kernel:  [] ?
btrfs_search_slot+0x1e9/0x9c0
Dec 13 13:04:33 server kernel:  [] ?
__set_extent_bit+0x512/0x550
Dec 13 13:04:33 server kernel:  [] ?
lookup_inline_extent_backref+0xf5/0x5e0
Dec 13 13:04:34 server kernel:  [] ?
set_extent_bit+0x24/0x30
Dec 13 13:04:34 server kernel:  [] ?
update_block_group.isra.34+0x114/0x380
Dec 13 13:04:34 server kernel:  [] ?
__btrfs_free_extent.isra.35+0xf4/0xd20
Dec 13 13:04:34 server kernel:  [] ?
btrfs_merge_delayed_refs+0x61/0x5d0
Dec 13 13:04:34 server kernel:  [] ?
__btrfs_run_delayed_refs+0x902/0x10a0
Dec 13 13:04:34 server kernel:  [] ?
btrfs_run_delayed_refs+0x90/0x2a0
Dec 13 13:04:34 server kernel:  [] ?
delayed_ref_async_start+0x84/0xa0
Dec 13 13:04:34 server kernel:  [] ?
process_one_work+0x11d/0x3b0
Dec 13 13:04:34 server kernel:  [] ?
worker_thread+0x42/0x4b0
Dec 13 13:04:34 server kernel:  [] ?
process_one_work+0x3b0/0x3b0
Dec 13 13:04:34 server kernel:  [] ?
process_one_work+0x3b0/0x3b0
Dec 13 13:04:34 server kernel:  [] ?
do_group_exit+0x2e/0xa0
Dec 13 13:04:34 server kernel:  [] ? kthread+0xb9/0xd0
Dec 13 13:04:34 server kernel:  [] ?
kthread_park+0x50/0x50
Dec 13 13:04:34 server kernel:  [] ?
ret_from_fork+0x22/0x30
Dec 13 13:04:34 server kernel: Mem-Info:
Dec 13 13:04:34 server kernel: active_anon:20 inactive_anon:34
isolated_anon:0\x0a active_file:7370032 inactive_file:450105
isolated_file:320\x0a unevictable:0 dirty:522748 writeback:189
unstable:0\x0a slab_reclaimable:178255 slab_unreclaimable:124617\x0a
mapped:4236 shmem:0 pagetables:1163 bounce:0\x0a free:38224 free_pcp:241
free_cma:0
Dec 13 13:04:34 server kernel: Node 0 active_anon:80kB
inactive_anon:136kB active_file:29480128kB inactive_file:1800420kB
unevictable:0kB isolated(anon):0kB isolated(file):1280kB mapped:16944kB
dirty:2090992kB writeback:756kB shmem:0kB writeback_tmp:0kB unstable:0kB
pages_scanned:258821 all_unreclaimable? no
Dec 13 13:04:34 server kernel: DMA free:15868kB min:8kB low:20kB
high:32kB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB present:15976kB
managed:15892kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:24kB
kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB
free_cma:0kB
Dec 13 13:04:34 server kernel: lowmem_reserve[]: 0 3428 32019 32019
Dec 13 13:04:34 server kernel: DMA32 free:116800kB min:2448kB low:5956kB
high:9464kB active_anon:0kB inactive_anon:0kB active_file:3087928kB
inactive_file:191336kB unevictable:0kB writepending:221828kB
present:3590832kB managed:3513936kB mlocked:0kB slab_reclaimable:93252kB
slab_unreclaimable:20520kB kernel_stack:48kB pagetables:212kB bounce:0kB
free_pcp:4kB local_pcp:0kB free_cma:0kB
Dec 13 13:04:34 server kernel: lowmem_reserve[]: 0 0 0 0
Dec 13 13:04:34 server kernel: DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U)
1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U)
1*2048kB (M) 3*4096kB (M) = 15868kB
Dec 13 13:04:34 server kernel: DMA32: 940*4kB (UME) 4006*8kB (UME)
3308*16kB (UME) 791*32kB (UME) 41*64kB (UE) 1*128kB (U) 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 116800kB
Dec 13 13:04:34 server kernel: Normal: 75*4kB (E) 192*8kB (UE) 94*16kB
(UME) 57*32kB (U) 33*64kB (UM) 16*128kB (UM) 10*256kB (UM) 4*512kB (U)
0*1024kB 1*2048kB (U) 1*4096kB (U) = 20076kB
Dec 13 13:04:34 server kernel: Node 0 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
Dec 13 13:04:34 server kernel: 7820441 total pagecache pages
Dec 13 13:04:34 server kernel: 69 pages in swap cache

Re: [PATCH] Btrfs: Coding style fixes

2016-12-13 Thread David Sterba
On Mon, Dec 12, 2016 at 07:28:46PM +0100, Seraphime Kirkovski wrote:
> On Mon, Dec 12, 2016 at 05:11:56PM +0100, David Sterba wrote:
> > This type of change is more like a cleanup and you can find more
> > instances where the type is applied to just one of the operands, while
> > min_t/max_t would be better. Feel free to send a separate patch for
> > that.
> 
> Thanks for the feedback. I will try to do the sweep in the following 
> days.
> 
> I'm sorry, but I didn't quite understand. Should I resend the min/min_t 
> change of this patch in a separate patch ?

Remove the hunk that changes min -> min_t from this patch, send it in a
separate patch with more changes of that kind. Thanks.


Re: [PATCH 2/2] btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges

2016-12-13 Thread Chandan Rajendra
On Friday, December 02, 2016 10:03:07 AM Qu Wenruo wrote:
> [BUG]
> For the following case, btrfs can underflow qgroup reserved space
> at error path:
> (Page size 4K, function name without "btrfs_" prefix)
> 
>          Task A                |        Task B
> ----------------------------------------------------------------
> Buffered_write [0, 2K)         |
> |- check_data_free_space()     |
> |  |- qgroup_reserve_data()    |
> |     Range aligned to page    |
> |     range  [0, 4K)     <<<   |
> |     4K bytes reserved  <<<   |
> |- copy pages to page cache    |
>                                | Buffered_write [2K, 4K)
>                                | |- check_data_free_space()
>                                | |  |- qgroup_reserve_data()
>                                | |     Range aligned to page
>                                | |     range  [0, 4K)
>                                | |     Already reserved by A <<<
>                                | |     0 bytes reserved      <<<
>                                | |- delalloc_reserve_metadata()
>                                | |  And it *FAILED* (Maybe EQUOTA)
>                                | |- free_reserved_data_space()
>                                |    |- qgroup_free_data()
>                                |       Range aligned to page range
>                                |       [0, 4K)
>                                |       Freeing 4K
> (Special thanks to Chandan for the detailed report and analysis)
> 
> [CAUSE]
> Task B above frees the reserved data range [0, 4K), which was actually
> reserved by Task A.
> 
> At writeback time, the page dirtied by Task A goes through the writeback
> routine, which frees the 4K of reserved data space again at file extent
> insert time, causing the qgroup underflow.
> 
> [FIX]
> For btrfs_qgroup_free_data(), add a @reserved parameter so that it only
> frees data ranges reserved by the previous btrfs_qgroup_reserve_data().
> In the above case, Task B will then try to free 0 bytes, so no underflow.
>

The changes look good to me. Also, I did not notice any regressions when
executing fstests with the patch applied.
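
For readers following the thread, the fix described in the quoted commit message can be illustrated with a small user-space sketch. This is not the kernel code: `reserve_map`, `changeset`, `reserve_data()` and `free_data()` are hypothetical stand-ins for the per-inode reserved-range tracking, struct extent_changeset, and the qgroup reserve/free helpers. It only shows the idea that each caller frees exactly the pages its own reserve call newly accounted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE_4K 4096u
#define NR_PAGES 16u

/* Hypothetical stand-in for the per-inode reserved-range tracking. */
struct reserve_map {
	uint8_t reserved[NR_PAGES];	/* 1 if the page is already reserved */
	uint64_t total;			/* bytes currently accounted as reserved */
};

/* Hypothetical stand-in for struct extent_changeset. */
struct changeset {
	uint8_t changed[NR_PAGES];	/* pages newly reserved by one call */
	uint64_t bytes;			/* bytes newly reserved by one call */
};

/*
 * Reserve [start, start + len), rounded out to page boundaries.
 * Only pages that were not reserved before are recorded in @cs.
 */
static void reserve_data(struct reserve_map *map, struct changeset *cs,
			 uint64_t start, uint64_t len)
{
	uint64_t first = start / PAGE_SIZE_4K;
	uint64_t last = (start + len - 1) / PAGE_SIZE_4K;
	uint64_t i;

	assert(len > 0 && last < NR_PAGES);
	memset(cs, 0, sizeof(*cs));
	for (i = first; i <= last; i++) {
		if (map->reserved[i])
			continue;	/* reserved by someone else already */
		map->reserved[i] = 1;
		map->total += PAGE_SIZE_4K;
		cs->changed[i] = 1;
		cs->bytes += PAGE_SIZE_4K;
	}
}

/* Error path: free only what @cs says this caller reserved. */
static uint64_t free_data(struct reserve_map *map, const struct changeset *cs)
{
	uint64_t freed = 0;
	uint64_t i;

	for (i = 0; i < NR_PAGES; i++) {
		if (cs->changed[i] && map->reserved[i]) {
			map->reserved[i] = 0;
			map->total -= PAGE_SIZE_4K;
			freed += PAGE_SIZE_4K;
		}
	}
	return freed;
}
```

With this model, Task A reserving [0, 2K) records 4K in its changeset, Task B reserving [2K, 4K) records nothing, and Task B's error path therefore frees 0 bytes instead of taking back Task A's page.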

Reviewed-by: Chandan Rajendra 
Tested-by: Chandan Rajendra 

-- 
chandan



Re: [PATCH 1/2] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions

2016-12-13 Thread Chandan Rajendra
On Friday, December 02, 2016 10:03:06 AM Qu Wenruo wrote:
> Introduce a new parameter, struct extent_changeset, for
> btrfs_qgroup_reserve_data() and its callers.
> 
> Such an extent_changeset is already used inside
> btrfs_qgroup_reserve_data() to record which ranges it reserved in the
> current call, so it can free them in the error path.
> 
> We need to export it to callers because, in the buffered write error
> path, without knowing exactly which ranges we reserved in the current
> allocation, we may free space that was not reserved by us.
> 
> This leads to qgroup reserved space underflow.

The changes look good to me.

Reviewed-by: Chandan Rajendra 

-- 
chandan



btrfs check --repair question

2016-12-13 Thread bepi
Hi.


I had two cases of 'ref mismatch on extents  ..', like you.

Every attempt at recovery made the problem much worse.

I suggest you save your important data, then delete and recreate the partition.

I always keep a partition for re-installing from scratch, so that I can recover
data from a damaged file system without being forced to try to repair it.


Gdb





[PATCH 1/3] btrfs-progs: file-item: Fix wrong file extents inserted

2016-12-13 Thread Qu Wenruo
If we specify the NO_HOLES incompat feature when converting, the resulting
image still uses hole file extents.
Furthermore, the hole is incorrect, as its disk_num_bytes is not
zero.

The problem is in btrfs_insert_file_extent(), which doesn't check whether
we are going to insert a hole file extent.

Modify it to skip hole file extents so that it strictly follows the
NO_HOLES flag.

Since the no_holes flag can be enabled halfway through a filesystem's
life, current fsck won't report such an error, as it treats these
extents as old file holes.

Signed-off-by: Qu Wenruo 
---
 convert/main.c |  2 +-
 file-item.c| 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/convert/main.c b/convert/main.c
index 4b4cea4..e6d8b3e 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -336,7 +336,7 @@ static int record_file_blocks(struct blk_iterate_data *data,
   key.offset > cur_off);
fi = btrfs_item_ptr(node, slot, struct btrfs_file_extent_item);
extent_disk_bytenr = btrfs_file_extent_disk_bytenr(node, fi);
-   extent_num_bytes = btrfs_file_extent_disk_num_bytes(node, fi);
+   extent_num_bytes = btrfs_file_extent_num_bytes(node, fi);
BUG_ON(cur_off - key.offset >= extent_num_bytes);
btrfs_release_path(&path);
 
diff --git a/file-item.c b/file-item.c
index 67c0b4f..e462b4b 100644
--- a/file-item.c
+++ b/file-item.c
@@ -36,11 +36,22 @@ int btrfs_insert_file_extent(struct btrfs_trans_handle *trans,
 u64 disk_num_bytes, u64 num_bytes)
 {
int ret = 0;
+   int is_hole = 0;
struct btrfs_file_extent_item *item;
struct btrfs_key file_key;
struct btrfs_path *path;
struct extent_buffer *leaf;
 
+   if (offset == 0)
+   is_hole = 1;
+   /* For NO_HOLES, we don't insert hole file extent */
+   if (btrfs_fs_incompat(root->fs_info, NO_HOLES) && is_hole)
+   return 0;
+
+   /* For hole, its disk_bytenr and disk_num_bytes must be 0 */
+   if (is_hole)
+   disk_num_bytes = 0;
+
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
-- 
2.10.2
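
The hole handling added to btrfs_insert_file_extent() above can be sketched in isolation. This is a simplified illustration, not the btrfs-progs code: `extent_desc`, `insert_action` and `prepare_file_extent()` are hypothetical names, and the sketch assumes (as the patch's own comment suggests) that the `offset` argument carries the extent's disk bytenr, so a value of 0 means a hole.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified description of a file extent to insert. */
struct extent_desc {
	uint64_t disk_bytenr;		/* 0 means "hole" in this sketch */
	uint64_t disk_num_bytes;
	uint64_t num_bytes;
};

enum insert_action {
	INSERT_EXTENT,	/* go on and insert the file extent item */
	SKIP_HOLE,	/* NO_HOLES is set: a hole needs no item at all */
};

/*
 * Decide what to do with @ext before inserting it.
 * @no_holes stands in for btrfs_fs_incompat(fs_info, NO_HOLES).
 */
static enum insert_action prepare_file_extent(struct extent_desc *ext,
					      bool no_holes)
{
	bool is_hole;

	assert(ext);
	is_hole = (ext->disk_bytenr == 0);

	/* With NO_HOLES, holes are implied by gaps between extents. */
	if (is_hole && no_holes)
		return SKIP_HOLE;

	/* A hole extent must have disk_num_bytes == 0. */
	if (is_hole)
		ext->disk_num_bytes = 0;

	return INSERT_EXTENT;
}
```

A hole with NO_HOLES set is skipped, a hole without NO_HOLES is inserted with its disk_num_bytes forced to 0, and a regular data extent is left untouched.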





[PATCH 3/3] btrfs-progs: convert-test: trigger chunk allocation after convert

2016-12-13 Thread Qu Wenruo
Populate the fs after convert so we can trigger data chunk allocation.
This can expose the overly strict old rollback condition.

Reported-by: Chandan Rajendra 
Signed-off-by: Qu Wenruo 
---
 tests/common | 4 
 tests/common.convert | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/tests/common b/tests/common
index 571118a..4a1330f 100644
--- a/tests/common
+++ b/tests/common
@@ -486,6 +486,10 @@ generate_dataset() {
run_check $SUDO_HELPER ln -s "$dirpath/$long_filename" "$dirpath/slow_slink.$num"
done
;;
+   large)
+   run_check $SUDO_HELPER dd if=/dev/urandom bs=32M count=1 \
+   of="$dirpath/$dataset_type" bs=32M >/dev/null 2>&1
+   ;;
esac
 }
 
diff --git a/tests/common.convert b/tests/common.convert
index a2d3152..8c9242e 100644
--- a/tests/common.convert
+++ b/tests/common.convert
@@ -160,6 +160,9 @@ convert_test_post_checks_all() {
convert_test_post_check_checksums "$1"
convert_test_post_check_permissions "$2"
convert_test_post_check_acl "$3"
+
+   # Create a large file to trigger data chunk allocation
+   generate_dataset "large"
run_check_umount_test_dev
 }
 
-- 
2.10.2





[PATCH 2/3] btrfs-progs: convert: Rework rollback to handle new convert image

2016-12-13 Thread Qu Wenruo
Although commit 9c4b820412746b3 tried to make the rollback condition
less strict, to cooperate with the new rollback behavior, it's still too
strict.

If btrfs allocates a new data chunk, it's highly possible that the new
chunk will not be 1:1 mapped anymore.

This makes the old rollback check fail and refuse to roll back.

This patch reworks it by checking the rollback condition more accurately.

1) Rollback condition
   Unlike the old chunk level check, we use a file extent level check.
   So we manually check every file extent of the convert image file.

   Only when all file extents, except the ones in btrfs relocated
   ranges(*), are mapped 1:1 do we allow rollback.

   This keeps both the old and the new convert behavior happy.
*:
   [0, 1M)
   [btrfs_sb_offset(1), +64K)
   [btrfs_sb_offset(2), +64K)

2) Rollback method
   The old rollback method is quite complex, using an extent_io tree to
   mark every checked range, and doing extra chunk tree operations
   before rollback.

   The new rollback method is quite simple:
   1) open btrfs
   2) read and save relocated data
   3) close btrfs
   4) write relocated data into place

This rework fixes the following problems:
1) rollback failure after new data chunk allocation
2) rollback failure after correct NO_HOLES convert
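
The rollback condition described above can be sketched as a standalone check. This is a simplified illustration under stated assumptions, not the patch itself: `sb_offset()`, `in_reloc_space()` and `extent_allows_rollback()` are hypothetical names, the superblock mirror offsets are assumed to be 64M and 256G, and extents that straddle a relocated range boundary are not treated specially.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_64K	(64ULL * 1024)
#define SZ_1M	(1024ULL * 1024)

/* Assumed superblock mirror offsets: mirror 1 at 64M, mirror 2 at 256G. */
static uint64_t sb_offset(int mirror)
{
	assert(mirror == 1 || mirror == 2);
	return (16ULL * 1024) << (12 * mirror);
}

/* Return 1 if [start1, start1 + len1) is a subset of [start2, start2 + len2). */
static int is_range_subset(uint64_t start1, uint64_t len1,
			   uint64_t start2, uint64_t len2)
{
	return start1 >= start2 && start1 + len1 <= start2 + len2;
}

/* Return 1 if [start1, start1 + len1) intersects [start2, start2 + len2). */
static int is_range_intersect(uint64_t start1, uint64_t len1,
			      uint64_t start2, uint64_t len2)
{
	return !(start1 >= start2 + len2 || start1 + len1 <= start2);
}

/* Return 1 if the range lies fully inside one of the relocated ranges. */
static int in_reloc_space(uint64_t start, uint64_t len)
{
	return is_range_subset(start, len, 0, SZ_1M) ||
	       is_range_subset(start, len, sb_offset(1), SZ_64K) ||
	       is_range_subset(start, len, sb_offset(2), SZ_64K);
}

/*
 * One file extent of the convert image permits rollback if it is either
 * inside the relocated ranges or mapped 1:1 (file offset == disk bytenr).
 */
static int extent_allows_rollback(uint64_t file_offset, uint64_t disk_bytenr,
				  uint64_t len)
{
	if (in_reloc_space(file_offset, len))
		return 1;
	return file_offset == disk_bytenr;
}
```

Rollback would be allowed only if every file extent of the image passes `extent_allows_rollback()`; a newly allocated data chunk elsewhere on the device does not affect the result because it is never consulted.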

Reported-by: Chandan Rajendra 
Signed-off-by: Qu Wenruo 
---
 convert/main.c | 691 ++---
 1 file changed, 266 insertions(+), 425 deletions(-)

diff --git a/convert/main.c b/convert/main.c
index e6d8b3e..0052d80 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -1411,36 +1411,6 @@ fail:
return ret;
 }
 
-static int prepare_system_chunk_sb(struct btrfs_super_block *super)
-{
-   struct btrfs_chunk *chunk;
-   struct btrfs_disk_key *key;
-   u32 sectorsize = btrfs_super_sectorsize(super);
-
-   key = (struct btrfs_disk_key *)(super->sys_chunk_array);
-   chunk = (struct btrfs_chunk *)(super->sys_chunk_array +
-  sizeof(struct btrfs_disk_key));
-
-   btrfs_set_disk_key_objectid(key, BTRFS_FIRST_CHUNK_TREE_OBJECTID);
-   btrfs_set_disk_key_type(key, BTRFS_CHUNK_ITEM_KEY);
-   btrfs_set_disk_key_offset(key, 0);
-
-   btrfs_set_stack_chunk_length(chunk, btrfs_super_total_bytes(super));
-   btrfs_set_stack_chunk_owner(chunk, BTRFS_EXTENT_TREE_OBJECTID);
-   btrfs_set_stack_chunk_stripe_len(chunk, BTRFS_STRIPE_LEN);
-   btrfs_set_stack_chunk_type(chunk, BTRFS_BLOCK_GROUP_SYSTEM);
-   btrfs_set_stack_chunk_io_align(chunk, sectorsize);
-   btrfs_set_stack_chunk_io_width(chunk, sectorsize);
-   btrfs_set_stack_chunk_sector_size(chunk, sectorsize);
-   btrfs_set_stack_chunk_num_stripes(chunk, 1);
-   btrfs_set_stack_chunk_sub_stripes(chunk, 0);
-   chunk->stripe.devid = super->dev_item.devid;
-   btrfs_set_stack_stripe_offset(&chunk->stripe, 0);
-   memcpy(chunk->stripe.dev_uuid, super->dev_item.uuid, BTRFS_UUID_SIZE);
-   btrfs_set_super_sys_array_size(super, sizeof(*key) + sizeof(*chunk));
-   return 0;
-}
-
 #if BTRFSCONVERT_EXT2
 
 /*
@@ -2544,479 +2514,350 @@ fail:
 }
 
 /*
- * Check if a non 1:1 mapped chunk can be rolled back.
- * For new convert, it's OK while for old convert it's not.
+ * If [start1, start1 + len1) is a subset of [start2, start2 + len2)
+ * return 1.
+ * Else return 0.
  */
-static int may_rollback_chunk(struct btrfs_fs_info *fs_info, u64 bytenr)
+static int is_range_subset(u64 start1, u64 len1, u64 start2, u64 len2)
 {
-   struct btrfs_block_group_cache *bg;
-   struct btrfs_key key;
-   struct btrfs_path path;
-   struct btrfs_root *extent_root = fs_info->extent_root;
-   u64 bg_start;
-   u64 bg_end;
-   int ret;
-
-   bg = btrfs_lookup_first_block_group(fs_info, bytenr);
-   if (!bg)
-   return -ENOENT;
-   bg_start = bg->key.objectid;
-   bg_end = bg->key.objectid + bg->key.offset;
+   if (start1 >= start2 && start1 + len1 <= start2 + len2)
+   return 1;
+   return 0;
+}
 
-   key.objectid = bg_end;
-   key.type = BTRFS_METADATA_ITEM_KEY;
-   key.offset = 0;
-   btrfs_init_path(&path);
+/*
+ * If [start1, start1 + len1) intersects with [start2, start2 + len2)
+ * return 1.
+ * Else return 0.
+ */
+static int is_range_intersect(u64 start1, u64 len1, u64 start2, u64 len2)
+{
+   if (start1 >= start2 + len2 || start1 + len1 <= start2)
+   return 0;
+   return 1;
+}
 
-   ret = btrfs_search_slot(NULL, extent_root, &key, &path, 0, 0);
-   if (ret < 0)
-   return ret;
+/*
+ * Check if a range is a subset of btrfs convert reloc space.
+ */
+static int is_range_subset_of_reloc_space(u64 start, u64 len)
+{
+   /*
+* Must be in one of the ranges, or it's not in btrfs reloc space
+* [0, 1M)
+* [sb_offset(1), + 64K)
+* [sb_offset(2), + 64K)
+*/
+   if (is_range_subset(start, len, 

[PATCH 0/3] Convert rollback rework and test enhancement

2016-12-13 Thread Qu Wenruo
The current convert rollback condition is too strict for the new convert
behavior, and has several problems.

1) Can't roll back a new convert image with a new data chunk
The chunk level check can't handle a newly allocated data chunk, which is
not 1:1 mapped but completely valid in the new behavior.
The last patch enhances the test case to cover it.

2) Can't roll back a real no-hole image
Since it assumes hole file extents as a requirement.
And because no_holes can be enabled halfway, btrfsck won't
report such an error, since it's acceptable.

3) Overly complex logic, and RW btrfs tree operations
In fact, considering how little data we need to rewrite (1M + 128K), we
don't really need to open btrfs read-write.
Just copy the needed data and re-fill. Simple and easy.
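
The "copy and re-fill" idea in 3) can be sketched with an in-memory buffer standing in for the device. This is only an illustration: `reloc_range`, `save_relocated()` and `restore_relocated()` are hypothetical names, and the real rollback reads the relocated data through the btrfs convert image rather than from a flat buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One reserved range: where the data belongs and where convert moved it. */
struct reloc_range {
	uint64_t orig_start;	/* original on-disk location */
	uint64_t reloc_start;	/* location of the relocated copy */
	uint64_t len;
	uint8_t *saved;		/* in-memory copy, filled by save_relocated() */
};

/* Read and save the relocated data. Returns 0, or -1 if out of memory. */
static int save_relocated(const uint8_t *dev, struct reloc_range *r, int nr)
{
	int i;

	assert(dev && r);
	for (i = 0; i < nr; i++) {
		r[i].saved = malloc(r[i].len);
		if (!r[i].saved) {
			/* Undo the buffers allocated so far. */
			while (i-- > 0) {
				free(r[i].saved);
				r[i].saved = NULL;
			}
			return -1;
		}
		memcpy(r[i].saved, dev + r[i].reloc_start, r[i].len);
	}
	return 0;
}

/* Write the saved data back to its original place and drop the buffers. */
static void restore_relocated(uint8_t *dev, struct reloc_range *r, int nr)
{
	int i;

	assert(dev && r);
	for (i = 0; i < nr; i++) {
		memcpy(dev + r[i].orig_start, r[i].saved, r[i].len);
		free(r[i].saved);
		r[i].saved = NULL;
	}
}
```

Saving everything first and writing afterwards mirrors the open, read, close, write ordering: nothing on the device is modified while the filesystem is still being read.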

Thanks to Chandan, whose report on the rollback failure led to this rework.

Qu Wenruo (3):
  btrfs-progs: file-item: Fix wrong file extents inserted
  btrfs-progs: convert: Rework rollback to handle new convert image
  btrfs-progs: convert-test: trigger chunk allocation after convert

 convert/main.c   | 693 ---
 file-item.c  |  11 +
 tests/common |   4 +
 tests/common.convert |   3 +
 4 files changed, 285 insertions(+), 426 deletions(-)

-- 
2.10.2


