Re: [PATCH] duperemove: test presence of dedupe ioctl

2016-12-15 Thread Christoph Hellwig
On Thu, Dec 15, 2016 at 05:20:14PM -0800, Darrick J. Wong wrote:
> Fair enough, no need to pollute the namespace.

Or confuse poor readers with the 'fake' name :)
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] duperemove: test presence of dedupe ioctl

2016-12-15 Thread Christoph Hellwig
Looks fine,

Reviewed-by: Christoph Hellwig 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: OOM: Better, but still there on 4.9

2016-12-15 Thread Michal Hocko
[CC linux-mm and btrfs guys]

On Thu 15-12-16 23:57:04, Nils Holland wrote:
[...]
> Of course, none of this are workloads that are new / special in any
> way - prior to 4.8, I never experienced any issues doing the exact
> same things.
> 
> Dec 15 19:02:16 teela kernel: kworker/u4:5 invoked oom-killer: 
> gfp_mask=0x2400840(GFP_NOFS|__GFP_NOFAIL), nodemask=0, order=0, 
> oom_score_adj=0
> Dec 15 19:02:18 teela kernel: kworker/u4:5 cpuset=/ mems_allowed=0
> Dec 15 19:02:18 teela kernel: CPU: 1 PID: 2603 Comm: kworker/u4:5 Not tainted 
> 4.9.0-gentoo #2
> Dec 15 19:02:18 teela kernel: Hardware name: Hewlett-Packard Compaq 15 
> Notebook PC/21F7, BIOS F.22 08/06/2014
> Dec 15 19:02:18 teela kernel: Workqueue: writeback wb_workfn (flush-btrfs-1)
> Dec 15 19:02:18 teela kernel:  eff0b604 c142bcce eff0b734  eff0b634 
> c1163332  0292
> Dec 15 19:02:18 teela kernel:  eff0b634 c1431876 eff0b638 e7fb0b00 e7fa2900 
> e7fa2900 c1b58785 eff0b734
> Dec 15 19:02:18 teela kernel:  eff0b678 c110795f c1043895 eff0b664 c11075c7 
> 0007  
> Dec 15 19:02:18 teela kernel: Call Trace:
> Dec 15 19:02:18 teela kernel:  [] dump_stack+0x47/0x69
> Dec 15 19:02:18 teela kernel:  [] dump_header+0x60/0x178
> Dec 15 19:02:18 teela kernel:  [] ? ___ratelimit+0x86/0xe0
> Dec 15 19:02:18 teela kernel:  [] oom_kill_process+0x20f/0x3d0
> Dec 15 19:02:18 teela kernel:  [] ? has_capability_noaudit+0x15/0x20
> Dec 15 19:02:18 teela kernel:  [] ? oom_badness.part.13+0xb7/0x130
> Dec 15 19:02:18 teela kernel:  [] out_of_memory+0xd9/0x260
> Dec 15 19:02:18 teela kernel:  [] __alloc_pages_nodemask+0xbfb/0xc80
> Dec 15 19:02:18 teela kernel:  [] pagecache_get_page+0xad/0x270
> Dec 15 19:02:18 teela kernel:  [] alloc_extent_buffer+0x116/0x3e0
> Dec 15 19:02:18 teela kernel:  [] 
> btrfs_find_create_tree_block+0xe/0x10
> Dec 15 19:02:18 teela kernel:  [] btrfs_alloc_tree_block+0x1ef/0x5f0
> Dec 15 19:02:18 teela kernel:  [] __btrfs_cow_block+0x143/0x5f0
> Dec 15 19:02:18 teela kernel:  [] btrfs_cow_block+0x13a/0x220
> Dec 15 19:02:18 teela kernel:  [] btrfs_search_slot+0x1d1/0x870
> Dec 15 19:02:18 teela kernel:  [] btrfs_lookup_file_extent+0x4d/0x60
> Dec 15 19:02:18 teela kernel:  [] __btrfs_drop_extents+0x176/0x1070
> Dec 15 19:02:18 teela kernel:  [] ? kmem_cache_alloc+0xb7/0x190
> Dec 15 19:02:18 teela kernel:  [] ? start_transaction+0x65/0x4b0
> Dec 15 19:02:18 teela kernel:  [] ? __kmalloc+0x147/0x1e0
> Dec 15 19:02:18 teela kernel:  [] cow_file_range_inline+0x215/0x6b0
> Dec 15 19:02:18 teela kernel:  [] cow_file_range.isra.49+0x55c/0x6d0
> Dec 15 19:02:18 teela kernel:  [] ? lock_extent_bits+0x75/0x1e0
> Dec 15 19:02:18 teela kernel:  [] run_delalloc_range+0x441/0x470
> Dec 15 19:02:18 teela kernel:  [] 
> writepage_delalloc.isra.47+0x144/0x1e0
> Dec 15 19:02:18 teela kernel:  [] __extent_writepage+0xd8/0x2b0
> Dec 15 19:02:18 teela kernel:  [] extent_writepages+0x25c/0x380
> Dec 15 19:02:18 teela kernel:  [] ? btrfs_real_readdir+0x610/0x610
> Dec 15 19:02:18 teela kernel:  [] btrfs_writepages+0x1f/0x30
> Dec 15 19:02:18 teela kernel:  [] do_writepages+0x15/0x40
> Dec 15 19:02:18 teela kernel:  [] 
> __writeback_single_inode+0x35/0x2f0
> Dec 15 19:02:18 teela kernel:  [] writeback_sb_inodes+0x16e/0x340
> Dec 15 19:02:18 teela kernel:  [] wb_writeback+0xaa/0x280
> Dec 15 19:02:18 teela kernel:  [] wb_workfn+0xd8/0x3e0
> Dec 15 19:02:18 teela kernel:  [] process_one_work+0x114/0x3e0
> Dec 15 19:02:18 teela kernel:  [] worker_thread+0x2f/0x4b0
> Dec 15 19:02:18 teela kernel:  [] ? create_worker+0x180/0x180
> Dec 15 19:02:18 teela kernel:  [] kthread+0x97/0xb0
> Dec 15 19:02:18 teela kernel:  [] ? __kthread_parkme+0x60/0x60
> Dec 15 19:02:18 teela kernel:  [] ret_from_fork+0x1b/0x28
> Dec 15 19:02:18 teela kernel: Mem-Info:
> Dec 15 19:02:18 teela kernel: active_anon:58685 inactive_anon:90 
> isolated_anon:0
>active_file:274324 inactive_file:281962 
> isolated_file:0

OK, so there is still some anonymous memory that could be swapped out
and quite a lot of page cache. This might be harder to reclaim because
the allocation is a GFP_NOFS request which is limited in its reclaim
capabilities. It might be possible that those pagecache pages are pinned
in some way by the the filesystem.

>unevictable:0 dirty:649 writeback:0 unstable:0
>slab_reclaimable:40662 slab_unreclaimable:17754
>mapped:7382 shmem:202 pagetables:351 bounce:0
>free:206736 free_pcp:332 free_cma:0
> Dec 15 19:02:18 teela kernel: Node 0 active_anon:234740kB inactive_anon:360kB 
> active_file:1097296kB inactive_file:1127848kB unevictable:0kB 
> isolated(anon):0kB isolated(file):0kB mapped:29528kB dirty:2596kB 
> writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 184320kB anon_thp: 
> 808kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
> Dec 15 19:02:18 teela 

[PATCH] Btrfs: add another missing end_page_writeback on submit_extent_page failure

2016-12-15 Thread Takafumi Kubota
This is actually inspired by Filipe's patch(55e3bd2e0c2e1).

When submit_extent_page() in __extent_writepage_io() fails,
Btrfs misses clearing a writeback bit of the failed page.
This causes the false under-writeback page.
Then, another sync task hangs in filemap_fdatawait_range(),
because it waits the false under-writeback page.

CPU0CPU1

__extent_writepage_io()
  ret = submit_extent_page() // fail

  if (ret)
SetPageError(page)
// miss clearing the writeback bit

sync()
  ...
  filemap_fdatawait_range()
wait_on_page_writeback(page);
// wait the false under-writeback page

Signed-off-by: Takafumi Kubota 
---
 fs/btrfs/extent_io.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1e67723..ef9793b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3443,8 +3443,10 @@ static noinline_for_stack int 
__extent_writepage_io(struct inode *inode,
 bdev, >bio, max_nr,
 end_bio_extent_writepage,
 0, 0, 0, false);
-   if (ret)
+   if (ret) {
SetPageError(page);
+   end_page_writeback(page);
+   }
 
cur = cur + iosize;
pg_offset += iosize;
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/3] btrfs-progs: convert: Rework rollback to handle new convert image

2016-12-15 Thread Chandan Rajendra
On Thursday, December 15, 2016 05:03:30 PM Qu Wenruo wrote:
> Although commit 9c4b820412746b3 tried to make the rollback condition
> less restrict, to co-operate with new rollback behavior, it's still too
> restrict.
> 
> If btrfs allocates a new data chunk, it's highly possible that the new
> chunk will not be 1:1 mapped anymore.
> 
> And this makes old rollback check fails, and refuse to rollback.
> 
> This patch rework it by checking rollback condition more accurately.
> 
> 1) Rollback condition
>Unlike old chunk level check, we use file extent level check.
>So we manually check every file extents of convert image file.
> 
>Only when all file extents except ones in btrfs relocated ranges(*)
>are mapped 1:1 we allow rollback.
> 
>This behavior make both old and new behavior happy.
> *:
>[0, 1M)
>[btrfs_sb_offset(1), +64K)
>[btrfs_sb_offset(2), +64K)
> 
> 2) Rollback method
>Old rollback method is quite complex, using extent_io tree to mark
>every checked ranges.
>And do extra chunk tree operation before rollback.
> 
>The new rollback method is quite simple.
>1) open btrfs
>2) read and save relocated data
>3) close btrfs
>4) write relocated into place.
> 
> Such rework fixes the following problem
> 1) rollback failure after new data chunk allocation
> 2) rollback failure after correct NO_HOLES convert

Hi Qu,

With this patch applied, I get the following on an x86_64 machine,

[root@localhost]~/btrfs-progs# btrfs-convert -r /dev/loop0
ctree.c:1112: btrfs_search_slot: Warning: assertion `p->nodes[0] != NULL` 
failed, value 1
btrfs-convert(btrfs_search_slot+0x117)[0x40c906]
btrfs-convert(btrfs_lookup_dir_item+0x70)[0x41d902]
btrfs-convert(main+0x5e2)[0x43af50]
/lib64/libc.so.6(__libc_start_main+0xf0)[0x7f7fb6168700]
btrfs-convert(_start+0x29)[0x408a69]
extent buffer leak: start 67305472 len 16384
rollback complete

The same error occurs on a ppc64 machine when using 64k sectorsize.

The three 'rollback' patches were applied on top of commit
9ce512ac57cb08edf2f742da085c383834f804dd (i.e. btrfs-progs: check: Fix false
alert on generation mismatch for tree reloc tree) that is available on David's
devel branch.

-- 
chandan

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] duperemove: test presence of dedupe ioctl

2016-12-15 Thread Darrick J. Wong
Since a zero-length dedupe operation is guaranteed to succeed, use that
to test whether or not this filesystem supports dedupe.

Signed-off-by: Darrick J. Wong 
---
v2: Don't declare a new type; just declare the struct on the stack.
---
 file_scan.c |   45 +++--
 1 file changed, 35 insertions(+), 10 deletions(-)

diff --git a/file_scan.c b/file_scan.c
index 617f166..2708bfe 100644
--- a/file_scan.c
+++ b/file_scan.c
@@ -45,11 +45,7 @@
 #include "file_scan.h"
 #include "dbfile.h"
 #include "util.h"
-
-/* This is not in linux/magic.h */
-#ifndefXFS_SB_MAGIC
-#defineXFS_SB_MAGIC0x58465342  /* 'XFSB' */
-#endif
+#include "btrfs-ioctl.h"
 
 static char path[PATH_MAX] = { 0, };
 static char *pathp = path;
@@ -189,6 +185,37 @@ static int walk_dir(const char *name)
return ret;
 }
 
+/*
+ * A zero-length dedupe between two files should always succeed,
+ * so we can use this to test the presence of dedupe functionality.
+ */
+static bool check_ioctl_works(int fd)
+{
+   struct {
+   struct btrfs_ioctl_same_args args;
+   struct btrfs_ioctl_same_extent_info info;
+   } sa = {0};
+   struct stat sb;
+   static int cached = -1;
+   int ret;
+
+   if (cached >= 0)
+   return cached != 0;
+
+   ret = fstat(fd, );
+   if (ret)
+   return false;
+
+   sa.args.dest_count = 1;
+   sa.args.length = 0;
+   sa.info.fd = fd;
+   sa.info.logical_offset = 0;
+   errno = 0;
+   ret = btrfs_extent_same(fd, );
+   cached = !ret && !errno && !sa.info.status;
+   return cached;
+}
+
 static int __add_file(const char *name, struct stat *st,
  struct filerec **ret_file)
 {
@@ -235,12 +262,10 @@ static int __add_file(const char *name, struct stat *st,
goto out;
}
 
-   if (run_dedupe &&
-   ((fs.f_type != BTRFS_SUPER_MAGIC &&
- fs.f_type != XFS_SB_MAGIC))) {
+   if (run_dedupe && !check_ioctl_works(fd)) {
close(fd);
-   fprintf(stderr, "\"%s\": Can only dedupe files on btrfs or xfs "
-   "(experimental)\n", name);
+   fprintf(stderr, "\"%s\": dedupe ioctl not supported on this "
+   "filesystem.\n", name);
return ENOSYS;
}
 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] duperemove: test presence of dedupe ioctl

2016-12-15 Thread Darrick J. Wong
On Wed, Dec 14, 2016 at 11:26:07AM -0800, Christoph Hellwig wrote:
> On Wed, Dec 14, 2016 at 10:38:45AM -0800, Darrick J. Wong wrote:
> > > > +struct fake_btrfs_ioctl_same_args {
> > > > +   struct btrfs_ioctl_same_args args;
> > > > +   struct btrfs_ioctl_same_extent_info info;
> > > > +};
> > > 
> > > Why does this need a fake structure here?
> > 
> > In order to test the ioctl we have to fill out at least one
> > btrfs_ioctl_same_extent_info so that we get far enough into the fs-specific
> > dedupe_range handler that we've verified that the fs is capable of dedupe 
> > and
> > that the fs is willing to try to satisfy the request.
> 
> Oh, got it, it's just the fake that tripped me up.
> 
> > We could just malloc sizeof(_same_args) + sizeof(_same_extent_info)...
> 
> Either that, or more simply just don't give the structure a name
> by just declaring it locally on the stack:
> 
>   struct {
>   struct btrfs_ioctl_same_args args;
>   struct btrfs_ioctl_same_extent_info info;
>   } sa = { 0 };

Fair enough, no need to pollute the namespace.

--D

> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] btrfs-progs: Fix NULL pointer when receive clone operation

2016-12-15 Thread Tsutomu Itoh
On 2016/12/15 17:37, Qu Wenruo wrote:
> Regression introduced by:
> commit a2f7af94abe4a3491ca1280a2ae1d63edc0d62ab
> Author: Prasanth K S R 
> Date:   Sat Dec 10 19:17:43 2016 +0530
> 
> btrfs-progs: subvol_uuid_search: return error encoded pointer
> 
> IS_ERR() will only check if it's an error code, won't check if it's
> NULL.
> And for all the caller the commit modifies, it can return NULL and makes
> cause segfault.
> 
> Fix it by introducing new IS_ERR_OR_NULL() macro, and for NULL pointer
> and needs to return int case, convert NULL pointer to -ENOENT.

This patch also passed xfstests btrfs/{108,109,117}. Thanks for your work.

Thanks,
Tsutomu

> 
> Reported-by: Tsutomu Itoh 
> Signed-off-by: Qu Wenruo 
> ---
>  cmds-receive.c | 16 +++-
>  cmds-send.c| 26 ++
>  kerncompat.h   |  7 ++-
>  3 files changed, 35 insertions(+), 14 deletions(-)
> 
> diff --git a/cmds-receive.c b/cmds-receive.c
> index cb42aa2..166d37d 100644
> --- a/cmds-receive.c
> +++ b/cmds-receive.c
> @@ -287,13 +287,16 @@ static int process_snapshot(const char *path, const u8 
> *uuid, u64 ctransid,
>   parent_subvol = subvol_uuid_search(>sus, 0, parent_uuid,
>  parent_ctransid, NULL,
>  subvol_search_by_received_uuid);
> - if (IS_ERR(parent_subvol)) {
> + if (IS_ERR_OR_NULL(parent_subvol)) {
>   parent_subvol = subvol_uuid_search(>sus, 0, parent_uuid,
>  parent_ctransid, NULL,
>  subvol_search_by_uuid);
>   }
> - if (IS_ERR(parent_subvol)) {
> - ret = PTR_ERR(parent_subvol);
> + if (IS_ERR_OR_NULL(parent_subvol)) {
> + if (!parent_subvol)
> + ret = -ENOENT;
> + else
> + ret = PTR_ERR(parent_subvol);
>   error("cannot find parent subvolume");
>   goto out;
>   }
> @@ -750,13 +753,16 @@ static int process_clone(const char *path, u64 offset, 
> u64 len,
>   si = subvol_uuid_search(>sus, 0, clone_uuid, clone_ctransid,
>   NULL,
>   subvol_search_by_received_uuid);
> - if (IS_ERR(si)) {
> + if (IS_ERR_OR_NULL(si)) {
>   if (memcmp(clone_uuid, rctx->cur_subvol.received_uuid,
>   BTRFS_UUID_SIZE) == 0) {
>   /* TODO check generation of extent */
>   subvol_path = strdup(rctx->cur_subvol_path);
>   } else {
> - ret = PTR_ERR(si);
> + if (!si)
> + ret = -ENOENT;
> + else
> + ret = PTR_ERR(si);
>   error("clone: did not find source subvol");
>   goto out;
>   }
> diff --git a/cmds-send.c b/cmds-send.c
> index 5da64d8..cec11e6 100644
> --- a/cmds-send.c
> +++ b/cmds-send.c
> @@ -70,8 +70,12 @@ static int get_root_id(struct btrfs_send *sctx, const char 
> *path, u64 *root_id)
>  
>   si = subvol_uuid_search(>sus, 0, NULL, 0, path,
>   subvol_search_by_path);
> - if (IS_ERR(si))
> - return PTR_ERR(si);
> + if (IS_ERR_OR_NULL(si)) {
> + if (!si)
> + return -ENOENT;
> + else
> + return PTR_ERR(si);
> + }
>   *root_id = si->root_id;
>   free(si->path);
>   free(si);
> @@ -85,7 +89,7 @@ static struct subvol_info *get_parent(struct btrfs_send 
> *sctx, u64 root_id)
>  
>   si_tmp = subvol_uuid_search(>sus, root_id, NULL, 0, NULL,
>   subvol_search_by_root_id);
> - if (IS_ERR(si_tmp))
> + if (IS_ERR_OR_NULL(si_tmp))
>   return si_tmp;
>  
>   si = subvol_uuid_search(>sus, 0, si_tmp->parent_uuid, 0, NULL,
> @@ -105,8 +109,11 @@ static int find_good_parent(struct btrfs_send *sctx, u64 
> root_id, u64 *found)
>   int i;
>  
>   parent = get_parent(sctx, root_id);
> - if (IS_ERR(parent)) {
> - ret = PTR_ERR(parent);
> + if (IS_ERR_OR_NULL(parent)) {
> + if (!parent)
> + ret = -ENOENT;
> + else
> + ret = PTR_ERR(parent);
>   goto out;
>   }
>  
> @@ -122,7 +129,7 @@ static int find_good_parent(struct btrfs_send *sctx, u64 
> root_id, u64 *found)
>   s64 tmp;
>  
>   parent2 = get_parent(sctx, sctx->clone_sources[i]);
> - if (IS_ERR(parent2))
> + if (IS_ERR_OR_NULL(parent2))
>   continue;
>   if (parent2->root_id != parent->root_id) {
>   free(parent2->path);
> @@ -136,8 +143,11 @@ static int find_good_parent(struct btrfs_send *sctx, u64 

Re: Server hangs when mount BTRFS filesystem.

2016-12-15 Thread Xin Zhou
Hi Кравцов,

>From the log message, it seems dm-22 has been running out space, probably some 
>checksum did not get committed to disk.
And when trying to repair, it reports checksum missing.


merge_reloc_roots:2426: errno=-28 No space left
Dec 15 00:05:47 OraCI2 kernel: BTRFS warning (device dm-22): Skipping
commit of aborted transaction.
Dec 15 00:05:47 OraCI2 kernel: BTRFS: error (device dm-22) in
cleanup_transaction:1854: errno=-28 No space left
Dec 15 00:05:57 OraCI2 kernel: pending csums is 34287616

...
ERROR: errors found in extent allocation tree or chunk allocation
Fixed 0 roots.
checking free space cache [.]
root 5 inode 28350 errors 1000, some csum missing
root 5 inode 28351 errors 1000, some csum missing

Thanks,
Xin
 
 

Sent: Thursday, December 15, 2016 at 12:58 AM
From: "Кравцов Роман Владимирович" 
To: linux-btrfs@vger.kernel.org
Subject: Server hangs when mount BTRFS filesystem.
Hello.

First, server is hangs when btrfs balance working (see logs below).
After server reset can't mount filesystem.

When trying to execute command

# mount -t btrfs /dev/OraCI2/pes.isuse_bp.stands
/var/lib/docker/db/pes.isuse_bp.stands/pes.isuse_bp.standby.base/

server hangs without any messages and log records.


# btrfs --version
btrfs-progs v4.8.3

# btrfs fi show /dev/mapper/OraCI2-pes.isuse_bp.stands
Label: 'pes.isuse_bp.stands' uuid: ada5d777-565b-48e7-87dc-c58c8ad13466
Total devices 1 FS bytes used 2.24TiB
devid 1 size 3.49TiB used 2.35TiB path
/dev/mapper/OraCI2-pes.isuse_bp.stands



# btrfsck --repair -p /dev/OraCI2/pes.isuse_bp.stands
enabling repair mode
Checking filesystem on /dev/OraCI2/pes.isuse_bp.stands
UUID: ada5d777-565b-48e7-87dc-c58c8ad13466
parent transid verify failed on 2651226128384 wanted 136007 found 136176
parent transid verify failed on 2651226128384 wanted 136007 found 136176
Ignoring transid failure
leaf parent key incorrect 2651226128384
bad block 2651226128384

ERROR: errors found in extent allocation tree or chunk allocation
Fixed 0 roots.
checking free space cache [.]
root 5 inode 28350 errors 1000, some csum missing
root 5 inode 28351 errors 1000, some csum missing
root 5 inode 28354 errors 1000, some csum missing
root 5 inode 28358 errors 1000, some csum missing
root 5 inode 28360 errors 1000, some csum missing
root 5 inode 28361 errors 1000, some csum missing
root 5 inode 28368 errors 1000, some csum missing
root 5 inode 28369 errors 1000, some csum missing
root 5 inode 28370 errors 1000, some csum missing
root 5 inode 28371 errors 1000, some csum missing
root 5 inode 28372 errors 1000, some csum missing
root 5 inode 28373 errors 1000, some csum missing
root 5 inode 28376 errors 1000, some csum missing
root 5 inode 28377 errors 1000, some csum missing
root 5 inode 28378 errors 1000, some csum missing
root 5 inode 28379 errors 1000, some csum missing
root 5 inode 28380 errors 1000, some csum missing
root 5 inode 28381 errors 1000, some csum missing
root 5 inode 28382 errors 1000, some csum missing
root 5 inode 28383 errors 1000, some csum missing
root 5 inode 28384 errors 1000, some csum missing
root 5 inode 28385 errors 1000, some csum missing
root 5 inode 28386 errors 1000, some csum missing
root 5 inode 28387 errors 1000, some csum missing
root 5 inode 28388 errors 1000, some csum missing
root 5 inode 28389 errors 1000, some csum missing
root 5 inode 28390 errors 1000, some csum missing
root 5 inode 28391 errors 1000, some csum missing
root 5 inode 28392 errors 1000, some csum missing
root 5 inode 28393 errors 1000, some csum missing
root 5 inode 28394 errors 1000, some csum missing
root 5 inode 28395 errors 1000, some csum missing
root 5 inode 28396 errors 1000, some csum missing
root 5 inode 55108 errors 1000, some csum missing
root 5 inode 55313 errors 1000, some csum missing
root 5 inode 55314 errors 1000, some csum missing
root 5 inode 55315 errors 1000, some csum missing
root 5 inode 55316 errors 1000, some csum missing
root 5 inode 55317 errors 1000, some csum missing
root 5 inode 55318 errors 1000, some csum missing

checking csums
checking root refs
Recowing metadata block 2651226128384
found 2462630760448 bytes used err is 0
total csum bytes: 2398866488
total tree bytes: 5910593536
total fs tree bytes: 1679392768
total extent tree bytes: 1436450816
btree space waste bytes: 887715010
file data blocks allocated: 459312458981376
referenced 2199769403392
extent buffer leak: start 2651226128384 len 16384


# cat /var/log/messages | grep 'Dec 15 00'
Dec 15 00:02:35 OraCI2 kernel: BTRFS info (device dm-22): found 41156
extents
Dec 15 00:02:35 OraCI2 kernel: BTRFS info (device dm-22): relocating
block group 2568411414528 flags 1
Dec 15 00:02:37 OraCI2 kernel: BTRFS info (device dm-22): found 34939
extents
Dec 15 00:05:47 OraCI2 kernel: use_block_rsv: 20 callbacks suppressed
Dec 15 00:05:47 OraCI2 kernel: [ cut here ]
Dec 15 00:05:47 OraCI2 kernel: WARNING: CPU: 35 PID: 30215 at
fs/btrfs/extent-tree.c:8321 

[PATCH 7/9] jbd2: make the whole kjournald2 kthread NOFS safe

2016-12-15 Thread Michal Hocko
From: Michal Hocko 

kjournald2 is central to the transaction commit processing. As such any
potential allocation from this kernel thread has to be GFP_NOFS. Make
sure to mark the whole kernel thread GFP_NOFS by the memalloc_nofs_save.

Suggested-by: Jan Kara 
Signed-off-by: Michal Hocko 
---
 fs/jbd2/journal.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 8ed971eeab44..6dad8c5d6ddf 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -206,6 +206,13 @@ static int kjournald2(void *arg)
wake_up(>j_wait_done_commit);
 
/*
+* Make sure that no allocations from this kernel thread will ever 
recurse
+* to the fs layer because we are responsible for the transaction commit
+* and any fs involvement might get stuck waiting for the trasn. commit.
+*/
+   memalloc_nofs_save();
+
+   /*
 * And now, wait forever for commit wakeup events.
 */
write_lock(>j_state_lock);
-- 
2.10.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/9] xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS

2016-12-15 Thread Michal Hocko
From: Michal Hocko 

xfs has defined PF_FSTRANS to declare a scope GFP_NOFS semantic quite
some time ago. We would like to make this concept more generic and use
it for other filesystems as well. Let's start by giving the flag a
more genric name PF_MEMALLOC_NOFS which is in line with an exiting
PF_MEMALLOC_NOIO already used for the same purpose for GFP_NOIO
contexts. Replace all PF_FSTRANS usage from the xfs code in the first
step before we introduce a full API for it as xfs uses the flag directly
anyway.

This patch doesn't introduce any functional change.

Signed-off-by: Michal Hocko 
---
 fs/xfs/kmem.c |  4 ++--
 fs/xfs/kmem.h |  2 +-
 fs/xfs/libxfs/xfs_btree.c |  2 +-
 fs/xfs/xfs_aops.c |  6 +++---
 fs/xfs/xfs_trans.c| 12 ++--
 include/linux/sched.h |  2 ++
 6 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
index 339c696bbc01..a76a05dae96b 100644
--- a/fs/xfs/kmem.c
+++ b/fs/xfs/kmem.c
@@ -80,13 +80,13 @@ kmem_zalloc_large(size_t size, xfs_km_flags_t flags)
 * context via PF_MEMALLOC_NOIO to prevent memory reclaim re-entering
 * the filesystem here and potentially deadlocking.
 */
-   if ((current->flags & PF_FSTRANS) || (flags & KM_NOFS))
+   if ((current->flags & PF_MEMALLOC_NOFS) || (flags & KM_NOFS))
noio_flag = memalloc_noio_save();
 
lflags = kmem_flags_convert(flags);
ptr = __vmalloc(size, lflags | __GFP_HIGHMEM | __GFP_ZERO, PAGE_KERNEL);
 
-   if ((current->flags & PF_FSTRANS) || (flags & KM_NOFS))
+   if ((current->flags & PF_MEMALLOC_NOFS) || (flags & KM_NOFS))
memalloc_noio_restore(noio_flag);
 
return ptr;
diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
index ea3984091d58..e40ddd12900b 100644
--- a/fs/xfs/kmem.h
+++ b/fs/xfs/kmem.h
@@ -51,7 +51,7 @@ kmem_flags_convert(xfs_km_flags_t flags)
lflags = GFP_ATOMIC | __GFP_NOWARN;
} else {
lflags = GFP_KERNEL | __GFP_NOWARN;
-   if ((current->flags & PF_FSTRANS) || (flags & KM_NOFS))
+   if ((current->flags & PF_MEMALLOC_NOFS) || (flags & KM_NOFS))
lflags &= ~__GFP_FS;
}
 
diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 21e6a6ab6b9a..a2672ba4dc33 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -2866,7 +2866,7 @@ xfs_btree_split_worker(
struct xfs_btree_split_args *args = container_of(work,
struct xfs_btree_split_args, 
work);
unsigned long   pflags;
-   unsigned long   new_pflags = PF_FSTRANS;
+   unsigned long   new_pflags = PF_MEMALLOC_NOFS;
 
/*
 * we are in a transaction context here, but may also be doing work
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 0f56fcd3a5d5..61ca9f9c5a12 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -189,7 +189,7 @@ xfs_setfilesize_trans_alloc(
 * We hand off the transaction to the completion thread now, so
 * clear the flag here.
 */
-   current_restore_flags_nested(>t_pflags, PF_FSTRANS);
+   current_restore_flags_nested(>t_pflags, PF_MEMALLOC_NOFS);
return 0;
 }
 
@@ -252,7 +252,7 @@ xfs_setfilesize_ioend(
 * thus we need to mark ourselves as being in a transaction manually.
 * Similarly for freeze protection.
 */
-   current_set_flags_nested(>t_pflags, PF_FSTRANS);
+   current_set_flags_nested(>t_pflags, PF_MEMALLOC_NOFS);
__sb_writers_acquired(VFS_I(ip)->i_sb, SB_FREEZE_FS);
 
/* we abort the update if there was an IO error */
@@ -1015,7 +1015,7 @@ xfs_do_writepage(
 * Given that we do not allow direct reclaim to call us, we should
 * never be called while in a filesystem transaction.
 */
-   if (WARN_ON_ONCE(current->flags & PF_FSTRANS))
+   if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
goto redirty;
 
/*
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 70f42ea86dfb..f5969c8274fc 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -134,7 +134,7 @@ xfs_trans_reserve(
boolrsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
 
/* Mark this thread as being in a transaction */
-   current_set_flags_nested(>t_pflags, PF_FSTRANS);
+   current_set_flags_nested(>t_pflags, PF_MEMALLOC_NOFS);
 
/*
 * Attempt to reserve the needed disk blocks by decrementing
@@ -144,7 +144,7 @@ xfs_trans_reserve(
if (blocks > 0) {
error = xfs_mod_fdblocks(tp->t_mountp, -((int64_t)blocks), 
rsvd);
if (error != 0) {
-   current_restore_flags_nested(>t_pflags, PF_FSTRANS);
+   current_restore_flags_nested(>t_pflags, 
PF_MEMALLOC_NOFS);
   

[PATCH 1/9] lockdep: allow to disable reclaim lockup detection

2016-12-15 Thread Michal Hocko
From: Michal Hocko 

The current implementation of the reclaim lockup detection can lead to
false positives and those even happen and usually lead to tweak the
code to silence the lockdep by using GFP_NOFS even though the context
can use __GFP_FS just fine. See
http://lkml.kernel.org/r/20160512080321.GA18496@dastard as an example.

=
[ INFO: inconsistent lock state ]
4.5.0-rc2+ #4 Tainted: G   O
-
inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage.
kswapd0/543 [HC0[0]:SC0[0]:HE1:SE1] takes:

(_nondir_ilock_class){-+}, at: [] 
xfs_ilock+0x177/0x200 [xfs]

{RECLAIM_FS-ON-R} state was registered at:
  [] mark_held_locks+0x79/0xa0
  [] lockdep_trace_alloc+0xb3/0x100
  [] kmem_cache_alloc+0x33/0x230
  [] kmem_zone_alloc+0x81/0x120 [xfs]
  [] xfs_refcountbt_init_cursor+0x3e/0xa0 [xfs]
  [] __xfs_refcount_find_shared+0x75/0x580 [xfs]
  [] xfs_refcount_find_shared+0x84/0xb0 [xfs]
  [] xfs_getbmap+0x608/0x8c0 [xfs]
  [] xfs_vn_fiemap+0xab/0xc0 [xfs]
  [] do_vfs_ioctl+0x498/0x670
  [] SyS_ioctl+0x79/0x90
  [] entry_SYSCALL_64_fastpath+0x12/0x6f

   CPU0
   
  lock(_nondir_ilock_class);
  
lock(_nondir_ilock_class);

 *** DEADLOCK ***

3 locks held by kswapd0/543:

stack backtrace:
CPU: 0 PID: 543 Comm: kswapd0 Tainted: G   O4.5.0-rc2+ #4

Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006

 82a34f10 88003aa078d0 813a14f9 88003d8551c0
 88003aa07920 8110ec65  0001
 8801 000b 0008 88003d855aa0
Call Trace:
 [] dump_stack+0x4b/0x72
 [] print_usage_bug+0x215/0x240
 [] mark_lock+0x1f5/0x660
 [] ? print_shortest_lock_dependencies+0x1a0/0x1a0
 [] __lock_acquire+0xa80/0x1e50
 [] ? kmem_cache_alloc+0x15e/0x230
 [] ? kmem_zone_alloc+0x81/0x120 [xfs]
 [] lock_acquire+0xd8/0x1e0
 [] ? xfs_ilock+0x177/0x200 [xfs]
 [] ? xfs_reflink_cancel_cow_range+0x150/0x300 [xfs]
 [] down_write_nested+0x5e/0xc0
 [] ? xfs_ilock+0x177/0x200 [xfs]
 [] xfs_ilock+0x177/0x200 [xfs]
 [] xfs_reflink_cancel_cow_range+0x150/0x300 [xfs]
 [] xfs_fs_evict_inode+0xdc/0x1e0 [xfs]
 [] evict+0xc5/0x190
 [] dispose_list+0x39/0x60
 [] prune_icache_sb+0x4b/0x60
 [] super_cache_scan+0x14f/0x1a0
 [] shrink_slab.part.63.constprop.79+0x1e9/0x4e0
 [] shrink_zone+0x15e/0x170
 [] kswapd+0x4f1/0xa80
 [] ? zone_reclaim+0x230/0x230
 [] kthread+0xf2/0x110
 [] ? kthread_create_on_node+0x220/0x220
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_create_on_node+0x220/0x220

To quote Dave:
"
Ignoring whether reflink should be doing anything or not, that's a
"xfs_refcountbt_init_cursor() gets called both outside and inside
transactions" lockdep false positive case. The problem here is
lockdep has seen this allocation from within a transaction, hence a
GFP_NOFS allocation, and now it's seeing it in a GFP_KERNEL context.
Also note that we have an active reference to this inode.

So, because the reclaim annotations overload the interrupt level
detections and it's seen the inode ilock been taken in reclaim
("interrupt") context, this triggers a reclaim context warning where
it thinks it is unsafe to do this allocation in GFP_KERNEL context
holding the inode ilock...
"

This sounds like a fundamental problem of the reclaim lock detection.
It is really impossible to annotate such a special usecase IMHO unless
the reclaim lockup detection is reworked completely. Until then it
is much better to provide a way to add "I know what I am doing flag"
and mark problematic places. This would prevent from abusing GFP_NOFS
flag which has a runtime effect even on configurations which have
lockdep disabled.

Introduce __GFP_NOLOCKDEP flag which tells the lockdep gfp tracking to
skip the current allocation request.

While we are at it also make sure that the radix tree doesn't
accidentaly override tags stored in the upper part of the gfp_mask.

Suggested-by: Peter Zijlstra 
Acked-by: Peter Zijlstra (Intel) 
Signed-off-by: Michal Hocko 
---
 include/linux/gfp.h  | 10 +-
 kernel/locking/lockdep.c |  4 
 lib/radix-tree.c |  2 ++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 4175dca4ac39..1a934383cc20 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -41,6 +41,11 @@ struct vm_area_struct;
 #define ___GFP_OTHER_NODE  0x80u
 #define ___GFP_WRITE   0x100u
 #define ___GFP_KSWAPD_RECLAIM  0x200u
+#ifdef CONFIG_LOCKDEP
+#define ___GFP_NOLOCKDEP   0x400u
+#else
+#define ___GFP_NOLOCKDEP   0
+#endif
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -186,8 +191,11 @@ struct vm_area_struct;
 #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
 #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE)
 
+/* Disable lockdep for GFP context 

[PATCH 9/9] Revert "ext4: fix wrong gfp type under transaction"

2016-12-15 Thread Michal Hocko
From: Michal Hocko 

This reverts commit 216553c4b7f3e3e2beb4981cddca9b2027523928. Now that
the transaction context uses memalloc_nofs_save and all allocations
within the this context inherit GFP_NOFS automatically, there is no
reason to mark specific allocations explicitly.

This patch should not introduce any functional change. The main point
of this change is to reduce explicit GFP_NOFS usage inside ext4 code
to make the review of the remaining usage easier.

Signed-off-by: Michal Hocko 
---
 fs/ext4/acl.c | 6 +++---
 fs/ext4/extents.c | 2 +-
 fs/ext4/resize.c  | 4 ++--
 fs/ext4/xattr.c   | 4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
index fd389935ecd1..9e98092c2a4b 100644
--- a/fs/ext4/acl.c
+++ b/fs/ext4/acl.c
@@ -32,7 +32,7 @@ ext4_acl_from_disk(const void *value, size_t size)
return ERR_PTR(-EINVAL);
if (count == 0)
return NULL;
-   acl = posix_acl_alloc(count, GFP_NOFS);
+   acl = posix_acl_alloc(count, GFP_KERNEL);
if (!acl)
return ERR_PTR(-ENOMEM);
for (n = 0; n < count; n++) {
@@ -94,7 +94,7 @@ ext4_acl_to_disk(const struct posix_acl *acl, size_t *size)
 
*size = ext4_acl_size(acl->a_count);
ext_acl = kmalloc(sizeof(ext4_acl_header) + acl->a_count *
-   sizeof(ext4_acl_entry), GFP_NOFS);
+   sizeof(ext4_acl_entry), GFP_KERNEL);
if (!ext_acl)
return ERR_PTR(-ENOMEM);
ext_acl->a_version = cpu_to_le32(EXT4_ACL_VERSION);
@@ -159,7 +159,7 @@ ext4_get_acl(struct inode *inode, int type)
}
retval = ext4_xattr_get(inode, name_index, "", NULL, 0);
if (retval > 0) {
-   value = kmalloc(retval, GFP_NOFS);
+   value = kmalloc(retval, GFP_KERNEL);
if (!value)
return ERR_PTR(-ENOMEM);
retval = ext4_xattr_get(inode, name_index, "", value, retval);
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index ef815eb72389..c901d89f0038 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2933,7 +2933,7 @@ int ext4_ext_remove_space(struct inode *inode, 
ext4_lblk_t start,
le16_to_cpu(path[k].p_hdr->eh_entries)+1;
} else {
path = kzalloc(sizeof(struct ext4_ext_path) * (depth + 1),
-  GFP_NOFS);
+  GFP_KERNEL);
if (path == NULL) {
ext4_journal_stop(handle);
return -ENOMEM;
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index cf681004b196..e121f4e048b8 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -816,7 +816,7 @@ static int add_new_gdb(handle_t *handle, struct inode 
*inode,
 
n_group_desc = ext4_kvmalloc((gdb_num + 1) *
 sizeof(struct buffer_head *),
-GFP_NOFS);
+GFP_KERNEL);
if (!n_group_desc) {
err = -ENOMEM;
ext4_warning(sb, "not enough memory for %lu groups",
@@ -943,7 +943,7 @@ static int reserve_backup_gdb(handle_t *handle, struct 
inode *inode,
int res, i;
int err;
 
-   primary = kmalloc(reserved_gdb * sizeof(*primary), GFP_NOFS);
+   primary = kmalloc(reserved_gdb * sizeof(*primary), GFP_KERNEL);
if (!primary)
return -ENOMEM;
 
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 5a94fa52b74f..172317462238 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -875,7 +875,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 
unlock_buffer(bs->bh);
ea_bdebug(bs->bh, "cloning");
-   s->base = kmalloc(bs->bh->b_size, GFP_NOFS);
+   s->base = kmalloc(bs->bh->b_size, GFP_KERNEL);
error = -ENOMEM;
if (s->base == NULL)
goto cleanup;
@@ -887,7 +887,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
}
} else {
/* Allocate a buffer where we construct the new block. */
-   s->base = kzalloc(sb->s_blocksize, GFP_NOFS);
+   s->base = kzalloc(sb->s_blocksize, GFP_KERNEL);
/* assert(header == s->base) */
error = -ENOMEM;
if (s->base == NULL)
-- 
2.10.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/9] xfs: introduce and use KM_NOLOCKDEP to silence reclaim lockdep false positives

2016-12-15 Thread Michal Hocko
From: Michal Hocko 

Now that the page allocator offers __GFP_NOLOCKDEP let's introduce
KM_NOLOCKDEP alias for the xfs allocation APIs. While we are at it
also change KM_NOFS users introduced by b17cb364dbbb ("xfs: fix missing
KM_NOFS tags to keep lockdep happy") and use the new flag for them
instead. There is really no reason to make these allocations contexts
weaker just because of the lockdep which even might not be enabled
in most cases.

Signed-off-by: Michal Hocko 
---
 fs/xfs/kmem.h| 4 
 fs/xfs/libxfs/xfs_da_btree.c | 4 ++--
 fs/xfs/xfs_buf.c | 2 +-
 fs/xfs/xfs_dir2_readdir.c| 2 +-
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
index 689f746224e7..ea3984091d58 100644
--- a/fs/xfs/kmem.h
+++ b/fs/xfs/kmem.h
@@ -33,6 +33,7 @@ typedef unsigned __bitwise xfs_km_flags_t;
 #define KM_NOFS((__force xfs_km_flags_t)0x0004u)
 #define KM_MAYFAIL ((__force xfs_km_flags_t)0x0008u)
 #define KM_ZERO((__force xfs_km_flags_t)0x0010u)
+#define KM_NOLOCKDEP   ((__force xfs_km_flags_t)0x0020u)
 
 /*
  * We use a special process flag to avoid recursive callbacks into
@@ -57,6 +58,9 @@ kmem_flags_convert(xfs_km_flags_t flags)
if (flags & KM_ZERO)
lflags |= __GFP_ZERO;
 
+   if (flags & KM_NOLOCKDEP)
+   lflags |= __GFP_NOLOCKDEP;
+
return lflags;
 }
 
diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
index f2dc1a950c85..b8b5f6914863 100644
--- a/fs/xfs/libxfs/xfs_da_btree.c
+++ b/fs/xfs/libxfs/xfs_da_btree.c
@@ -2429,7 +2429,7 @@ xfs_buf_map_from_irec(
 
if (nirecs > 1) {
map = kmem_zalloc(nirecs * sizeof(struct xfs_buf_map),
- KM_SLEEP | KM_NOFS);
+ KM_SLEEP | KM_NOLOCKDEP);
if (!map)
return -ENOMEM;
*mapp = map;
@@ -2488,7 +2488,7 @@ xfs_dabuf_map(
 */
if (nfsb != 1)
irecs = kmem_zalloc(sizeof(irec) * nfsb,
-   KM_SLEEP | KM_NOFS);
+   KM_SLEEP | KM_NOLOCKDEP);
 
nirecs = nfsb;
error = xfs_bmapi_read(dp, (xfs_fileoff_t)bno, nfsb, irecs,
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 7f0a01f7b592..f31ae592dcae 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1785,7 +1785,7 @@ xfs_alloc_buftarg(
 {
xfs_buftarg_t   *btp;
 
-   btp = kmem_zalloc(sizeof(*btp), KM_SLEEP | KM_NOFS);
+   btp = kmem_zalloc(sizeof(*btp), KM_SLEEP | KM_NOLOCKDEP);
 
btp->bt_mount = mp;
btp->bt_dev =  bdev->bd_dev;
diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
index 003a99b83bd8..033ed65d7ce6 100644
--- a/fs/xfs/xfs_dir2_readdir.c
+++ b/fs/xfs/xfs_dir2_readdir.c
@@ -503,7 +503,7 @@ xfs_dir2_leaf_getdents(
length = howmany(bufsize + geo->blksize, (1 << geo->fsblog));
map_info = kmem_zalloc(offsetof(struct xfs_dir2_leaf_map_info, map) +
(length * sizeof(struct xfs_bmbt_irec)),
-  KM_SLEEP | KM_NOFS);
+  KM_SLEEP | KM_NOLOCKDEP);
map_info->map_size = length;
 
/*
-- 
2.10.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 8/9] Revert "ext4: avoid deadlocks in the writeback path by using sb_getblk_gfp"

2016-12-15 Thread Michal Hocko
From: Michal Hocko 

This reverts commit c45653c341f5c8a0ce19c8f0ad4678640849cb86 because
sb_getblk_gfp is not really needed as
sb_getblk
  __getblk_gfp
__getblk_slow
  grow_buffers
grow_dev_page
  gfp_mask = mapping_gfp_constraint(inode->i_mapping, ~__GFP_FS) | gfp

so __GFP_FS is cleared unconditionally and therefore the above commit
didn't have any real effect in fact.

This patch should not introduce any functional change. The main point
of this change is to reduce explicit GFP_NOFS usage inside ext4 code to
make the review of the remaining usage easier.

Signed-off-by: Michal Hocko 
---
 fs/ext4/extents.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index b1f8416923ab..ef815eb72389 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -518,7 +518,7 @@ __read_extent_tree_block(const char *function, unsigned int 
line,
struct buffer_head  *bh;
int err;
 
-   bh = sb_getblk_gfp(inode->i_sb, pblk, __GFP_MOVABLE | GFP_NOFS);
+   bh = sb_getblk(inode->i_sb, pblk);
if (unlikely(!bh))
return ERR_PTR(-ENOMEM);
 
@@ -1096,7 +1096,7 @@ static int ext4_ext_split(handle_t *handle, struct inode 
*inode,
err = -EFSCORRUPTED;
goto cleanup;
}
-   bh = sb_getblk_gfp(inode->i_sb, newblock, __GFP_MOVABLE | GFP_NOFS);
+   bh = sb_getblk(inode->i_sb, newblock);
if (unlikely(!bh)) {
err = -ENOMEM;
goto cleanup;
@@ -1290,7 +1290,7 @@ static int ext4_ext_grow_indepth(handle_t *handle, struct 
inode *inode,
if (newblock == 0)
return err;
 
-   bh = sb_getblk_gfp(inode->i_sb, newblock, __GFP_MOVABLE | GFP_NOFS);
+   bh = sb_getblk(inode->i_sb, newblock);
if (unlikely(!bh))
return -ENOMEM;
lock_buffer(bh);
-- 
2.10.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/9] mm: introduce memalloc_nofs_{save,restore} API

2016-12-15 Thread Michal Hocko
From: Michal Hocko 

GFP_NOFS context is used for the following 5 reasons currently
- to prevent from deadlocks when the lock held by the allocation
  context would be needed during the memory reclaim
- to prevent from stack overflows during the reclaim because
  the allocation is performed from a deep context already
- to prevent lockups when the allocation context depends on
  other reclaimers to make a forward progress indirectly
- just in case because this would be safe from the fs POV
- silence lockdep false positives

Unfortunately overuse of this allocation context brings some problems
to the MM. Memory reclaim is much weaker (especially during heavy FS
metadata workloads), OOM killer cannot be invoked because the MM layer
doesn't have enough information about how much memory is freeable by the
FS layer.

In many cases it is far from clear why the weaker context is even used
and so it might be used unnecessarily. We would like to get rid of
those as much as possible. One way to do that is to use the flag in
scopes rather than isolated cases. Such a scope is declared when really
necessary, tracked per task and all the allocation requests from within
the context will simply inherit the GFP_NOFS semantic.

Not only this is easier to understand and maintain because there are
much less problematic contexts than specific allocation requests, this
also helps code paths where FS layer interacts with other layers (e.g.
crypto, security modules, MM etc...) and there is no easy way to convey
the allocation context between the layers.

Introduce memalloc_nofs_{save,restore} API to control the scope
of GFP_NOFS allocation context. This is basically copying
memalloc_noio_{save,restore} API we have for other restricted allocation
context GFP_NOIO. The PF_MEMALLOC_NOFS flag already exists and it is
just an alias for PF_FSTRANS which has been xfs specific until recently.
There are no more PF_FSTRANS users anymore so let's just drop it.

PF_MEMALLOC_NOFS is now checked in the MM layer and drops __GFP_FS
implicitly same as PF_MEMALLOC_NOIO drops __GFP_IO. memalloc_noio_flags
is renamed to current_gfp_context because it now cares about both
PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO contexts. Xfs code paths preserve
their semantic. kmem_flags_convert() doesn't need to evaluate the flag
anymore.

This patch shouldn't introduce any functional changes.

Let's hope that filesystems will drop direct GFP_NOFS (resp. ~__GFP_FS)
usage as much as possible and only use a properly documented
memalloc_nofs_{save,restore} checkpoints where they are appropriate.

Signed-off-by: Michal Hocko 
---
 fs/xfs/kmem.h|  2 +-
 include/linux/gfp.h  |  8 
 include/linux/sched.h| 34 ++
 kernel/locking/lockdep.c |  2 +-
 mm/page_alloc.c  |  8 +---
 mm/vmscan.c  |  6 +++---
 6 files changed, 44 insertions(+), 16 deletions(-)

diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
index e40ddd12900b..afaa3e059076 100644
--- a/fs/xfs/kmem.h
+++ b/fs/xfs/kmem.h
@@ -51,7 +51,7 @@ kmem_flags_convert(xfs_km_flags_t flags)
lflags = GFP_ATOMIC | __GFP_NOWARN;
} else {
lflags = GFP_KERNEL | __GFP_NOWARN;
-   if ((current->flags & PF_MEMALLOC_NOFS) || (flags & KM_NOFS))
+   if (flags & KM_NOFS)
lflags &= ~__GFP_FS;
}
 
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 1a934383cc20..bfe53d95c25b 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -217,8 +217,16 @@ struct vm_area_struct;
  *
  * GFP_NOIO will use direct reclaim to discard clean pages or slab pages
  *   that do not require the starting of any physical IO.
+ *   Please try to avoid using this flag directly and instead use
+ *   memalloc_noio_{save,restore} to mark the whole scope which cannot
+ *   perform any IO with a short explanation why. All allocation requests
+ *   will inherit GFP_NOIO implicitly.
  *
  * GFP_NOFS will use direct reclaim but will not use any filesystem interfaces.
+ *   Please try to avoid using this flag directly and instead use
+ *   memalloc_nofs_{save,restore} to mark the whole scope which 
cannot/shouldn't
+ *   recurse into the FS layer with a short explanation why. All allocation
+ *   requests will inherit GFP_NOFS implicitly.
  *
  * GFP_USER is for userspace allocations that also need to be directly
  *   accessibly by the kernel or hardware. It is typically used by hardware
diff --git a/include/linux/sched.h b/include/linux/sched.h
index baffd340ea82..1c9fbcbcfcc8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2307,9 +2307,9 @@ extern void thread_group_cputime_adjusted(struct 
task_struct *p, cputime_t *ut,
 #define PF_USED_ASYNC  0x4000  /* used async_schedule*(), used by 
module init */
 #define PF_NOFREEZE0x8000  /* this thread should 

[PATCH 5/9] xfs: use memalloc_nofs_{save,restore} instead of memalloc_noio*

2016-12-15 Thread Michal Hocko
From: Michal Hocko 

kmem_zalloc_large and _xfs_buf_map_pages use memalloc_noio_{save,restore}
API to prevent from reclaim recursion into the fs because vmalloc can
invoke unconditional GFP_KERNEL allocations and these functions might be
called from the NOFS contexts. The memalloc_noio_save will enforce
GFP_NOIO context which is even weaker than GFP_NOFS and that seems to be
unnecessary. Let's use memalloc_nofs_{save,restore} instead as it should
provide exactly what we need here - implicit GFP_NOFS context.

Signed-off-by: Michal Hocko 
---
 fs/xfs/kmem.c| 10 +-
 fs/xfs/xfs_buf.c |  8 
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
index a76a05dae96b..d69ed5e76621 100644
--- a/fs/xfs/kmem.c
+++ b/fs/xfs/kmem.c
@@ -65,7 +65,7 @@ kmem_alloc(size_t size, xfs_km_flags_t flags)
 void *
 kmem_zalloc_large(size_t size, xfs_km_flags_t flags)
 {
-   unsigned noio_flag = 0;
+   unsigned nofs_flag = 0;
void*ptr;
gfp_t   lflags;
 
@@ -80,14 +80,14 @@ kmem_zalloc_large(size_t size, xfs_km_flags_t flags)
 * context via PF_MEMALLOC_NOIO to prevent memory reclaim re-entering
 * the filesystem here and potentially deadlocking.
 */
-   if ((current->flags & PF_MEMALLOC_NOFS) || (flags & KM_NOFS))
-   noio_flag = memalloc_noio_save();
+   if (flags & KM_NOFS)
+   nofs_flag = memalloc_nofs_save();
 
lflags = kmem_flags_convert(flags);
ptr = __vmalloc(size, lflags | __GFP_HIGHMEM | __GFP_ZERO, PAGE_KERNEL);
 
-   if ((current->flags & PF_MEMALLOC_NOFS) || (flags & KM_NOFS))
-   memalloc_noio_restore(noio_flag);
+   if (flags & KM_NOFS)
+   memalloc_nofs_restore(nofs_flag);
 
return ptr;
 }
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index f31ae592dcae..5c6f9bd4d8be 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -441,17 +441,17 @@ _xfs_buf_map_pages(
bp->b_addr = NULL;
} else {
int retried = 0;
-   unsigned noio_flag;
+   unsigned nofs_flag;
 
/*
 * vm_map_ram() will allocate auxillary structures (e.g.
 * pagetables) with GFP_KERNEL, yet we are likely to be under
 * GFP_NOFS context here. Hence we need to tell memory reclaim
-* that we are in such a context via PF_MEMALLOC_NOIO to prevent
+* that we are in such a context via PF_MEMALLOC_NOFS to prevent
 * memory reclaim re-entering the filesystem here and
 * potentially deadlocking.
 */
-   noio_flag = memalloc_noio_save();
+   nofs_flag = memalloc_nofs_save();
do {
bp->b_addr = vm_map_ram(bp->b_pages, bp->b_page_count,
-1, PAGE_KERNEL);
@@ -459,7 +459,7 @@ _xfs_buf_map_pages(
break;
vm_unmap_aliases();
} while (retried++ <= 1);
-   memalloc_noio_restore(noio_flag);
+   memalloc_noio_restore(nofs_flag);
 
if (!bp->b_addr)
return -ENOMEM;
-- 
2.10.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/9 v2] scope GFP_NOFS api

2016-12-15 Thread Michal Hocko
Hi,
I have posted the previous version here [1]. Since then I have added a
support to suppress reclaim lockdep warnings (__GFP_NOLOCKDEP) to allow
removing GFP_NOFS usage motivated by the lockdep false positives. On top
of that I've tried to convert few KM_NOFS usages to use the new flag in
the xfs code base. This would need a review from somebody familiar with
xfs of course.

Then I've added the new scope API to the jbd/ext transaction code +
reverted some explicit GFP_NOFS usages which are covered by the scope one
now. This also needs a deep review from ext developers. I have some more
patches which remove more explicit GFP_NOFS users but that is not really
ready yet. I would really appreciate if developers for other filesystems
joined me here as well. Maybe ext parts can help to show how to start.
Especially btrfs which uses GFP_NOFS a lot (and not with a good reason
in many cases I suspect).

The patchset is based on linux-next (next-20161214).

I think the GFP_NOIO should be seeing the same clean up but that is not
a part of this patchset.

Any feedback is highly appreciated of course.

Diffstat says
 fs/ext4/acl.c|  6 +++---
 fs/ext4/extents.c|  8 
 fs/ext4/resize.c |  4 ++--
 fs/ext4/xattr.c  |  4 ++--
 fs/jbd2/journal.c|  7 +++
 fs/jbd2/transaction.c| 11 +++
 fs/xfs/kmem.c| 10 +-
 fs/xfs/kmem.h|  6 +-
 fs/xfs/libxfs/xfs_btree.c|  2 +-
 fs/xfs/libxfs/xfs_da_btree.c |  4 ++--
 fs/xfs/xfs_aops.c|  6 +++---
 fs/xfs/xfs_buf.c | 10 +-
 fs/xfs/xfs_dir2_readdir.c|  2 +-
 fs/xfs/xfs_trans.c   | 12 ++--
 include/linux/gfp.h  | 18 +-
 include/linux/jbd2.h |  2 ++
 include/linux/sched.h| 32 ++--
 kernel/locking/lockdep.c |  6 +-
 lib/radix-tree.c |  2 ++
 mm/page_alloc.c  |  8 +---
 mm/vmscan.c  |  6 +++---
 21 files changed, 117 insertions(+), 49 deletions(-)

Shortlog:
Michal Hocko (9):
  lockdep: allow to disable reclaim lockup detection
  xfs: introduce and use KM_NOLOCKDEP to silence reclaim lockdep false 
positives
  xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS
  mm: introduce memalloc_nofs_{save,restore} API
  xfs: use memalloc_nofs_{save,restore} instead of memalloc_noio*
  jbd2: mark the transaction context with the scope GFP_NOFS context
  jbd2: make the whole kjournald2 kthread NOFS safe
  Revert "ext4: avoid deadlocks in the writeback path by using 
sb_getblk_gfp"
  Revert "ext4: fix wrong gfp type under transaction"


[1] http://lkml.kernel.org/r/1461671772-1269-1-git-send-email-mho...@kernel.org


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/9] jbd2: mark the transaction context with the scope GFP_NOFS context

2016-12-15 Thread Michal Hocko
From: Michal Hocko 

now that we have memalloc_nofs_{save,restore} api we can mark the whole
transaction context as implicitly GFP_NOFS. All allocations will
automatically inherit GFP_NOFS this way. This means that we do not have
to mark any of those requests with GFP_NOFS and moreover all the
ext4_kv[mz]alloc(GFP_NOFS) are also safe now because even the hardcoded
GFP_KERNEL allocations deep inside the vmalloc will be NOFS now.

Signed-off-by: Michal Hocko 
---
 fs/jbd2/transaction.c | 11 +++
 include/linux/jbd2.h  |  2 ++
 2 files changed, 13 insertions(+)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index e1652665bd93..35a5d3d76182 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -388,6 +388,11 @@ static int start_this_handle(journal_t *journal, handle_t 
*handle,
 
rwsem_acquire_read(>j_trans_commit_map, 0, 0, _THIS_IP_);
jbd2_journal_free_transaction(new_transaction);
+   /*
+* Make sure that no allocations done while the transaction is
+* open is going to recurse back to the fs layer.
+*/
+   handle->saved_alloc_context = memalloc_nofs_save();
return 0;
 }
 
@@ -466,6 +471,7 @@ handle_t *jbd2__journal_start(journal_t *journal, int 
nblocks, int rsv_blocks,
trace_jbd2_handle_start(journal->j_fs_dev->bd_dev,
handle->h_transaction->t_tid, type,
line_no, nblocks);
+
return handle;
 }
 EXPORT_SYMBOL(jbd2__journal_start);
@@ -1760,6 +1766,11 @@ int jbd2_journal_stop(handle_t *handle)
if (handle->h_rsv_handle)
jbd2_journal_free_reserved(handle->h_rsv_handle);
 free_and_exit:
+   /*
+* scope of th GFP_NOFS context is over here and so we can
+* restore the original alloc context.
+*/
+   memalloc_nofs_restore(handle->saved_alloc_context);
jbd2_free_handle(handle);
return err;
 }
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index dfaa1f4dcb0c..606b6bce3a5b 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -491,6 +491,8 @@ struct jbd2_journal_handle
 
unsigned long   h_start_jiffies;
unsigned inth_requested_credits;
+
+   unsigned intsaved_alloc_context;
 };
 
 
-- 
2.10.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] Btrfs: code cleanup min/max -> min_t/max_t

2016-12-15 Thread Seraphime Kirkovski
This cleans up the cases where the min/max macros were used with a cast
rather than using directly min_t/max_t.

Signed-off-by: Seraphime Kirkovski 
---
 fs/btrfs/file-item.c | 4 ++--
 fs/btrfs/ioctl.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 5fd5f45..7ae52cf 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -867,8 +867,8 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans,
tmp = min(tmp, (next_offset - file_key.offset) >>
 root->fs_info->sb->s_blocksize_bits);
 
-   tmp = max((u64)1, tmp);
-   tmp = min(tmp, (u64)MAX_CSUM_ITEMS(root, csum_size));
+   tmp = max_t(u64, 1, tmp);
+   tmp = min_t(u64, tmp, MAX_CSUM_ITEMS(root, csum_size));
ins_size = csum_size * tmp;
} else {
ins_size = csum_size;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 8bf0d44..fc6dc14 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -394,7 +394,7 @@ static noinline int btrfs_ioctl_fitrim(struct file *file, 
void __user *arg)
q = bdev_get_queue(device->bdev);
if (blk_queue_discard(q)) {
num_devices++;
-   minlen = min((u64)q->limits.discard_granularity,
+   minlen = min_t(u64, q->limits.discard_granularity,
 minlen);
}
}
-- 
2.10.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] Btrfs: coding style fixes

2016-12-15 Thread Seraphime Kirkovski
This eliminates the rare uses of `unsigned` instead of `unsigned int`.

Signed-off-by: Seraphime Kirkovski 
---
 fs/btrfs/dir-item.c |  6 ++---
 fs/btrfs/extent-tree.c  | 20 -
 fs/btrfs/extent_io.c| 55 +++--
 fs/btrfs/free-space-cache.c |  4 ++--
 fs/btrfs/inode.c| 12 +-
 fs/btrfs/send.c |  6 ++---
 fs/btrfs/super.c|  4 ++--
 fs/btrfs/volumes.c  |  6 ++---
 8 files changed, 57 insertions(+), 56 deletions(-)

diff --git a/fs/btrfs/dir-item.c b/fs/btrfs/dir-item.c
index ae496fd..2d68c17 100644
--- a/fs/btrfs/dir-item.c
+++ b/fs/btrfs/dir-item.c
@@ -466,7 +466,7 @@ int verify_dir_item(struct btrfs_root *root,
 
if (btrfs_dir_name_len(leaf, dir_item) > namelen) {
btrfs_crit(root->fs_info, "invalid dir item name len: %u",
-  (unsigned)btrfs_dir_data_len(leaf, dir_item));
+  (unsigned int)btrfs_dir_data_len(leaf, dir_item));
return 1;
}
 
@@ -475,8 +475,8 @@ int verify_dir_item(struct btrfs_root *root,
 btrfs_dir_name_len(leaf, dir_item)) > BTRFS_MAX_XATTR_SIZE(root)) {
btrfs_crit(root->fs_info,
   "invalid dir item name + data len: %u + %u",
-  (unsigned)btrfs_dir_name_len(leaf, dir_item),
-  (unsigned)btrfs_dir_data_len(leaf, dir_item));
+  (unsigned int)btrfs_dir_name_len(leaf, dir_item),
+  (unsigned int)btrfs_dir_data_len(leaf, dir_item));
return 1;
}
 
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e5c19d9..cc82eae 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4106,7 +4106,7 @@ static u64 btrfs_reduce_alloc_profile(struct btrfs_root 
*root, u64 flags)
 
 static u64 get_alloc_profile(struct btrfs_root *root, u64 orig_flags)
 {
-   unsigned seq;
+   unsigned int seq;
u64 flags;
 
do {
@@ -5863,13 +5863,13 @@ void btrfs_subvolume_release_metadata(struct btrfs_root 
*root,
  * reserved extents that need to be freed.  This must be called with
  * BTRFS_I(inode)->lock held.
  */
-static unsigned drop_outstanding_extent(struct inode *inode, u64 num_bytes)
+static unsigned int drop_outstanding_extent(struct inode *inode, u64 num_bytes)
 {
-   unsigned drop_inode_space = 0;
-   unsigned dropped_extents = 0;
-   unsigned num_extents = 0;
+   unsigned int drop_inode_space = 0;
+   unsigned int dropped_extents = 0;
+   unsigned int num_extents = 0;
 
-   num_extents = (unsigned)div64_u64(num_bytes +
+   num_extents = (unsigned int)div64_u64(num_bytes +
  BTRFS_MAX_EXTENT_SIZE - 1,
  BTRFS_MAX_EXTENT_SIZE);
ASSERT(num_extents);
@@ -5947,12 +5947,12 @@ int btrfs_delalloc_reserve_metadata(struct inode 
*inode, u64 num_bytes)
struct btrfs_block_rsv *block_rsv = >fs_info->delalloc_block_rsv;
u64 to_reserve = 0;
u64 csum_bytes;
-   unsigned nr_extents = 0;
+   unsigned int nr_extents = 0;
enum btrfs_reserve_flush_enum flush = BTRFS_RESERVE_FLUSH_ALL;
int ret = 0;
bool delalloc_lock = true;
u64 to_free = 0;
-   unsigned dropped;
+   unsigned int dropped;
bool release_extra = false;
 
/* If we are a free space inode we need to not flush since we will be in
@@ -5980,7 +5980,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, 
u64 num_bytes)
num_bytes = ALIGN(num_bytes, root->sectorsize);
 
spin_lock(_I(inode)->lock);
-   nr_extents = (unsigned)div64_u64(num_bytes +
+   nr_extents = (unsigned int)div64_u64(num_bytes +
 BTRFS_MAX_EXTENT_SIZE - 1,
 BTRFS_MAX_EXTENT_SIZE);
BTRFS_I(inode)->outstanding_extents += nr_extents;
@@ -6108,7 +6108,7 @@ void btrfs_delalloc_release_metadata(struct inode *inode, 
u64 num_bytes)
 {
struct btrfs_root *root = BTRFS_I(inode)->root;
u64 to_free = 0;
-   unsigned dropped;
+   unsigned int dropped;
 
num_bytes = ALIGN(num_bytes, root->sectorsize);
spin_lock(_I(inode)->lock);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b92bcf3..4ffb29a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -131,7 +131,7 @@ struct extent_page_data {
unsigned int sync_io:1;
 };
 
-static void add_extent_changeset(struct extent_state *state, unsigned bits,
+static void add_extent_changeset(struct extent_state *state, unsigned int bits,
 struct extent_changeset *changeset,
 int set)
 {
@@ -413,21 +413,21 @@ static void merge_state(struct extent_io_tree *tree,
 }
 
 static void 

[PATCH 3/4] Btrfs: ACCESS_ONCE cleanup

2016-12-15 Thread Seraphime Kirkovski
This replaces ACCESS_ONCE macro with the corresponding
READ|WRITE macros

Signed-off-by: Seraphime Kirkovski 
---
 fs/btrfs/delayed-inode.c |  4 ++--
 fs/btrfs/super.c |  2 +-
 fs/btrfs/transaction.c   | 10 +-
 fs/btrfs/tree-log.h  |  4 ++--
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 6b23ef8..68e0e09 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -85,7 +85,7 @@ static struct btrfs_delayed_node 
*btrfs_get_delayed_node(struct inode *inode)
u64 ino = btrfs_ino(inode);
struct btrfs_delayed_node *node;
 
-   node = ACCESS_ONCE(btrfs_inode->delayed_node);
+   node = READ_ONCE(btrfs_inode->delayed_node);
if (node) {
atomic_inc(>refs);
return node;
@@ -1294,7 +1294,7 @@ void btrfs_remove_delayed_node(struct inode *inode)
 {
struct btrfs_delayed_node *delayed_node;
 
-   delayed_node = ACCESS_ONCE(BTRFS_I(inode)->delayed_node);
+   delayed_node = READ_ONCE(BTRFS_I(inode)->delayed_node);
if (!delayed_node)
return;
 
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 3e25cfe..a445d4cd 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -271,7 +271,7 @@ void __btrfs_abort_transaction(struct btrfs_trans_handle 
*trans,
   function, line, errstr);
return;
}
-   ACCESS_ONCE(trans->transaction->aborted) = errno;
+   WRITE_ONCE(trans->transaction->aborted, errno);
/* Wake up anybody who may be waiting on this transaction */
wake_up(_info->transaction_wait);
wake_up(_info->transaction_blocked_wait);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 1bddef0..e340db1 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -859,14 +859,14 @@ static int __btrfs_end_transaction(struct 
btrfs_trans_handle *trans,
 
if (lock && !atomic_read(>fs_info->open_ioctl_trans) &&
should_end_transaction(trans, root) &&
-   ACCESS_ONCE(cur_trans->state) == TRANS_STATE_RUNNING) {
+   READ_ONCE(cur_trans->state) == TRANS_STATE_RUNNING) {
spin_lock(>trans_lock);
if (cur_trans->state == TRANS_STATE_RUNNING)
cur_trans->state = TRANS_STATE_BLOCKED;
spin_unlock(>trans_lock);
}
 
-   if (lock && ACCESS_ONCE(cur_trans->state) == TRANS_STATE_BLOCKED) {
+   if (lock && READ_ONCE(cur_trans->state) == TRANS_STATE_BLOCKED) {
if (throttle)
return btrfs_commit_transaction(trans, root);
else
@@ -1919,7 +1919,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle 
*trans,
int ret;
 
/* Stop the commit early if ->aborted is set */
-   if (unlikely(ACCESS_ONCE(cur_trans->aborted))) {
+   if (unlikely(READ_ONCE(cur_trans->aborted))) {
ret = cur_trans->aborted;
btrfs_end_transaction(trans, root);
return ret;
@@ -2059,7 +2059,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle 
*trans,
   atomic_read(_trans->num_writers) == 1);
 
/* ->aborted might be set after the previous check, so check it */
-   if (unlikely(ACCESS_ONCE(cur_trans->aborted))) {
+   if (unlikely(READ_ONCE(cur_trans->aborted))) {
ret = cur_trans->aborted;
goto scrub_continue;
}
@@ -2173,7 +2173,7 @@ int btrfs_commit_transaction(struct btrfs_trans_handle 
*trans,
 * The tasks which save the space cache and inode cache may also
 * update ->aborted, check it.
 */
-   if (unlikely(ACCESS_ONCE(cur_trans->aborted))) {
+   if (unlikely(READ_ONCE(cur_trans->aborted))) {
ret = cur_trans->aborted;
mutex_unlock(>fs_info->tree_log_mutex);
mutex_unlock(>fs_info->reloc_mutex);
diff --git a/fs/btrfs/tree-log.h b/fs/btrfs/tree-log.h
index ab858e3..127eae0 100644
--- a/fs/btrfs/tree-log.h
+++ b/fs/btrfs/tree-log.h
@@ -48,13 +48,13 @@ static inline void btrfs_init_log_ctx(struct btrfs_log_ctx 
*ctx,
 static inline void btrfs_set_log_full_commit(struct btrfs_fs_info *fs_info,
 struct btrfs_trans_handle *trans)
 {
-   ACCESS_ONCE(fs_info->last_trans_log_full_commit) = trans->transid;
+   WRITE_ONCE(fs_info->last_trans_log_full_commit, trans->transid);
 }
 
 static inline int btrfs_need_log_full_commit(struct btrfs_fs_info *fs_info,
 struct btrfs_trans_handle *trans)
 {
-   return ACCESS_ONCE(fs_info->last_trans_log_full_commit) ==
+   return READ_ONCE(fs_info->last_trans_log_full_commit) ==
trans->transid;
 }
 
-- 
2.10.2

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to 

Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another

2016-12-15 Thread admin
Hi,

The source is a software raid 5 (md) of 4x 4TB Western Digital RE4 disks. The 
destinations is a hardware raid 5 enclosure containing 4x 8TB Seagate Archival 
disks connected using e-sata. 

I am currently trying Duncans suggestions. With them, the page allocation stall 
doesn't seem to appear and overall system responsiveness is also much better 
during copying.

Thanks,
David Arendt


Xin Zhou – Thu., 15. December 2016 0:24
> Hi,
> 
> The dirty data is in large amount, probably unable to commit to disk.
> And this seems to happen when copying from 7200rpm to 5600rpm disks, 
> according to previous post.
> 
> Probably the I/Os are buffered and pending, unable to get finished in-time.
> It might be helpful to know if this only happens for specific types of 5600 
> rpm disks?
> 
> And are these disks on RAID groups? Thanks.
> Xin
>  
>  
> 
> Sent: Wednesday, December 14, 2016 at 3:38 AM
> From: admin 
> To: "Michal Hocko" 
> Cc: linux-btrfs@vger.kernel.org, linux-ker...@vger.kernel.org, "David Sterba" 
> , "Chris Mason" 
> Subject: Re: page allocation stall in kernel 4.9 when copying files from one 
> btrfs hdd to another
> Hi,
> 
> I verified the log files and see no prior oom killer invocation. 
> Unfortunately the machine has been rebooted since. Next time it happens, I 
> will also look in dmesg.
> 
> Thanks,
> David Arendt
> 
> 
> Michal Hocko – Wed., 14. December 2016 11:31
> > Btw. the stall should be preceded by the OOM killer invocation. Could
> > you share the OOM report please. I am asking because such an OOM killer
> > would be clearly pre-mature as per your meminfo. I am trying to change
> > that code and seeing your numbers might help me.
> >
> > Thanks!
> >
> > On Wed 14-12-16 11:17:43, Michal Hocko wrote:
> > > On Tue 13-12-16 18:11:01, David Arendt wrote:
> > > > Hi,
> > > >
> > > > I receive the following page allocation stall while copying lots of
> > > > large files from one btrfs hdd to another.
> > > >
> > > > Dec 13 13:04:29 server kernel: kworker/u16:8: page allocation stalls 
> > > > for 12260ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
> > > > Dec 13 13:04:29 server kernel: CPU: 0 PID: 24959 Comm: kworker/u16:8 
> > > > Tainted: P O 4.9.0 #1
> > > [...]
> > > > Dec 13 13:04:29 server kernel: Call Trace:
> > > > Dec 13 13:04:29 server kernel: [] ? 
> > > > dump_stack+0x46/0x5d
> > > > Dec 13 13:04:29 server kernel: [] ? 
> > > > warn_alloc+0x111/0x130
> > > > Dec 13 13:04:33 server kernel: [] ? 
> > > > __alloc_pages_nodemask+0xbe8/0xd30
> > > > Dec 13 13:04:33 server kernel: [] ? 
> > > > pagecache_get_page+0xe4/0x230
> > > > Dec 13 13:04:33 server kernel: [] ? 
> > > > alloc_extent_buffer+0x10b/0x400
> > > > Dec 13 13:04:33 server kernel: [] ? 
> > > > btrfs_alloc_tree_block+0x125/0x560
> > >
> > > OK, so this is
> > > find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL)
> > >
> > > The main question is whether this really needs to be NOFS request...
> > >
> > > > Dec 13 13:04:33 server kernel: [] ? 
> > > > read_extent_buffer_pages+0x21f/0x280
> > > > Dec 13 13:04:33 server kernel: [] ? 
> > > > __btrfs_cow_block+0x141/0x580
> > > > Dec 13 13:04:33 server kernel: [] ? 
> > > > btrfs_cow_block+0x100/0x150
> > > > Dec 13 13:04:33 server kernel: [] ? 
> > > > btrfs_search_slot+0x1e9/0x9c0
> > > > Dec 13 13:04:33 server kernel: [] ? 
> > > > __set_extent_bit+0x512/0x550
> > > > Dec 13 13:04:33 server kernel: [] ? 
> > > > lookup_inline_extent_backref+0xf5/0x5e0
> > > > Dec 13 13:04:34 server kernel: [] ? 
> > > > set_extent_bit+0x24/0x30
> > > > Dec 13 13:04:34 server kernel: [] ? 
> > > > update_block_group.isra.34+0x114/0x380
> > > > Dec 13 13:04:34 server kernel: [] ? 
> > > > __btrfs_free_extent.isra.35+0xf4/0xd20
> > > > Dec 13 13:04:34 server kernel: [] ? 
> > > > btrfs_merge_delayed_refs+0x61/0x5d0
> > > > Dec 13 13:04:34 server kernel: [] ? 
> > > > __btrfs_run_delayed_refs+0x902/0x10a0
> > > > Dec 13 13:04:34 server kernel: [] ? 
> > > > btrfs_run_delayed_refs+0x90/0x2a0
> > > > Dec 13 13:04:34 server kernel: [] ? 
> > > > delayed_ref_async_start+0x84/0xa0
> > >
> > > What would cause the reclaim recursion?
> > >
> > > > Dec 13 13:04:34 server kernel: Mem-Info:
> > > > Dec 13 13:04:34 server kernel: active_anon:20 inactive_anon:34
> > > > isolated_anon:0\x0a active_file:7370032 inactive_file:450105
> > > > isolated_file:320\x0a unevictable:0 dirty:522748 writeback:189
> > > > unstable:0\x0a slab_reclaimable:178255 slab_unreclaimable:124617\x0a
> > > > mapped:4236 shmem:0 pagetables:1163 bounce:0\x0a free:38224 free_pcp:241
> > > > free_cma:0
> > >
> > > This speaks for itself. There is a lot of dirty data, basically no
> > > anonymous memory and GFP_NOFS cannot do much to reclaim obviously. This
> > > is either a configuraion bug as somebody noted down the thread (setting
> > > the dirty_ratio) or suboptimality of the btrfs code which might request
> > > NOFS even though it is not strictly 

[PATCH v2 3/3] btrfs-progs: convert-test: trigger chunk allocation after convert

2016-12-15 Thread Qu Wenruo
Populate fs after convert so we can trigger data chunk allocation.
This can expose too restrict old rollback condition

Reported-by: Chandan Rajendra 
Signed-off-by: Qu Wenruo 
---
 tests/common | 4 
 tests/common.convert | 3 +++
 2 files changed, 7 insertions(+)

diff --git a/tests/common b/tests/common
index 571118a..4a1330f 100644
--- a/tests/common
+++ b/tests/common
@@ -486,6 +486,10 @@ generate_dataset() {
run_check $SUDO_HELPER ln -s 
"$dirpath/$long_filename" "$dirpath/slow_slink.$num"
done
;;
+   large)
+   run_check $SUDO_HELPER dd if=/dev/urandom bs=32M 
count=1 \
+   of="$dirpath/$dataset_type" bs=32M >/dev/null 2>&1
+   ;;
esac
 }
 
diff --git a/tests/common.convert b/tests/common.convert
index a2d3152..8c9242e 100644
--- a/tests/common.convert
+++ b/tests/common.convert
@@ -160,6 +160,9 @@ convert_test_post_checks_all() {
convert_test_post_check_checksums "$1"
convert_test_post_check_permissions "$2"
convert_test_post_check_acl "$3"
+
+   # Create a large file to trigger data chunk allocation
+   generate_dataset "large"
run_check_umount_test_dev
 }
 
-- 
2.10.2



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/3] btrfs-progs: file-item: Fix wrong file extents inserted

2016-12-15 Thread Qu Wenruo
If we specify NO_HOLES incompat feature when converting, the result
image still uses hole file extents.
And further more, the hole is incorrect as its disk_num_bytes is not
zero.

The problem is at btrfs_insert_file_extent() which doesn't check if we
are going to insert hole file extent.

Modify it to skip hole file extents to allow it follow restrict NO_HOLES
flag.

And since no_holes flag can be triggered on half-way, so current fsck
won't report such error, as it consider it as old file holes.

Signed-off-by: Qu Wenruo 
---
 convert/main.c |  2 +-
 file-item.c| 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/convert/main.c b/convert/main.c
index fd6f77b..15f14af 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -336,7 +336,7 @@ static int record_file_blocks(struct blk_iterate_data *data,
   key.offset > cur_off);
fi = btrfs_item_ptr(node, slot, struct btrfs_file_extent_item);
extent_disk_bytenr = btrfs_file_extent_disk_bytenr(node, fi);
-   extent_num_bytes = btrfs_file_extent_disk_num_bytes(node, fi);
+   extent_num_bytes = btrfs_file_extent_num_bytes(node, fi);
BUG_ON(cur_off - key.offset >= extent_num_bytes);
btrfs_release_path();
 
diff --git a/file-item.c b/file-item.c
index 67c0b4f..e462b4b 100644
--- a/file-item.c
+++ b/file-item.c
@@ -36,11 +36,22 @@ int btrfs_insert_file_extent(struct btrfs_trans_handle 
*trans,
 u64 disk_num_bytes, u64 num_bytes)
 {
int ret = 0;
+   int is_hole = 0;
struct btrfs_file_extent_item *item;
struct btrfs_key file_key;
struct btrfs_path *path;
struct extent_buffer *leaf;
 
+   if (offset == 0)
+   is_hole = 1;
+   /* For NO_HOLES, we don't insert hole file extent */
+   if (btrfs_fs_incompat(root->fs_info, NO_HOLES) && is_hole)
+   return 0;
+
+   /* For hole, its disk_bytenr and disk_num_bytes must be 0 */
+   if (is_hole)
+   disk_num_bytes = 0;
+
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
-- 
2.10.2



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/3] btrfs-progs: convert: Rework rollback to handle new convert image

2016-12-15 Thread Qu Wenruo
Although commit 9c4b820412746b3 tried to make the rollback condition
less restrict, to co-operate with new rollback behavior, it's still too
restrict.

If btrfs allocates a new data chunk, it's highly possible that the new
chunk will not be 1:1 mapped anymore.

And this makes old rollback check fails, and refuse to rollback.

This patch rework it by checking rollback condition more accurately.

1) Rollback condition
   Unlike old chunk level check, we use file extent level check.
   So we manually check every file extents of convert image file.

   Only when all file extents except ones in btrfs relocated ranges(*)
   are mapped 1:1 we allow rollback.

   This behavior make both old and new behavior happy.
*:
   [0, 1M)
   [btrfs_sb_offset(1), +64K)
   [btrfs_sb_offset(2), +64K)

2) Rollback method
   Old rollback method is quite complex, using extent_io tree to mark
   every checked ranges.
   And do extra chunk tree operation before rollback.

   The new rollback method is quite simple.
   1) open btrfs
   2) read and save relocated data
   3) close btrfs
   4) write relocated into place.

Such rework fixes the following problem
1) rollback failure after new data chunk allocation
2) rollback failure after correct NO_HOLES convert

Reported-by: Chandan Rajendra 
Signed-off-by: Qu Wenruo 
---
 convert/main.c | 697 +++--
 1 file changed, 278 insertions(+), 419 deletions(-)

diff --git a/convert/main.c b/convert/main.c
index 15f14af..425c2f4 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -1412,36 +1412,6 @@ fail:
return ret;
 }
 
-static int prepare_system_chunk_sb(struct btrfs_super_block *super)
-{
-   struct btrfs_chunk *chunk;
-   struct btrfs_disk_key *key;
-   u32 sectorsize = btrfs_super_sectorsize(super);
-
-   key = (struct btrfs_disk_key *)(super->sys_chunk_array);
-   chunk = (struct btrfs_chunk *)(super->sys_chunk_array +
-  sizeof(struct btrfs_disk_key));
-
-   btrfs_set_disk_key_objectid(key, BTRFS_FIRST_CHUNK_TREE_OBJECTID);
-   btrfs_set_disk_key_type(key, BTRFS_CHUNK_ITEM_KEY);
-   btrfs_set_disk_key_offset(key, 0);
-
-   btrfs_set_stack_chunk_length(chunk, btrfs_super_total_bytes(super));
-   btrfs_set_stack_chunk_owner(chunk, BTRFS_EXTENT_TREE_OBJECTID);
-   btrfs_set_stack_chunk_stripe_len(chunk, BTRFS_STRIPE_LEN);
-   btrfs_set_stack_chunk_type(chunk, BTRFS_BLOCK_GROUP_SYSTEM);
-   btrfs_set_stack_chunk_io_align(chunk, sectorsize);
-   btrfs_set_stack_chunk_io_width(chunk, sectorsize);
-   btrfs_set_stack_chunk_sector_size(chunk, sectorsize);
-   btrfs_set_stack_chunk_num_stripes(chunk, 1);
-   btrfs_set_stack_chunk_sub_stripes(chunk, 0);
-   chunk->stripe.devid = super->dev_item.devid;
-   btrfs_set_stack_stripe_offset(>stripe, 0);
-   memcpy(chunk->stripe.dev_uuid, super->dev_item.uuid, BTRFS_UUID_SIZE);
-   btrfs_set_super_sys_array_size(super, sizeof(*key) + sizeof(*chunk));
-   return 0;
-}
-
 #if BTRFSCONVERT_EXT2
 
 /*
@@ -2548,479 +2518,368 @@ fail:
 }
 
 /*
- * Check if a non 1:1 mapped chunk can be rolled back.
- * For new convert, it's OK while for old convert it's not.
+ * If [start1, start1 + len1) is a subset of [start2, start2 + len2)
+ * return 1.
+ * Else return 0.
  */
-static int may_rollback_chunk(struct btrfs_fs_info *fs_info, u64 bytenr)
+static int is_range_subset(u64 start1, u64 len1, u64 start2, u64 len2)
 {
-   struct btrfs_block_group_cache *bg;
-   struct btrfs_key key;
-   struct btrfs_path path;
-   struct btrfs_root *extent_root = fs_info->extent_root;
-   u64 bg_start;
-   u64 bg_end;
-   int ret;
+   if (start1 >= start2 && start1 + len1 <= start2 + len2)
+   return 1;
+   return 0;
+}
 
-   bg = btrfs_lookup_first_block_group(fs_info, bytenr);
-   if (!bg)
-   return -ENOENT;
-   bg_start = bg->key.objectid;
-   bg_end = bg->key.objectid + bg->key.offset;
+/*
+ * If [start1, start1 + len2) intersects with [start2, start2 + len2)
+ * return 1.
+ * Else return 0.
+ */
+static int is_range_intersect(u64 start1, u64 len1, u64 start2, u64 len2)
+{
+   if (start1 >= start2 + len2 || start1 + len1 <= start2)
+   return 0;
+   return 1;
+}
 
-   key.objectid = bg_end;
-   key.type = BTRFS_METADATA_ITEM_KEY;
-   key.offset = 0;
-   btrfs_init_path();
+/*
+ * Check if a range is a subset of btrfs convert reloc space.
+ */
+static int is_range_subset_of_reloc_space(u64 start, u64 len)
+{
+   /*
+* Must be in one of the ranges, or it's not in btrfs reloc space
+* [0, 1M)
+* [sb_offset(1), + 64K)
+* [sb_offset(2), + 64K)
+*/
+   if (is_range_subset(start, len, 0, 1024 * 1024) ||
+   is_range_subset(start, len, btrfs_sb_offset(1), 64 * 1024) ||
+   

[PATCH v2 0/3] Convert rollback rework for v4.9

2016-12-15 Thread Qu Wenruo
Can be fetched from github:
https://github.com/adam900710/btrfs-progs.git convert_rework_for_4.9

This is mainly to fix problems exposed by Chandan's fix for 64K nodesize.

The problem is, although we're still using old rollback functions, it
has quite a lot of problems to support the new behavior.

1) Can't rollback new convert image with new data chunk
Chunk level check can't handle newly allocated data chunk, which is not
1:1 mapped but completely valid in new behavior.
The last patch will enhance the test case to handle it.

2) Can't rollback real no-hole image
Since it assumes hole file extent as requirement.
And due to the possibility to enable no_holes halfway, btrfsck won't
report such error, since it's acceptable.

3) Too complex logical, and RW btrfs tree operations
In fact, considering how small data we need to rewrite (1M + 128K), we
don't really need to open btrfs read-write.
Just copy needed data and re-fill. Simple and easy.

Thanks Chandan, his report on failure of rollback leads to this rework.

And this is still small fixes, most of the patch are just deleting old codes.
All rollback test cases from btrfs-progs and fstests are passed.

v2:
  Fix a bug that we can still rollback if convert subvolume is just orphaned,
  not deleted. Exposed by btrfs/012.

Qu Wenruo (3):
  btrfs-progs: file-item: Fix wrong file extents inserted
  btrfs-progs: convert: Rework rollback to handle new convert image
  btrfs-progs: convert-test: trigger chunk allocation after convert

 convert/main.c   | 699 ---
 file-item.c  |  11 +
 tests/common |   4 +
 tests/common.convert |   3 +
 4 files changed, 297 insertions(+), 420 deletions(-)

-- 
2.10.2



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Server hangs when mount BTRFS filesystem.

2016-12-15 Thread Кравцов Роман Владимирович

Hello.

First, server is hangs when btrfs balance working (see logs below).
After server reset can't mount filesystem.

When trying to execute command

# mount -t btrfs /dev/OraCI2/pes.isuse_bp.stands 
/var/lib/docker/db/pes.isuse_bp.stands/pes.isuse_bp.standby.base/


server hangs without any messages and log records.


# btrfs --version
btrfs-progs v4.8.3

# btrfs fi show /dev/mapper/OraCI2-pes.isuse_bp.stands
Label: 'pes.isuse_bp.stands'  uuid: ada5d777-565b-48e7-87dc-c58c8ad13466
Total devices 1 FS bytes used 2.24TiB
	devid1 size 3.49TiB used 2.35TiB path 
/dev/mapper/OraCI2-pes.isuse_bp.stands




# btrfsck --repair -p /dev/OraCI2/pes.isuse_bp.stands
enabling repair mode
Checking filesystem on /dev/OraCI2/pes.isuse_bp.stands
UUID: ada5d777-565b-48e7-87dc-c58c8ad13466
parent transid verify failed on 2651226128384 wanted 136007 found 136176
parent transid verify failed on 2651226128384 wanted 136007 found 136176
Ignoring transid failure
leaf parent key incorrect 2651226128384
bad block 2651226128384

ERROR: errors found in extent allocation tree or chunk allocation
Fixed 0 roots.
checking free space cache [.]
root 5 inode 28350 errors 1000, some csum missing
root 5 inode 28351 errors 1000, some csum missing
root 5 inode 28354 errors 1000, some csum missing
root 5 inode 28358 errors 1000, some csum missing
root 5 inode 28360 errors 1000, some csum missing
root 5 inode 28361 errors 1000, some csum missing
root 5 inode 28368 errors 1000, some csum missing
root 5 inode 28369 errors 1000, some csum missing
root 5 inode 28370 errors 1000, some csum missing
root 5 inode 28371 errors 1000, some csum missing
root 5 inode 28372 errors 1000, some csum missing
root 5 inode 28373 errors 1000, some csum missing
root 5 inode 28376 errors 1000, some csum missing
root 5 inode 28377 errors 1000, some csum missing
root 5 inode 28378 errors 1000, some csum missing
root 5 inode 28379 errors 1000, some csum missing
root 5 inode 28380 errors 1000, some csum missing
root 5 inode 28381 errors 1000, some csum missing
root 5 inode 28382 errors 1000, some csum missing
root 5 inode 28383 errors 1000, some csum missing
root 5 inode 28384 errors 1000, some csum missing
root 5 inode 28385 errors 1000, some csum missing
root 5 inode 28386 errors 1000, some csum missing
root 5 inode 28387 errors 1000, some csum missing
root 5 inode 28388 errors 1000, some csum missing
root 5 inode 28389 errors 1000, some csum missing
root 5 inode 28390 errors 1000, some csum missing
root 5 inode 28391 errors 1000, some csum missing
root 5 inode 28392 errors 1000, some csum missing
root 5 inode 28393 errors 1000, some csum missing
root 5 inode 28394 errors 1000, some csum missing
root 5 inode 28395 errors 1000, some csum missing
root 5 inode 28396 errors 1000, some csum missing
root 5 inode 55108 errors 1000, some csum missing
root 5 inode 55313 errors 1000, some csum missing
root 5 inode 55314 errors 1000, some csum missing
root 5 inode 55315 errors 1000, some csum missing
root 5 inode 55316 errors 1000, some csum missing
root 5 inode 55317 errors 1000, some csum missing
root 5 inode 55318 errors 1000, some csum missing

checking csums
checking root refs
Recowing metadata block 2651226128384
found 2462630760448 bytes used err is 0
total csum bytes: 2398866488
total tree bytes: 5910593536
total fs tree bytes: 1679392768
total extent tree bytes: 1436450816
btree space waste bytes: 887715010
file data blocks allocated: 459312458981376
 referenced 2199769403392
extent buffer leak: start 2651226128384 len 16384


# cat /var/log/messages | grep 'Dec 15 00'
Dec 15 00:02:35 OraCI2 kernel: BTRFS info (device dm-22): found 41156 
extents
Dec 15 00:02:35 OraCI2 kernel: BTRFS info (device dm-22): relocating 
block group 2568411414528 flags 1
Dec 15 00:02:37 OraCI2 kernel: BTRFS info (device dm-22): found 34939 
extents

Dec 15 00:05:47 OraCI2 kernel: use_block_rsv: 20 callbacks suppressed
Dec 15 00:05:47 OraCI2 kernel: [ cut here ]
Dec 15 00:05:47 OraCI2 kernel: WARNING: CPU: 35 PID: 30215 at 
fs/btrfs/extent-tree.c:8321 btrfs_alloc_tree_block+0x3b1/0x4c0 [btrfs]
Dec 15 00:05:47 OraCI2 kernel: Modules linked in: xt_nat veth 
binfmt_misc dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag 
xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter 
nf_nat nf_conntrack bonding sb_edac edac_core x86_pkg_temp_thermal 
coretemp kvm_intel kvm dm_thin_pool irqbypass dm_persistent_data 
dm_bio_prison dm_bufio crct10dif_pclmul crc32_pclmul btrfs 
ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper 
cryptd intel_cstate intel_rapl_perf xor ses enclosure joydev mei_me 
ipmi_devintf scsi_transport_sas iTCO_wdt pcspkr mxm_wmi 
iTCO_vendor_support input_leds mei sg raid6_pq i2c_i801 lpc_ich 
i2c_smbus mfd_core ipmi_si ipmi_msghandler shpchp ioatdma wmi acpi_pad 
acpi_power_meter ip_tables xfs libcrc32c sd_mod
Dec 15 00:05:47 OraCI2 kernel: 

Re: Btrfs progs pre-release 4.9-rc1

2016-12-15 Thread Qu Wenruo



At 12/15/2016 04:41 PM, Tsutomu Itoh wrote:

On 2016/12/15 15:45, Tsutomu Itoh wrote:

On 2016/12/14 23:42, David Sterba wrote:

Hi,

a pre-release has been tagged. Contains almost the entire devel branch from
today. There are small fixes, the lowmem mode of check gets more updates but
still does not work in the --repair mode and is considered experimental.

ETA for 4.9 is in +6 days (2016-12-20).

Minor fixes, docs improvements or more testcases will be still considered for
4.9 release.


xfstests btrfs/{108,109,117} that was working in 4.8.5 will not work properly.

+ ./check btrfs/108
FSTYP -- btrfs
PLATFORM  -- Linux/x86_64 luna 4.9.0
MKFS_OPTIONS  -- /dev/sdb3
MOUNT_OPTIONS -- /dev/sdb3 /test6

btrfs/108 1s ... [failed, exit status 1] - output mismatch (see 
/xfstests/results//btrfs/108.out.bad)
--- tests/btrfs/108.out 2015-10-19 09:55:52.0 +0900
+++ /xfstests/results//btrfs/108.out.bad2016-12-15 15:41:43.771411349 
+0900
@@ -8,6 +8,6 @@
 File digests in the original filesystem:
 fbf36a062ffcbd644b5739c4d683ccc7  SCRATCH_MNT/snap/foo
 5d2c92827a70aad932cfe7363105c55e  SCRATCH_MNT/snap/bar
-File digests in the new filesystem:
-fbf36a062ffcbd644b5739c4d683ccc7  SCRATCH_MNT/snap/foo
-5d2c92827a70aad932cfe7363105c55e  SCRATCH_MNT/snap/bar
+./common/rc: line 2784: 22352 Segmentation fault  (core dumped) "$@" >> 
$seqres.full 2>&1
...
(Run 'diff -u tests/btrfs/108.out /xfstests/results//btrfs/108.out.bad'  to 
see the entire diff)
Ran: btrfs/108
Failures: btrfs/108
Failed 1 of 1 tests


Another problem was found. xfstests btrfs/012 will not succeed.

btrfs/012 58s ... [failed, exit status 1] - output mismatch (see 
/xfstests/results//btrfs/012.out.bad)
--- tests/btrfs/012.out 2015-08-04 16:09:38.0 +0900
+++ /xfstests/results//btrfs/012.out.bad2016-12-15 17:38:10.305009249 
+0900
@@ -1 +1,3 @@
 == QA output created by 012
+btrfs-convert rollback failed
+(see /xfstests/results//btrfs/012.full for details)
...
(Run 'diff -u tests/btrfs/012.out /xfstests/results//btrfs/012.out.bad'  to 
see the entire diff)

Thanks,
Tsutomu


Thanks for the test.

Chandan has reported such regression, after his migrate_super_block() 
fix for 64K nodesize.
But that's not really caused by this patch, just too restrict rollback 
condition.


This can be fixed by the following patches:
[1/3] btrfs-progs: file-item: Fix wrong file extents inserted
[2/3] btrfs-progs: convert: Rework rollback to handle new convert image
[3/3] btrfs-progs: convert-test: trigger chunk allocation after convert

Thanks,
Qu




Thanks,
Tsutomu



Changes:
  * check: many lowmem mode updates
  * send: use splice syscall to copy buffer from kernel
  * receive: new option to dump the stream in textual form
  * convert:
* move sources to own directory
* prevent accounting of blocks beyond end of the device
* make it work with 64k sectorsize
  * mkfs: move sources to own directory
  * defrag: warns if directory used without -r
  * dev stats:
* new option to check stats for non-zero values
* add long option for -z
  * library: version bump to 0.1.2, added subvol_uuid_search2
  * other:
* cleanups
* docs updates

Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/
Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git

Shortlog:

Adam Borowski (1):
  btrfs-progs: man mkfs: warn about RAID5/6 being experimental

Anand Jain (1):
  btrfs-progs: recursive defrag cleanup duplicate code

Austin S. Hemmelgarn (1):
  btrfs-progs: dev stats: add dev stats returncode option

Chandan Rajendra (3):
  btrfs-progs: Use helper function to access 
btrfs_super_block->sys_chunk_array_size
  btrfs-progs: convert: Prevent accounting blocks beyond end of device
  btrfs-progs: convert: Fix migrate_super_block() to work with 64k 
sectorsize

David Sterba (35):
  btrfs-progs: remove extra newline from messages
  btrfs-progs: use symbolic name for first inode number when searching
  btrfs-progs: send: use splice syscall instead of read/write to transfer 
buffer
  btrfs-progs: send: rename thread callback to read data from kernel
  btrfs-progs: make incompat bit wrappers more compact
  btrfs-progs: receive: rename receive context variable
  btrfs-progs: check: use on-stack path buffer in check_fs_first_inode
  btrfs-progs: check: use on-stack path buffer in check_fs_root_v2
  btrfs-progs: check: use on-stack path buffer in check_fs_roots_v2
  btrfs-progs: send dump: introduce helper for printing escaped path
  btrfs-progs: send dump: print escaped path
  btrfs-progs: send dump: use reentrant variant of localtime
  btrfs-progs: tests: add more gobal option to test 001-btrfs
  btrfs-progs: docs: update receive help and manual page
  btrfs-progs: build: extend pattern rules for standalone directories
  

Re: Btrfs progs pre-release 4.9-rc1

2016-12-15 Thread Tsutomu Itoh
On 2016/12/15 15:45, Tsutomu Itoh wrote:
> On 2016/12/14 23:42, David Sterba wrote:
>> Hi,
>>
>> a pre-release has been tagged. Contains almost the entire devel branch from
>> today. There are small fixes, the lowmem mode of check gets more updates but
>> still does not work in the --repair mode and is considered experimental.
>>
>> ETA for 4.9 is in +6 days (2016-12-20).
>>
>> Minor fixes, docs improvements or more testcases will be still considered for
>> 4.9 release.
> 
> xfstests btrfs/{108,109,117} that was working in 4.8.5 will not work properly.
> 
> + ./check btrfs/108
> FSTYP -- btrfs
> PLATFORM  -- Linux/x86_64 luna 4.9.0
> MKFS_OPTIONS  -- /dev/sdb3
> MOUNT_OPTIONS -- /dev/sdb3 /test6
> 
> btrfs/108 1s ... [failed, exit status 1] - output mismatch (see 
> /xfstests/results//btrfs/108.out.bad)
> --- tests/btrfs/108.out 2015-10-19 09:55:52.0 +0900
> +++ /xfstests/results//btrfs/108.out.bad2016-12-15 15:41:43.771411349 
> +0900
> @@ -8,6 +8,6 @@
>  File digests in the original filesystem:
>  fbf36a062ffcbd644b5739c4d683ccc7  SCRATCH_MNT/snap/foo
>  5d2c92827a70aad932cfe7363105c55e  SCRATCH_MNT/snap/bar
> -File digests in the new filesystem:
> -fbf36a062ffcbd644b5739c4d683ccc7  SCRATCH_MNT/snap/foo
> -5d2c92827a70aad932cfe7363105c55e  SCRATCH_MNT/snap/bar
> +./common/rc: line 2784: 22352 Segmentation fault  (core dumped) "$@" 
> >> $seqres.full 2>&1
> ...
> (Run 'diff -u tests/btrfs/108.out /xfstests/results//btrfs/108.out.bad'  
> to see the entire diff)
> Ran: btrfs/108
> Failures: btrfs/108
> Failed 1 of 1 tests

Another problem was found. xfstests btrfs/012 will not succeed.

btrfs/012 58s ... [failed, exit status 1] - output mismatch (see 
/xfstests/results//btrfs/012.out.bad)
--- tests/btrfs/012.out 2015-08-04 16:09:38.0 +0900
+++ /xfstests/results//btrfs/012.out.bad2016-12-15 17:38:10.305009249 
+0900
@@ -1 +1,3 @@
 == QA output created by 012
+btrfs-convert rollback failed
+(see /xfstests/results//btrfs/012.full for details)
...
(Run 'diff -u tests/btrfs/012.out /xfstests/results//btrfs/012.out.bad'  to 
see the entire diff)

Thanks,
Tsutomu

> 
> Thanks,
> Tsutomu
> 
>>
>> Changes:
>>   * check: many lowmem mode updates
>>   * send: use splice syscall to copy buffer from kernel
>>   * receive: new option to dump the stream in textual form
>>   * convert:
>> * move sources to own directory
>> * prevent accounting of blocks beyond end of the device
>> * make it work with 64k sectorsize
>>   * mkfs: move sources to own directory
>>   * defrag: warns if directory used without -r
>>   * dev stats:
>> * new option to check stats for non-zero values
>> * add long option for -z
>>   * library: version bump to 0.1.2, added subvol_uuid_search2
>>   * other:
>> * cleanups
>> * docs updates
>>
>> Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/
>> Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git
>>
>> Shortlog:
>>
>> Adam Borowski (1):
>>   btrfs-progs: man mkfs: warn about RAID5/6 being experimental
>>
>> Anand Jain (1):
>>   btrfs-progs: recursive defrag cleanup duplicate code
>>
>> Austin S. Hemmelgarn (1):
>>   btrfs-progs: dev stats: add dev stats returncode option
>>
>> Chandan Rajendra (3):
>>   btrfs-progs: Use helper function to access 
>> btrfs_super_block->sys_chunk_array_size
>>   btrfs-progs: convert: Prevent accounting blocks beyond end of device
>>   btrfs-progs: convert: Fix migrate_super_block() to work with 64k 
>> sectorsize
>>
>> David Sterba (35):
>>   btrfs-progs: remove extra newline from messages
>>   btrfs-progs: use symbolic name for first inode number when searching
>>   btrfs-progs: send: use splice syscall instead of read/write to 
>> transfer buffer
>>   btrfs-progs: send: rename thread callback to read data from kernel
>>   btrfs-progs: make incompat bit wrappers more compact
>>   btrfs-progs: receive: rename receive context variable
>>   btrfs-progs: check: use on-stack path buffer in check_fs_first_inode
>>   btrfs-progs: check: use on-stack path buffer in check_fs_root_v2
>>   btrfs-progs: check: use on-stack path buffer in check_fs_roots_v2
>>   btrfs-progs: send dump: introduce helper for printing escaped path
>>   btrfs-progs: send dump: print escaped path
>>   btrfs-progs: send dump: use reentrant variant of localtime
>>   btrfs-progs: tests: add more gobal option to test 001-btrfs
>>   btrfs-progs: docs: update receive help and manual page
>>   btrfs-progs: build: extend pattern rules for standalone directories
>>   btrfs-progs: move btrfs-convert to own directory
>>   btrfs-progs: move mkfs.btrfs sources to own directory
>>   btrfs-progs: tests: check for partscan support in 
>> misc/006-partitioned-loopdev
>>   btrfs-progs: run mkfs tests in CI
>>   

[PATCH v2] btrfs-progs: Fix NULL pointer when receive clone operation

2016-12-15 Thread Qu Wenruo
Regression introduced by:
commit a2f7af94abe4a3491ca1280a2ae1d63edc0d62ab
Author: Prasanth K S R 
Date:   Sat Dec 10 19:17:43 2016 +0530

btrfs-progs: subvol_uuid_search: return error encoded pointer

IS_ERR() will only check if it's an error code, won't check if it's
NULL.
And for all the caller the commit modifies, it can return NULL and makes
cause segfault.

Fix it by introducing new IS_ERR_OR_NULL() macro, and for NULL pointer
and needs to return int case, convert NULL pointer to -ENOENT.

Reported-by: Tsutomu Itoh 
Signed-off-by: Qu Wenruo 
---
 cmds-receive.c | 16 +++-
 cmds-send.c| 26 ++
 kerncompat.h   |  7 ++-
 3 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/cmds-receive.c b/cmds-receive.c
index cb42aa2..166d37d 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -287,13 +287,16 @@ static int process_snapshot(const char *path, const u8 
*uuid, u64 ctransid,
parent_subvol = subvol_uuid_search(>sus, 0, parent_uuid,
   parent_ctransid, NULL,
   subvol_search_by_received_uuid);
-   if (IS_ERR(parent_subvol)) {
+   if (IS_ERR_OR_NULL(parent_subvol)) {
parent_subvol = subvol_uuid_search(>sus, 0, parent_uuid,
   parent_ctransid, NULL,
   subvol_search_by_uuid);
}
-   if (IS_ERR(parent_subvol)) {
-   ret = PTR_ERR(parent_subvol);
+   if (IS_ERR_OR_NULL(parent_subvol)) {
+   if (!parent_subvol)
+   ret = -ENOENT;
+   else
+   ret = PTR_ERR(parent_subvol);
error("cannot find parent subvolume");
goto out;
}
@@ -750,13 +753,16 @@ static int process_clone(const char *path, u64 offset, 
u64 len,
si = subvol_uuid_search(>sus, 0, clone_uuid, clone_ctransid,
NULL,
subvol_search_by_received_uuid);
-   if (IS_ERR(si)) {
+   if (IS_ERR_OR_NULL(si)) {
if (memcmp(clone_uuid, rctx->cur_subvol.received_uuid,
BTRFS_UUID_SIZE) == 0) {
/* TODO check generation of extent */
subvol_path = strdup(rctx->cur_subvol_path);
} else {
-   ret = PTR_ERR(si);
+   if (!si)
+   ret = -ENOENT;
+   else
+   ret = PTR_ERR(si);
error("clone: did not find source subvol");
goto out;
}
diff --git a/cmds-send.c b/cmds-send.c
index 5da64d8..cec11e6 100644
--- a/cmds-send.c
+++ b/cmds-send.c
@@ -70,8 +70,12 @@ static int get_root_id(struct btrfs_send *sctx, const char 
*path, u64 *root_id)
 
si = subvol_uuid_search(>sus, 0, NULL, 0, path,
subvol_search_by_path);
-   if (IS_ERR(si))
-   return PTR_ERR(si);
+   if (IS_ERR_OR_NULL(si)) {
+   if (!si)
+   return -ENOENT;
+   else
+   return PTR_ERR(si);
+   }
*root_id = si->root_id;
free(si->path);
free(si);
@@ -85,7 +89,7 @@ static struct subvol_info *get_parent(struct btrfs_send 
*sctx, u64 root_id)
 
si_tmp = subvol_uuid_search(>sus, root_id, NULL, 0, NULL,
subvol_search_by_root_id);
-   if (IS_ERR(si_tmp))
+   if (IS_ERR_OR_NULL(si_tmp))
return si_tmp;
 
si = subvol_uuid_search(>sus, 0, si_tmp->parent_uuid, 0, NULL,
@@ -105,8 +109,11 @@ static int find_good_parent(struct btrfs_send *sctx, u64 
root_id, u64 *found)
int i;
 
parent = get_parent(sctx, root_id);
-   if (IS_ERR(parent)) {
-   ret = PTR_ERR(parent);
+   if (IS_ERR_OR_NULL(parent)) {
+   if (!parent)
+   ret = -ENOENT;
+   else
+   ret = PTR_ERR(parent);
goto out;
}
 
@@ -122,7 +129,7 @@ static int find_good_parent(struct btrfs_send *sctx, u64 
root_id, u64 *found)
s64 tmp;
 
parent2 = get_parent(sctx, sctx->clone_sources[i]);
-   if (IS_ERR(parent2))
+   if (IS_ERR_OR_NULL(parent2))
continue;
if (parent2->root_id != parent->root_id) {
free(parent2->path);
@@ -136,8 +143,11 @@ static int find_good_parent(struct btrfs_send *sctx, u64 
root_id, u64 *found)
parent2 = subvol_uuid_search(>sus,
sctx->clone_sources[i], NULL, 0, NULL,
subvol_search_by_root_id);
-   if 

Re: [PATCH] btrfs-progs: Fix NULL pointer when receive clone operation

2016-12-15 Thread Qu Wenruo



At 12/15/2016 04:07 PM, Tsutomu Itoh wrote:

On 2016/12/15 16:28, Qu Wenruo wrote:

The subvol_info returned from subvol_uuid_search() can be NULL.
So the branch checking IS_ERR(si) should also check if it's NULL.

Reported-by: Tsutomu Itoh 
Signed-off-by: Qu Wenruo 
---
 cmds-receive.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-receive.c b/cmds-receive.c
index cb42aa2..c8f2fff 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -750,7 +750,7 @@ static int process_clone(const char *path, u64 offset, u64 
len,
si = subvol_uuid_search(>sus, 0, clone_uuid, clone_ctransid,
NULL,
subvol_search_by_received_uuid);
-   if (IS_ERR(si)) {
+   if (IS_ERR(si) || !si) {


Tested-by: Tsutomu Itoh 

But I like  if (!si || IS_ERR(si)) {

Thanks,
Tsutomu



Thanks for the test.
But it seems that there are more such problems.

I'd better introduce a new macro, IS_ERR_OR_NULL() to handle it.
Since all the modification commit a2f7af94 introduced can return NULL.

I'll update the patchset soon.

Thanks,
Qu


if (memcmp(clone_uuid, rctx->cur_subvol.received_uuid,
BTRFS_UUID_SIZE) == 0) {
/* TODO check generation of extent */








--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: Fix NULL pointer when receive clone operation

2016-12-15 Thread Tsutomu Itoh
On 2016/12/15 16:28, Qu Wenruo wrote:
> The subvol_info returned from subvol_uuid_search() can be NULL.
> So the branch checking IS_ERR(si) should also check if it's NULL.
> 
> Reported-by: Tsutomu Itoh 
> Signed-off-by: Qu Wenruo 
> ---
>  cmds-receive.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/cmds-receive.c b/cmds-receive.c
> index cb42aa2..c8f2fff 100644
> --- a/cmds-receive.c
> +++ b/cmds-receive.c
> @@ -750,7 +750,7 @@ static int process_clone(const char *path, u64 offset, 
> u64 len,
>   si = subvol_uuid_search(>sus, 0, clone_uuid, clone_ctransid,
>   NULL,
>   subvol_search_by_received_uuid);
> - if (IS_ERR(si)) {
> + if (IS_ERR(si) || !si) {

Tested-by: Tsutomu Itoh 

But I like  if (!si || IS_ERR(si)) {

Thanks,
Tsutomu

>   if (memcmp(clone_uuid, rctx->cur_subvol.received_uuid,
>   BTRFS_UUID_SIZE) == 0) {
>   /* TODO check generation of extent */
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html