Re: A lot warnings in dmesg while running thunderbird

2016-07-10 Thread Chandan Rajendra
On Friday, July 08, 2016 12:02:35 PM Chris Mason wrote:
> 
> On 07/08/2016 11:02 AM, Gabriel C wrote:
> > On 08.07.2016 14:41, Chris Mason wrote:
> >
> >>
> >>
> >> On 07/08/2016 05:57 AM, Gabriel C wrote:
> >>> 2016-07-07 21:21 GMT+02:00 Chris Mason :
> 
> 
>  On 07/07/2016 06:24 AM, Gabriel C wrote:
> >
> > Hi,
> >
> > while running thunderbird on linux 4.6.3 and 4.7.0-rc6 (didn't test
> > other versions)
> > I trigger the following:
> 
> 
>  I definitely thought we had this fixed in v4.7-rc.  Can you easily
>  fsck this filesystem?  Something strange is going on.
> >>>
> >>> Yes, btrfs check and btrfs check --check-data-csum are fine, no
> >>> errors found.
> >>>
> >>> If you want me to test any patches let me know.
> >>>
> >>
> >> Can you please try a v4.5 stable kernel?  I'm curious if this really
> >> is the same regression that I tried to fix in v4.7
> >>
> >
> > I'm on linux 4.5.7 now and everything is fine. I'm writing this email
> > from thunderbird, which was not
> > possible in 4.6.3 or 4.7-rc.
> >
> > Let me know if you want me to test other kernels or whatever else may help
> > fix this problem.
> >
> 
> Can you please run the attached test program:
> 
> gcc -o short-write short-write.c -lpthread
> ./short-write some-new-file-on-btrfs
> 
> I want to see if you're triggering the same problem we've tried to fix, 
> or something else.
>

Hi Chris,

I am able to reproduce the issue with the 'short-write' program. However,
before the call trace associated with btrfs_destroy_inode(), I see the
following call trace:

[ cut here ]
WARNING: CPU: 2 PID: 2311 at /home/chandan/repos/linux/fs/btrfs/extent-tree.c:4303 btrfs_free_reserved_data_space_noquota+0xe8/0x100
Modules linked in:
CPU: 2 PID: 2311 Comm: short-write Not tainted 4.7.0-rc6-ga99cde4 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  8818ceb8ba30 8145c2a1 
  8818ceb8ba70 81056a7c 10cf81346936
 8818bdba4800 1000 8818bdd5ee00 8818bf1bbd84
Call Trace:
 [] dump_stack+0x4d/0x6c
 [] __warn+0xcc/0xf0
 [] warn_slowpath_null+0x18/0x20
 [] btrfs_free_reserved_data_space_noquota+0xe8/0x100
 [] btrfs_clear_bit_hook+0x2f9/0x370
 [] clear_state_bit+0x55/0x1b0
 [] __clear_extent_bit+0x220/0x3b0
 [] ? __btrfs_qgroup_release_data+0x82/0x110
 [] clear_extent_bit+0x25/0x30
 [] btrfs_invalidatepage+0x273/0x2c0
 [] truncate_inode_page+0x83/0x90
 [] truncate_inode_pages_range+0x17a/0x6c0
 [] truncate_pagecache+0x42/0x60
 [] truncate_setsize+0x2d/0x40
 [] btrfs_setattr+0x1ef/0x320
 [] notify_change+0x1dc/0x380
 [] do_truncate+0x61/0xa0
 [] do_sys_ftruncate.constprop.17+0xf9/0x160
 [] SyS_ftruncate+0x9/0x10
 [] entry_SYSCALL_64_fastpath+0x13/0x8f
---[ end trace 5682b0d8e8a631ed ]---


I will continue to debug and find out the root cause.

-- 
chandan

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v12.1 00/15] Btrfs in-band de-duplication

2016-07-10 Thread Qu Wenruo



At 07/11/2016 11:05 AM, Qu Wenruo wrote:

This patchset can be fetched from github:
https://github.com/adam900710/linux.git wang_dedupe_20160711

This version is just a small update: it is rebased onto David's
for-next-20160704 branch and adds a small non-functional patch to
coordinate the dedupe and subpage-size patchsets.

With this update, conflicts between the dedupe and subpage patchsets can
be resolved quite easily.


I forgot the changelog:
Changelog:
v2:
  Totally reworked to handle multiple backends
v3:
  Fix a stupid but deadly on-disk backend bug
  Handle the corner case of multiple hashes on the same bytenr to fix an
  abort-transaction error
  Increase dedup rate by enhancing the delayed ref handler for both backends.
  Move dedup_add() to run_delayed_ref() time, to fix an abort-transaction error.
  Increase the dedup block size upper limit to 8M.
v4:
  Add dedup prop for disabling dedup for given files/dirs.
  Merge inmem_search() and ondisk_search() into generic_search() to save
  some code
  Fix another delayed_ref related bug.
  Use the same mutex for both inmem and ondisk backend.
  Move dedup_add() back to btrfs_finish_ordered_io() to increase dedup
  rate.
v5:
  Reuse compress routine for much simpler dedup function.
  Slightly improved performance due to above modification.
  Fix race between dedup enable/disable
  Fix for false ENOSPC report
v6:
  Further enable/disable race window fix.
  Minor format change according to checkpatch.
v7:
  Fix one concurrency bug with balance.
  Slightly modify return value from -EINVAL to -EOPNOTSUPP for
  btrfs_dedup_ioctl() to allow progs to distinguish unsupported commands
  and wrong parameters.
  Rebased to integration-4.6.
v8:
  Rename 'dedup' to 'dedupe'.
  Add support to allow dedupe and compression work at the same time.
  Fix several balance related bugs. Special thanks to Satoru Takeuchi,
  who exposed most of them.
  Small dedupe hit case performance improvement.
v9:
  Re-order the patchset to completely separate pure in-memory and any
  on-disk format change.
  Fold bug fixes into their original patches.
v10:
  Add back missing bug fix patch.
  Reduce on-disk item size.
  Hide dedupe ioctl under CONFIG_BTRFS_DEBUG.
v11:
  Remove other backend and props support to focus on the framework and
  in-memory backend. Suggested by David.
  Better disable and buffered write race protection.
  Comprehensive fix to dedupe metadata ENOSPC problem.
v12:
  Stateful 'enable' ioctl and new 'reconf' ioctl
  New FORCE flag for enable ioctl to allow stateless ioctl
  Precise error report and extendable ioctl structure.
v12.1:
  Rebase to David's for-next-20160704 branch
  Add co-ordinate patch for subpage and dedupe patchset.




Qu Wenruo (4):
  btrfs: delayed-ref: Add support for increasing data ref under spinlock
  btrfs: dedupe: Inband in-memory only de-duplication implement
  btrfs: relocation: Enhance error handling to avoid BUG_ON
  btrfs: dedupe: Introduce new reconfigure ioctl

Wang Xiaoguang (11):
  btrfs: expand cow_file_range() to support in-band dedup and subpage-blocksize
  btrfs: dedupe: Introduce dedupe framework and its header
  btrfs: dedupe: Introduce function to initialize dedupe info
  btrfs: dedupe: Introduce function to add hash into in-memory tree
  btrfs: dedupe: Introduce function to remove hash from in-memory tree
  btrfs: dedupe: Introduce function to search for an existing hash
  btrfs: dedupe: Implement btrfs_dedupe_calc_hash interface
  btrfs: ordered-extent: Add support for dedupe
  btrfs: dedupe: Add ioctl for inband dedupelication
  btrfs: improve inode's outstanding_extents computation
  btrfs: dedupe: fix false ENOSPC

 fs/btrfs/Makefile|   2 +-
 fs/btrfs/ctree.h |  25 +-
 fs/btrfs/dedupe.c| 820 +++
 fs/btrfs/dedupe.h| 221 +++
 fs/btrfs/delayed-ref.c   |  30 +-
 fs/btrfs/delayed-ref.h   |   8 +
 fs/btrfs/disk-io.c   |   4 +
 fs/btrfs/extent-tree.c   |  83 +++-
 fs/btrfs/extent_io.c |  63 ++-
 fs/btrfs/extent_io.h |  15 +-
 fs/btrfs/file.c  |  26 +-
 fs/btrfs/free-space-cache.c  |   5 +-
 fs/btrfs/inode-map.c |   4 +-
 fs/btrfs/inode.c | 453 -
 fs/btrfs/ioctl.c |  93 -
 fs/btrfs/ordered-data.c  |  46 ++-
 fs/btrfs/ordered-data.h  |  14 +
 fs/btrfs/relocation.c|  46 ++-
 fs/btrfs/sysfs.c |   2 +
 fs/btrfs/tests/extent-io-tests.c |   6 +-
 fs/btrfs/tests/inode-tests.c |  12 +-
 include/uapi/linux/btrfs.h   |  55 +++
 22 files changed, 1872 insertions(+), 161 deletions(-)
 create mode 100644 fs/btrfs/dedupe.c
 create mode 100644 fs/btrfs/dedupe.h






[PATCH v12.1 09/15] btrfs: ordered-extent: Add support for dedupe

2016-07-10 Thread Qu Wenruo
From: Wang Xiaoguang 

Add ordered-extent support for dedupe.

Note: the current ordered-extent support only handles non-compressed source
extents.
Support for compressed source extents will be added later.

Signed-off-by: Qu Wenruo 
Signed-off-by: Wang Xiaoguang 
Reviewed-by: Josef Bacik 
---
 fs/btrfs/ordered-data.c | 46 ++
 fs/btrfs/ordered-data.h | 13 +
 2 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index aca8264..eedb292 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -26,6 +26,7 @@
 #include "extent_io.h"
 #include "disk-io.h"
 #include "compression.h"
+#include "dedupe.h"
 
 static struct kmem_cache *btrfs_ordered_extent_cache;
 
@@ -184,7 +185,8 @@ static inline struct rb_node *tree_search(struct btrfs_ordered_inode_tree *tree,
  */
 static int __btrfs_add_ordered_extent(struct inode *inode, u64 file_offset,
  u64 start, u64 len, u64 disk_len,
- int type, int dio, int compress_type)
+ int type, int dio, int compress_type,
+ struct btrfs_dedupe_hash *hash)
 {
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_ordered_inode_tree *tree;
@@ -204,6 +206,33 @@ static int __btrfs_add_ordered_extent(struct inode *inode, u64 file_offset,
entry->inode = igrab(inode);
entry->compress_type = compress_type;
entry->truncated_len = (u64)-1;
+   entry->hash = NULL;
+   /*
+* A hash hit means we have already incremented the extents delayed
+* ref.
+* We must handle this even if another process is trying to
+* turn off dedupe, otherwise we will leak a reference.
+*/
+   if (hash && (hash->bytenr || root->fs_info->dedupe_enabled)) {
+   struct btrfs_dedupe_info *dedupe_info;
+
+   dedupe_info = root->fs_info->dedupe_info;
+   if (WARN_ON(dedupe_info == NULL)) {
+   kmem_cache_free(btrfs_ordered_extent_cache,
+   entry);
+   return -EINVAL;
+   }
+   entry->hash = btrfs_dedupe_alloc_hash(dedupe_info->hash_algo);
+   if (!entry->hash) {
+   kmem_cache_free(btrfs_ordered_extent_cache, entry);
+   return -ENOMEM;
+   }
+   entry->hash->bytenr = hash->bytenr;
+   entry->hash->num_bytes = hash->num_bytes;
+   memcpy(entry->hash->hash, hash->hash,
+  btrfs_hash_sizes[dedupe_info->hash_algo]);
+   }
+
if (type != BTRFS_ORDERED_IO_DONE && type != BTRFS_ORDERED_COMPLETE)
set_bit(type, &entry->flags);
 
@@ -250,15 +279,23 @@ int btrfs_add_ordered_extent(struct inode *inode, u64 file_offset,
 {
return __btrfs_add_ordered_extent(inode, file_offset, start, len,
  disk_len, type, 0,
- BTRFS_COMPRESS_NONE);
+ BTRFS_COMPRESS_NONE, NULL);
 }
 
+int btrfs_add_ordered_extent_dedupe(struct inode *inode, u64 file_offset,
+  u64 start, u64 len, u64 disk_len, int type,
+  struct btrfs_dedupe_hash *hash)
+{
+   return __btrfs_add_ordered_extent(inode, file_offset, start, len,
+ disk_len, type, 0,
+ BTRFS_COMPRESS_NONE, hash);
+}
 int btrfs_add_ordered_extent_dio(struct inode *inode, u64 file_offset,
 u64 start, u64 len, u64 disk_len, int type)
 {
return __btrfs_add_ordered_extent(inode, file_offset, start, len,
  disk_len, type, 1,
- BTRFS_COMPRESS_NONE);
+ BTRFS_COMPRESS_NONE, NULL);
 }
 
 int btrfs_add_ordered_extent_compress(struct inode *inode, u64 file_offset,
@@ -267,7 +304,7 @@ int btrfs_add_ordered_extent_compress(struct inode *inode, u64 file_offset,
 {
return __btrfs_add_ordered_extent(inode, file_offset, start, len,
  disk_len, type, 0,
- compress_type);
+ compress_type, NULL);
 }
 
 /*
@@ -577,6 +614,7 @@ void btrfs_put_ordered_extent(struct btrfs_ordered_extent *entry)
list_del(&sum->list);
kfree(sum);
}
+   kfree(entry->hash);
kmem_cache_free(btrfs_ordered_extent_cache, entry);
}
 }
diff --git a/fs/btrfs/ordered-data.h 

[PATCH v12.1 08/15] btrfs: dedupe: Implement btrfs_dedupe_calc_hash interface

2016-07-10 Thread Qu Wenruo
From: Wang Xiaoguang 

Unlike the dedupe backends (in-memory or on-disk), only the SHA256 hash
method is supported so far, so implement the btrfs_dedupe_calc_hash()
interface using SHA256.

Signed-off-by: Qu Wenruo 
Signed-off-by: Wang Xiaoguang 
Reviewed-by: Josef Bacik 
---
 fs/btrfs/dedupe.c | 46 ++
 1 file changed, 46 insertions(+)

diff --git a/fs/btrfs/dedupe.c b/fs/btrfs/dedupe.c
index eb1b110..6296b65 100644
--- a/fs/btrfs/dedupe.c
+++ b/fs/btrfs/dedupe.c
@@ -639,3 +639,49 @@ int btrfs_dedupe_search(struct btrfs_fs_info *fs_info,
}
return ret;
 }
+
+int btrfs_dedupe_calc_hash(struct btrfs_fs_info *fs_info,
+  struct inode *inode, u64 start,
+  struct btrfs_dedupe_hash *hash)
+{
+   int i;
+   int ret;
+   struct page *p;
+   struct btrfs_dedupe_info *dedupe_info = fs_info->dedupe_info;
+   struct crypto_shash *tfm = dedupe_info->dedupe_driver;
+   SHASH_DESC_ON_STACK(sdesc, tfm);
+   u64 dedupe_bs;
+   u64 sectorsize = BTRFS_I(inode)->root->sectorsize;
+
+   if (!fs_info->dedupe_enabled || !hash)
+   return 0;
+
+   if (WARN_ON(dedupe_info == NULL))
+   return -EINVAL;
+
+   WARN_ON(!IS_ALIGNED(start, sectorsize));
+
+   dedupe_bs = dedupe_info->blocksize;
+
+   sdesc->tfm = tfm;
+   sdesc->flags = 0;
+   ret = crypto_shash_init(sdesc);
+   if (ret)
+   return ret;
+   for (i = 0; sectorsize * i < dedupe_bs; i++) {
+   char *d;
+
+   p = find_get_page(inode->i_mapping,
+ (start >> PAGE_SHIFT) + i);
+   if (WARN_ON(!p))
+   return -ENOENT;
+   d = kmap(p);
+   ret = crypto_shash_update(sdesc, d, sectorsize);
+   kunmap(p);
+   put_page(p);
+   if (ret)
+   return ret;
+   }
+   ret = crypto_shash_final(sdesc, hash->hash);
+   return ret;
+}
-- 
2.9.0





[PATCH v12.1 14/15] btrfs: dedupe: fix false ENOSPC

2016-07-10 Thread Qu Wenruo
From: Wang Xiaoguang 

When testing in-band dedupe, we sometimes got ENOSPC errors even though the
fs still had plenty of free space. After some debugging, we found that
btrfs_delalloc_reserve_metadata() sometimes tries to reserve a huge amount
of metadata space, even for a very small data range.

In btrfs_delalloc_reserve_metadata(), the number of metadata bytes we try
to reserve is calculated from the difference between outstanding_extents
and reserved_extents. The following case shows how ENOSPC occurs:

  1, Buffered write 128MB of data in 1MB units, so finally the inode has
outstanding_extents = 1 and reserved_extents = 128.
Note that btrfs_merge_extent_hook() merges these 1MB units into
one big outstanding extent, but does not change reserved_extents.

  2, When writing dirty pages with in-band dedupe, cow_file_range() will
split the above big extent into 16KB units (assuming the in-band dedupe
blocksize is 16KB). When the first split operation finishes, we have 2
outstanding extents and 128 reserved extents. If the newly generated
ordered extent happens to be dispatched and complete right then,
btrfs_delalloc_release_metadata() (see btrfs_finish_ordered_io()) is
called to release metadata, after which we have 1 outstanding extent
and 1 reserved extent (see also the logic in drop_outstanding_extent()).
Later cow_file_range() continues to handle the remaining data range
[16KB, 128MB), and if no other ordered extent is dispatched in the
meantime, there will be 8191 outstanding extents and 1 reserved extent.

  3, Now if another buffered write to this file comes in,
btrfs_delalloc_reserve_metadata() will try to reserve metadata for at
least 8191 outstanding extents. With a 64K nodesize that is
8191*65536*16 bytes, about 8GB of metadata, so it obviously returns ENOSPC.

But when a file goes through in-band dedupe, its max extent size is no
longer BTRFS_MAX_EXTENT_SIZE (128MB); it is limited by the in-band dedupe
blocksize, so the current btrfs metadata reservation method is not
appropriate. Here we introduce btrfs_max_extent_size(), which returns the
max extent size for the corresponding file. For files that go through
in-band dedupe, we use this value for metadata reservation and for
extent_io merge, split and clear operations, which ensures the difference
between outstanding_extents and reserved_extents does not grow so large.

Currently, only buffered writes go through in-band dedupe when in-band
dedupe is enabled.

Reported-by: Satoru Takeuchi 
Cc: Josef Bacik 
Cc: Mark Fasheh 
Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/ctree.h |  16 ++--
 fs/btrfs/dedupe.h|  35 +
 fs/btrfs/extent-tree.c   |  62 
 fs/btrfs/extent_io.c |  63 +++-
 fs/btrfs/extent_io.h |  15 +++-
 fs/btrfs/file.c  |  26 +--
 fs/btrfs/free-space-cache.c  |   5 +-
 fs/btrfs/inode-map.c |   4 +-
 fs/btrfs/inode.c | 157 +++
 fs/btrfs/ioctl.c |   6 +-
 fs/btrfs/ordered-data.h  |   1 +
 fs/btrfs/relocation.c|   8 +-
 fs/btrfs/tests/extent-io-tests.c |   6 +-
 fs/btrfs/tests/inode-tests.c |  12 +--
 14 files changed, 298 insertions(+), 118 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index fc5bae3..87cfb48 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2660,10 +2660,14 @@ int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
 void btrfs_subvolume_release_metadata(struct btrfs_root *root,
  struct btrfs_block_rsv *rsv,
  u64 qgroup_reserved);
-int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes);
-void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes);
-int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len);
-void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len);
+int btrfs_delalloc_reserve_metadata(struct inode *inode, u64 num_bytes,
+   u32 max_extent_size);
+void btrfs_delalloc_release_metadata(struct inode *inode, u64 num_bytes,
+u32 max_extent_size);
+int btrfs_delalloc_reserve_space(struct inode *inode, u64 start, u64 len,
+u32 max_extent_size);
+void btrfs_delalloc_release_space(struct inode *inode, u64 start, u64 len,
+ u32 max_extent_size);
 void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, unsigned short type);
 struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_root *root,
  unsigned short type);
@@ -3104,7 +3108,7 @@ int btrfs_start_delalloc_inodes(struct btrfs_root *root, int delay_iput);
 int 

[PATCH v12.1 11/15] btrfs: dedupe: Add ioctl for inband dedupelication

2016-07-10 Thread Qu Wenruo
From: Wang Xiaoguang 

Add an ioctl interface for inband deduplication, which includes:
1) enable
2) disable
3) status

It also adds a pseudo RO compat flag to indicate that btrfs now supports
inband dedupe.
There is no on-disk format change; it is only a pseudo RO compat flag.

All these ioctl interfaces are stateless: the caller doesn't need to care
about the previous dedupe state before calling them, only about the final
desired state.

For example, if the user wants to enable dedupe with a specified block size
and limit, just fill the ioctl structure and call the enable ioctl.
There is no need to check whether dedupe is already running.

These ioctls will handle things like re-configure or disable quite well.

Also, for invalid parameters, the enable ioctl sets the field of the
first encountered invalid parameter to (-1) to inform the caller.
For limit_nr/limit_mem, the value is set to (0) instead.

Signed-off-by: Qu Wenruo 
Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/dedupe.c  | 50 ++
 fs/btrfs/dedupe.h  | 17 
 fs/btrfs/disk-io.c |  3 ++
 fs/btrfs/ioctl.c   | 68 ++
 fs/btrfs/sysfs.c   |  2 ++
 include/uapi/linux/btrfs.h | 12 +++-
 6 files changed, 146 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/dedupe.c b/fs/btrfs/dedupe.c
index 6296b65..ad1bb82 100644
--- a/fs/btrfs/dedupe.c
+++ b/fs/btrfs/dedupe.c
@@ -41,6 +41,35 @@ static inline struct inmem_hash *inmem_alloc_hash(u16 algo)
GFP_NOFS);
 }
 
+void btrfs_dedupe_status(struct btrfs_fs_info *fs_info,
+struct btrfs_ioctl_dedupe_args *dargs)
+{
+   struct btrfs_dedupe_info *dedupe_info = fs_info->dedupe_info;
+
+   if (!fs_info->dedupe_enabled || !dedupe_info) {
+   dargs->status = 0;
+   dargs->blocksize = 0;
+   dargs->backend = 0;
+   dargs->hash_algo = 0;
+   dargs->limit_nr = 0;
+   dargs->current_nr = 0;
+   memset(dargs->__unused, -1, sizeof(dargs->__unused));
+   return;
+   }
+   mutex_lock(&dedupe_info->lock);
+   dargs->status = 1;
+   dargs->blocksize = dedupe_info->blocksize;
+   dargs->backend = dedupe_info->backend;
+   dargs->hash_algo = dedupe_info->hash_algo;
+   dargs->limit_nr = dedupe_info->limit_nr;
+   dargs->limit_mem = dedupe_info->limit_nr *
+   (sizeof(struct inmem_hash) +
+btrfs_hash_sizes[dedupe_info->hash_algo]);
+   dargs->current_nr = dedupe_info->current_nr;
+   mutex_unlock(&dedupe_info->lock);
+   memset(dargs->__unused, -1, sizeof(dargs->__unused));
+}
+
 static int init_dedupe_info(struct btrfs_dedupe_info **ret_info,
struct btrfs_ioctl_dedupe_args *dargs)
 {
@@ -420,6 +449,27 @@ static void unblock_all_writers(struct btrfs_fs_info *fs_info)
percpu_up_write(sb->s_writers.rw_sem + SB_FREEZE_WRITE - 1);
 }
 
+int btrfs_dedupe_cleanup(struct btrfs_fs_info *fs_info)
+{
+   struct btrfs_dedupe_info *dedupe_info;
+
+   fs_info->dedupe_enabled = 0;
+   /* same as disable */
+   smp_wmb();
+   dedupe_info = fs_info->dedupe_info;
+   fs_info->dedupe_info = NULL;
+
+   if (!dedupe_info)
+   return 0;
+
+   if (dedupe_info->backend == BTRFS_DEDUPE_BACKEND_INMEMORY)
+   inmem_destroy(dedupe_info);
+
+   crypto_free_shash(dedupe_info->dedupe_driver);
+   kfree(dedupe_info);
+   return 0;
+}
+
 int btrfs_dedupe_disable(struct btrfs_fs_info *fs_info)
 {
struct btrfs_dedupe_info *dedupe_info;
diff --git a/fs/btrfs/dedupe.h b/fs/btrfs/dedupe.h
index e45915a..48812cb 100644
--- a/fs/btrfs/dedupe.h
+++ b/fs/btrfs/dedupe.h
@@ -91,6 +91,15 @@ static inline struct btrfs_dedupe_hash *btrfs_dedupe_alloc_hash(u16 algo)
 int btrfs_dedupe_enable(struct btrfs_fs_info *fs_info,
struct btrfs_ioctl_dedupe_args *dargs);
 
+
+ /*
+ * Get inband dedupe info
+ * Since it needs to access different backends' hash size, which
+ * is not exported, we need such simple function.
+ */
+void btrfs_dedupe_status(struct btrfs_fs_info *fs_info,
+struct btrfs_ioctl_dedupe_args *dargs);
+
 /*
  * Disable dedupe and invalidate all its dedupe data.
  * Called at dedupe disable time.
@@ -102,12 +111,10 @@ int btrfs_dedupe_enable(struct btrfs_fs_info *fs_info,
 int btrfs_dedupe_disable(struct btrfs_fs_info *fs_info);
 
 /*
- * Get current dedupe status.
- * Return 0 for success
- * No possible error yet
+ * Cleanup current btrfs_dedupe_info
+ * Called in umount time
  */
-void btrfs_dedupe_status(struct btrfs_fs_info *fs_info,
-struct btrfs_ioctl_dedupe_args *dargs);
+int btrfs_dedupe_cleanup(struct btrfs_fs_info *fs_info);
 
 /*
  * Calculate hash for dedupe.
diff --git 

[PATCH v12.1 13/15] btrfs: improve inode's outstanding_extents computation

2016-07-10 Thread Qu Wenruo
From: Wang Xiaoguang 

This issue was revealed by changing BTRFS_MAX_EXTENT_SIZE from 128MB to
64KB: with that change, the fsstress test often triggers these warnings in
btrfs_destroy_inode():
WARN_ON(BTRFS_I(inode)->outstanding_extents);
WARN_ON(BTRFS_I(inode)->reserved_extents);

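The simple test program below reproduces this issue reliably.
Note: you need to change BTRFS_MAX_EXTENT_SIZE to 64KB for this test;
otherwise the WARNING is not triggered.
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	int fd;
	char buf[68 * 1024];

	memset(buf, 0, sizeof(buf));
	/* O_CREAT requires a mode argument */
	fd = open("testfile", O_CREAT | O_EXCL | O_RDWR, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* 68KB write at offset 64KB, i.e. the range [64KB, 132KB) */
	if (pwrite(fd, buf, sizeof(buf), 64 * 1024) < 0)
		perror("pwrite");
	close(fd);
	return 0;
}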

When BTRFS_MAX_EXTENT_SIZE is 64KB and the buffered data range is:
64KB                128KB   132KB
|--------------------|-------|
          64KB     +    4KB

1) for above data range, btrfs_delalloc_reserve_metadata() will reserve
metadata and set BTRFS_I(inode)->outstanding_extents to 2.
(68KB + 64KB - 1) / 64KB == 2

Outstanding_extents: 2

2) Then btrfs_dirty_pages() is called to dirty the pages and set the
EXTENT_DELALLOC flag. In this case, btrfs_set_bit_hook() is called
twice.
The 1st set_bit_hook() call sets the DELALLOC flag for the first 64K.
64KB                128KB
|--------------------|
    64KB DELALLOC
Outstanding_extents: 2

set_bit_hook() uses the FIRST_DELALLOC flag to avoid increasing the
outstanding_extents counter a second time.
So the 1st set_bit_hook() call does not modify outstanding_extents;
it is still 2.

Then the FIRST_DELALLOC flag is *CLEARED*.

3) 2nd btrfs_set_bit_hook() call.
Because FIRST_DELALLOC has been cleared by the previous set_bit_hook(),
btrfs_set_bit_hook() increases BTRFS_I(inode)->outstanding_extents by
one, so now BTRFS_I(inode)->outstanding_extents is 3.
64KB                128KB   132KB
|--------------------|-------|
    64K DELALLOC      4K DELALLOC
Outstanding_extents: 3

But the correct outstanding_extents number should be 2, not 3.
The 2nd btrfs_set_bit_hook() call screws this up and leads to the
WARN_ON().

Normally, we could solve this by only increasing outstanding_extents in
set_bit_hook().
But delalloc_reserve/release_metadata() only have a 'length' parameter,
so they calculate an inaccurate outstanding_extents.
If we relied only on set_bit_hook(), release_metadata() would screw things
up, as it would decrease the counter by an inaccurate number.

So the fix we use is:
1) Increase outstanding_extents by the *INACCURATE* count in
   delalloc_reserve_metadata(), just as a placeholder.
2) Increase outstanding_extents by the *accurate* count in set_bit_hook().
   This is the real increment.
3) Decrease outstanding_extents by the *INACCURATE* count before returning.
   This brings outstanding_extents back to the correct value.

With the default 128MB BTRFS_MAX_EXTENT_SIZE, __btrfs_buffered_write()
handles only about 2MB of data per iteration, so btrfs_dirty_pages()
never needs to handle a range that crosses two extents.

Cc: Mark Fasheh 
Cc: Josef Bacik 
Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/ctree.h |  2 ++
 fs/btrfs/inode.c | 68 +++-
 fs/btrfs/ioctl.c |  6 ++---
 3 files changed, 66 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 412aad0..fc5bae3 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3105,6 +3105,8 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, int delay_iput,
   int nr);
 int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
  struct extent_state **cached_state);
+int btrfs_set_extent_defrag(struct inode *inode, u64 start, u64 end,
+   struct extent_state **cached_state);
 int btrfs_create_subvol_root(struct btrfs_trans_handle *trans,
 struct btrfs_root *new_root,
 struct btrfs_root *parent_root,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7bfbaec..41994db 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1730,11 +1730,15 @@ static void btrfs_split_extent_hook(struct inode *inode,
struct extent_state *orig, u64 split)
 {
u64 size;
+   struct btrfs_root *root = BTRFS_I(inode)->root;
 
/* not delalloc, ignore it */
if (!(orig->state & EXTENT_DELALLOC))
return;
 
+   if (root == root->fs_info->tree_root)
+   return;
+
size = orig->end - orig->start + 1;
   

[PATCH v12.1 07/15] btrfs: dedupe: Introduce function to search for an existing hash

2016-07-10 Thread Qu Wenruo
From: Wang Xiaoguang 

Introduce the static function inmem_search() to search the in-memory
hash tree.

The trick is that we must ensure the delayed ref head is not being run at
the time we search for the hash.

With inmem_search(), we can implement the btrfs_dedupe_search()
interface.

Signed-off-by: Qu Wenruo 
Signed-off-by: Wang Xiaoguang 
Reviewed-by: Josef Bacik 
---
 fs/btrfs/dedupe.c | 185 ++
 1 file changed, 185 insertions(+)

diff --git a/fs/btrfs/dedupe.c b/fs/btrfs/dedupe.c
index 14c57fa..eb1b110 100644
--- a/fs/btrfs/dedupe.c
+++ b/fs/btrfs/dedupe.c
@@ -20,6 +20,7 @@
 #include "btrfs_inode.h"
 #include "transaction.h"
 #include "delayed-ref.h"
+#include "qgroup.h"
 
 struct inmem_hash {
struct rb_node hash_node;
@@ -454,3 +455,187 @@ int btrfs_dedupe_disable(struct btrfs_fs_info *fs_info)
kfree(dedupe_info);
return 0;
 }
+
+/*
+ * Caller must ensure the corresponding ref head is not being run.
+ */
+static struct inmem_hash *
+inmem_search_hash(struct btrfs_dedupe_info *dedupe_info, u8 *hash)
+{
+   struct rb_node **p = &dedupe_info->hash_root.rb_node;
+   struct rb_node *parent = NULL;
+   struct inmem_hash *entry = NULL;
+   u16 hash_algo = dedupe_info->hash_algo;
+   int hash_len = btrfs_hash_sizes[hash_algo];
+
+   while (*p) {
+   parent = *p;
+   entry = rb_entry(parent, struct inmem_hash, hash_node);
+
+   if (memcmp(hash, entry->hash, hash_len) < 0) {
+   p = &(*p)->rb_left;
+   } else if (memcmp(hash, entry->hash, hash_len) > 0) {
+   p = &(*p)->rb_right;
+   } else {
+   /* Found, need to re-add it to LRU list head */
+   list_del(&entry->lru_list);
+   list_add(&entry->lru_list, &dedupe_info->lru_list);
+   return entry;
+   }
+   }
+   return NULL;
+}
+
+static int inmem_search(struct btrfs_dedupe_info *dedupe_info,
+   struct inode *inode, u64 file_pos,
+   struct btrfs_dedupe_hash *hash)
+{
+   int ret;
+   struct btrfs_root *root = BTRFS_I(inode)->root;
+   struct btrfs_trans_handle *trans;
+   struct btrfs_delayed_ref_root *delayed_refs;
+   struct btrfs_delayed_ref_head *head;
+   struct btrfs_delayed_ref_head *insert_head;
+   struct btrfs_delayed_data_ref *insert_dref;
+   struct btrfs_qgroup_extent_record *insert_qrecord = NULL;
+   struct inmem_hash *found_hash;
+   int free_insert = 1;
+   u64 bytenr;
+   u32 num_bytes;
+
+   insert_head = kmem_cache_alloc(btrfs_delayed_ref_head_cachep, GFP_NOFS);
+   if (!insert_head)
+   return -ENOMEM;
+   insert_head->extent_op = NULL;
+   insert_dref = kmem_cache_alloc(btrfs_delayed_data_ref_cachep, GFP_NOFS);
+   if (!insert_dref) {
+   kmem_cache_free(btrfs_delayed_ref_head_cachep, insert_head);
+   return -ENOMEM;
+   }
+   if (root->fs_info->quota_enabled &&
+   is_fstree(root->root_key.objectid)) {
+   insert_qrecord = kmalloc(sizeof(*insert_qrecord), GFP_NOFS);
+   if (!insert_qrecord) {
+   kmem_cache_free(btrfs_delayed_ref_head_cachep,
+   insert_head);
+   kmem_cache_free(btrfs_delayed_data_ref_cachep,
+   insert_dref);
+   return -ENOMEM;
+   }
+   }
+
+   trans = btrfs_join_transaction(root);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   goto free_mem;
+   }
+
+again:
+   mutex_lock(&dedupe_info->lock);
+   found_hash = inmem_search_hash(dedupe_info, hash->hash);
+   /* If we don't find a duplicated extent, just return. */
+   if (!found_hash) {
+   ret = 0;
+   goto out;
+   }
+   bytenr = found_hash->bytenr;
+   num_bytes = found_hash->num_bytes;
+
+   delayed_refs = &trans->transaction->delayed_refs;
+
+   spin_lock(&delayed_refs->lock);
+   head = btrfs_find_delayed_ref_head(trans, bytenr);
+   if (!head) {
+   /*
+* We can safely insert a new delayed_ref as long as we
+* hold delayed_refs->lock.
+* Only need to use atomic inc_extent_ref()
+*/
+   btrfs_add_delayed_data_ref_locked(root->fs_info, trans,
+   insert_dref, insert_head, insert_qrecord,
+   bytenr, num_bytes, 0, root->root_key.objectid,
+   btrfs_ino(inode), file_pos, 0,
+   BTRFS_ADD_DELAYED_REF);
+   spin_unlock(&delayed_refs->lock);
+
+   /* 

[PATCH v12.1 12/15] btrfs: relocation: Enhance error handling to avoid BUG_ON

2016-07-10 Thread Qu Wenruo
Since the introduction of the btrfs dedupe tree, balance can race with
dedupe disabling.

When this happens, dedupe_enabled will make btrfs_get_fs_root() return
ERR_PTR(-ENOENT).
However, due to a bug in the error-handling branch, backref_cache->nr_nodes
is increased but the node is neither added to backref_cache nor is
nr_nodes decreased, causing a BUG_ON() in backref_cache_cleanup():

[ 2611.668810] [ cut here ]
[ 2611.669946] kernel BUG at /home/sat/ktest/linux/fs/btrfs/relocation.c:243!
[ 2611.670572] invalid opcode:  [#1] SMP
[ 2611.686797] Call Trace:
[ 2611.687034]  [] btrfs_relocate_block_group+0x1b3/0x290 [btrfs]
[ 2611.687706]  [] btrfs_relocate_chunk.isra.40+0x47/0xd0 [btrfs]
[ 2611.688385]  [] btrfs_balance+0xb22/0x11e0 [btrfs]
[ 2611.688966]  [] btrfs_ioctl_balance+0x391/0x3a0 [btrfs]
[ 2611.689587]  [] btrfs_ioctl+0x1650/0x2290 [btrfs]
[ 2611.690145]  [] ? lru_cache_add+0x3a/0x80
[ 2611.690647]  [] ? lru_cache_add_active_or_unevictable+0x4c/0xc0
[ 2611.691310]  [] ? handle_mm_fault+0xcd4/0x17f0
[ 2611.691842]  [] ? cp_new_stat+0x153/0x180
[ 2611.692342]  [] ? __vma_link_rb+0xfd/0x110
[ 2611.692842]  [] ? vma_link+0xb9/0xc0
[ 2611.693303]  [] do_vfs_ioctl+0xa1/0x5a0
[ 2611.693781]  [] ? __do_page_fault+0x1b4/0x400
[ 2611.694310]  [] SyS_ioctl+0x41/0x70
[ 2611.694758]  [] entry_SYSCALL_64_fastpath+0x12/0x71
[ 2611.695331] Code: ff 48 8b 45 bf 49 83 af a8 05 00 00 01 49 89 87 a0 05 00 00 e9 2e fd ff ff b8 f4 ff ff ff e9 e4 fb ff ff 0f 0b 0f 0b 0f 0b 0f 0b <0f> 0b 0f 0b 41 89 c6 e9 b8 fb ff ff e8 9e a6 e8 e0 4c 89 e7 44
[ 2611.697870] RIP  [] relocate_block_group+0x741/0x7a0 [btrfs]
[ 2611.698818]  RSP 

This patch calls remove_backref_node() in the error handling branch,
and catches the returned -ENOENT in relocate_tree_blocks() so that
balancing continues.
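The resulting control flow in relocate_tree_blocks() (skip -ENOENT, stop on any other error, always reach one cleanup point) can be illustrated with a minimal user-space sketch. `build_fn`, `relocate_all()` and the test callbacks are hypothetical stand-ins, not btrfs code:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for build_backref_tree(): 0 or a negative errno. */
typedef int (*build_fn)(int block);

/*
 * Visit every block in order.  A block whose root is going away
 * (-ENOENT) is skipped; any other error stops the walk.  Either way
 * control reaches the single exit, where the real code calls
 * finish_pending_nodes().  *done counts successfully handled blocks.
 */
static int relocate_all(const int *blocks, size_t nr, build_fn build,
			size_t *done)
{
	int err = 0;

	*done = 0;
	for (size_t i = 0; i < nr; i++) {
		int ret = build(blocks[i]);

		if (ret == -ENOENT)
			continue;	/* root being freed: skip, keep balancing */
		if (ret < 0) {
			err = ret;	/* any other error is fatal */
			break;
		}
		(*done)++;
	}
	/* single cleanup point would go here */
	return err;
}
```

Replacing `goto out` with `break` keeps one exit path, and turning the `while` loop into a `for` loop lets `continue` advance to the next node without duplicating the `rb_next()` call.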

Reported-by: Satoru Takeuchi 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/relocation.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 51f9c47..6b53351 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -887,6 +887,13 @@ again:
root = read_fs_root(rc->extent_root->fs_info, key.offset);
if (IS_ERR(root)) {
err = PTR_ERR(root);
+   /*
+* Don't forget to clean up the current node: it may
+* not have been added to backref_cache even though
+* nr_nodes was increased, which would trigger the
+* BUG_ON() in backref_cache_cleanup().
+*/
+   remove_backref_node(&rc->backref_cache, cur);
goto out;
}
 
@@ -2994,14 +3001,21 @@ int relocate_tree_blocks(struct btrfs_trans_handle *trans,
}
 
rb_node = rb_first(blocks);
-   while (rb_node) {
+   for (rb_node = rb_first(blocks); rb_node; rb_node = rb_next(rb_node)) {
block = rb_entry(rb_node, struct tree_block, rb_node);
 
node = build_backref_tree(rc, &block->key,
  block->level, block->bytenr);
if (IS_ERR(node)) {
+   /*
+* The root of this tree block (only the dedupe tree
+* so far) is being freed and can't be reached.
+* Just skip it and continue balancing.
+*/
+   if (PTR_ERR(node) == -ENOENT)
+   continue;
err = PTR_ERR(node);
-   goto out;
+   break;
}
 
ret = relocate_tree_block(trans, rc, node, &block->key,
@@ -3009,11 +3023,9 @@ int relocate_tree_blocks(struct btrfs_trans_handle *trans,
if (ret < 0) {
if (ret != -EAGAIN || rb_node == rb_first(blocks))
err = ret;
-   goto out;
+   break;
}
-   rb_node = rb_next(rb_node);
}
-out:
err = finish_pending_nodes(trans, rc, path, err);
 
 out_free_path:
-- 
2.9.0



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v12.1 01/15] btrfs: expand cow_file_range() to support in-band dedup and subpage-blocksize

2016-07-10 Thread Qu Wenruo
From: Wang Xiaoguang 

Expand cow_file_range() with the new parameters needed by both the
in-band dedupe and the subpage sector size patchsets.

This should minimize conflicts between the two patchsets and reduce
the effort needed to rebase them.

Cc: Chandan Rajendra 
Cc: David Sterba 
Signed-off-by: Wang Xiaoguang 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/dedupe.h | 24 
 fs/btrfs/inode.c  | 28 +---
 2 files changed, 41 insertions(+), 11 deletions(-)
 create mode 100644 fs/btrfs/dedupe.h

diff --git a/fs/btrfs/dedupe.h b/fs/btrfs/dedupe.h
new file mode 100644
index 000..83ebfe2
--- /dev/null
+++ b/fs/btrfs/dedupe.h
@@ -0,0 +1,24 @@
+/*
+ * Copyright (C) 2016 Fujitsu.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#ifndef __BTRFS_DEDUPE__
+#define __BTRFS_DEDUPE__
+
+/* later in-band dedupe will expand this struct */
+struct btrfs_dedupe_hash;
+#endif
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index bf4319d..ef2b27e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -60,6 +60,7 @@
 #include "hash.h"
 #include "props.h"
 #include "qgroup.h"
+#include "dedupe.h"
 
 struct btrfs_iget_args {
struct btrfs_key *location;
@@ -105,8 +106,9 @@ static int btrfs_truncate(struct inode *inode);
 static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent);
 static noinline int cow_file_range(struct inode *inode,
   struct page *locked_page,
-  u64 start, u64 end, int *page_started,
-  unsigned long *nr_written, int unlock);
+  u64 start, u64 end, u64 delalloc_end,
+  int *page_started, unsigned long *nr_written,
+  int unlock, struct btrfs_dedupe_hash *hash);
 static struct extent_map *create_pinned_em(struct inode *inode, u64 start,
   u64 len, u64 orig_start,
   u64 block_start, u64 block_len,
@@ -710,7 +712,10 @@ retry:
 async_extent->start,
 async_extent->start +
 async_extent->ram_size - 1,
-&page_started, &nr_written, 0);
+async_extent->start +
+async_extent->ram_size - 1,
+&page_started, &nr_written, 0,
+NULL);
 
/* JDM XXX */
 
@@ -923,9 +928,9 @@ static u64 get_extent_allocation_hint(struct inode *inode, u64 start,
  */
 static noinline int cow_file_range(struct inode *inode,
   struct page *locked_page,
-  u64 start, u64 end, int *page_started,
-  unsigned long *nr_written,
-  int unlock)
+  u64 start, u64 end, u64 delalloc_end,
+  int *page_started, unsigned long *nr_written,
+  int unlock, struct btrfs_dedupe_hash *hash)
 {
struct btrfs_root *root = BTRFS_I(inode)->root;
u64 alloc_hint = 0;
@@ -1416,7 +1421,8 @@ out_check:
if (cow_start != (u64)-1) {
ret = cow_file_range(inode, locked_page,
 cow_start, found_key.offset - 1,
-page_started, nr_written, 1);
+end, page_started, nr_written, 1,
+NULL);
if (ret) {
if (!nolock && nocow)
btrfs_end_write_no_snapshoting(root);
@@ -1499,8 +1505,8 @@ out_check:
}
 
if (cow_start != (u64)-1) {
-   ret = cow_file_range(inode, locked_page, cow_start, end,
-page_started, nr_written, 1);
+   ret = 

[PATCH v12.1 15/15] btrfs: dedupe: Introduce new reconfigure ioctl

2016-07-10 Thread Qu Wenruo
Introduce a new reconfigure ioctl, and a new FORCE flag for the in-band
dedupe ioctls.

Now the dedupe enable and reconfigure ioctls are stateful.

-----------------------------------------
| Current state |  Ioctl  | Next state  |
-----------------------------------------
| Disabled      | enable  | Enabled     |
| Enabled       | enable  | Not allowed |
| Enabled       | reconf  | Enabled     |
| Enabled       | disable | Disabled    |
| Disabled      | disable | Disabled    |
| Disabled      | reconf  | Not allowed |
-----------------------------------------
(Disable is always stateless.)

For those who prefer stateless ioctls (myself, for example), a new FORCE
flag is introduced.

In FORCE mode, enable/disable is completely stateless.

-----------------------------------------
| Current state |  Ioctl  | Next state  |
-----------------------------------------
| Disabled      | enable  | Enabled     |
| Enabled       | enable  | Enabled     |
| Enabled       | disable | Disabled    |
| Disabled      | disable | Disabled    |
-----------------------------------------
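Both tables amount to a small transition function. The sketch below is a hypothetical model: the enum names and the -EINVAL return for "Not allowed" are assumptions, not the uapi definitions. Because the FORCE table lists only enable and disable, it also assumes that FORCE does not affect reconfigure:

```c
#include <errno.h>
#include <stdbool.h>

enum dd_state { DD_DISABLED, DD_ENABLED };
enum dd_cmd { DD_ENABLE, DD_DISABLE, DD_RECONF };

/*
 * Apply one ioctl to the current state.  Returns 0 and sets *next,
 * or -EINVAL for a "Not allowed" transition (error code assumed).
 */
static int dd_transition(enum dd_state cur, enum dd_cmd cmd, bool force,
			 enum dd_state *next)
{
	switch (cmd) {
	case DD_DISABLE:
		/* disable is stateless with or without FORCE */
		*next = DD_DISABLED;
		return 0;
	case DD_ENABLE:
		/* re-enabling is rejected unless FORCE is given */
		if (cur == DD_ENABLED && !force)
			return -EINVAL;
		*next = DD_ENABLED;
		return 0;
	case DD_RECONF:
		/* assumption: FORCE does not change reconfigure */
		if (cur == DD_DISABLED)
			return -EINVAL;
		*next = DD_ENABLED;
		return 0;
	}
	return -EINVAL;
}
```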

Also, the reconfigure ioctl only modifies the specified fields, whereas
enable fills any unspecified fields with default values.

For example:
 # btrfs dedupe enable --block-size 64k /mnt
 # btrfs dedupe reconfigure --limit-hash 1m /mnt
leads to:
 dedupe blocksize: 64K
 dedupe hash limit nr: 1m

Whereas with enable:
 # btrfs dedupe enable --force --block-size 64k /mnt
 # btrfs dedupe enable --force --limit-hash 1m /mnt
resets the blocksize to its default value:
 dedupe blocksize: 128K << reset
 dedupe hash limit nr: 1m
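The difference between the two merge rules can be sketched as follows. The sentinels mirror get_dedupe_status() in this patch: (u64)-1 marks an unset blocksize and 0 marks an unset limit. `struct dd_cfg`, the helper names and the default limit value are hypothetical. Only the 128K default comes from the example above:

```c
#include <stdint.h>

/* Sentinels modelled on get_dedupe_status(): blocksize (u64)-1, limit 0. */
#define BS_UNSET	((uint64_t)-1)
#define LIMIT_UNSET	0

struct dd_cfg {
	uint64_t blocksize;
	uint64_t limit_nr;
};

/* 128K comes from the example above; the limit default is hypothetical. */
static const struct dd_cfg dd_defaults = { 128 * 1024, 32 * 1024 };

/* enable (with or without --force): unspecified fields take defaults */
static struct dd_cfg dd_enable(struct dd_cfg req)
{
	if (req.blocksize == BS_UNSET)
		req.blocksize = dd_defaults.blocksize;
	if (req.limit_nr == LIMIT_UNSET)
		req.limit_nr = dd_defaults.limit_nr;
	return req;
}

/* reconfigure: unspecified fields keep the current value */
static struct dd_cfg dd_reconf(struct dd_cfg cur, struct dd_cfg req)
{
	if (req.blocksize == BS_UNSET)
		req.blocksize = cur.blocksize;
	if (req.limit_nr == LIMIT_UNSET)
		req.limit_nr = cur.limit_nr;
	return req;
}
```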

Suggested-by: David Sterba 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/dedupe.c  | 131 -
 fs/btrfs/dedupe.h  |  13 +
 fs/btrfs/ioctl.c   |  13 +
 include/uapi/linux/btrfs.h |  11 +++-
 4 files changed, 143 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/dedupe.c b/fs/btrfs/dedupe.c
index ad1bb82..a3f4f0d 100644
--- a/fs/btrfs/dedupe.c
+++ b/fs/btrfs/dedupe.c
@@ -41,6 +41,40 @@ static inline struct inmem_hash *inmem_alloc_hash(u16 algo)
GFP_NOFS);
 }
 
+/*
+ * Copy from the current dedupe info to fill dargs.
+ * For the reconf case, only fill members which are uninitialized.
+ */
+static void get_dedupe_status(struct btrfs_dedupe_info *dedupe_info,
+ struct btrfs_ioctl_dedupe_args *dargs)
+{
+   int reconf = (dargs->cmd == BTRFS_DEDUPE_CTL_RECONF);
+
+   dargs->status = 1;
+
+   if (!reconf || (reconf && dargs->blocksize == (u64)-1))
+   dargs->blocksize = dedupe_info->blocksize;
+   if (!reconf || (reconf && dargs->backend == (u16)-1))
+   dargs->backend = dedupe_info->backend;
+   if (!reconf || (reconf && dargs->hash_algo ==(u16)-1))
+   dargs->hash_algo = dedupe_info->hash_algo;
+
+   /*
+* For the reconfigure case, limits that are not being
+* modified are passed as 0, unlike other fields.
+*/
+   if (!reconf || !(dargs->limit_nr || dargs->limit_mem)) {
+   dargs->limit_nr = dedupe_info->limit_nr;
+   dargs->limit_mem = dedupe_info->limit_nr *
+   (sizeof(struct inmem_hash) +
+btrfs_hash_sizes[dedupe_info->hash_algo]);
+   }
+
+   /* current_nr doesn't make sense for the reconf case */
+   if (!reconf)
+   dargs->current_nr = dedupe_info->current_nr;
+}
+
 void btrfs_dedupe_status(struct btrfs_fs_info *fs_info,
 struct btrfs_ioctl_dedupe_args *dargs)
 {
@@ -57,15 +91,7 @@ void btrfs_dedupe_status(struct btrfs_fs_info *fs_info,
return;
}
mutex_lock(&dedupe_info->lock);
-   dargs->status = 1;
-   dargs->blocksize = dedupe_info->blocksize;
-   dargs->backend = dedupe_info->backend;
-   dargs->hash_algo = dedupe_info->hash_algo;
-   dargs->limit_nr = dedupe_info->limit_nr;
-   dargs->limit_mem = dedupe_info->limit_nr *
-   (sizeof(struct inmem_hash) +
-btrfs_hash_sizes[dedupe_info->hash_algo]);
-   dargs->current_nr = dedupe_info->current_nr;
+   get_dedupe_status(dedupe_info, dargs);
mutex_unlock(&dedupe_info->lock);
memset(dargs->__unused, -1, sizeof(dargs->__unused));
 }
@@ -114,17 +140,50 @@ static int init_dedupe_info(struct btrfs_dedupe_info **ret_info,
 static int check_dedupe_parameter(struct btrfs_fs_info *fs_info,
  struct btrfs_ioctl_dedupe_args *dargs)
 {
-   u64 blocksize = dargs->blocksize;
-   u64 limit_nr = dargs->limit_nr;
-   u64 limit_mem = dargs->limit_mem;
-   u16 hash_algo = dargs->hash_algo;
-   u8 backend = dargs->backend;
+   struct btrfs_dedupe_info *dedupe_info = fs_info->dedupe_info;
+
+   u64 blocksize;
+   u64 limit_nr;
+   u64 

[PATCH v12.1 10/15] btrfs: dedupe: Inband in-memory only de-duplication implement

2016-07-10 Thread Qu Wenruo
Core implementation of inband de-duplication.
It reuses the async_cow_start() facility to calculate the dedupe hash,
and uses the dedupe hash to do inband de-duplication at extent level.

The workflow is as follows:
1) Run delalloc range for an inode
2) Calculate the hash for the delalloc range in units of dedupe_bs
3) On a hash match (duplicate), just increase the source extent's ref
   and insert a file extent.
   On a hash mismatch, go through the normal cow_file_range()
   fallback, and add the hash into the dedupe tree.
   Compression is not yet supported for the hash-miss case.
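A toy single-threaded model of steps 2 and 3 follows. It uses a flat table instead of the rb-tree and 64-bit FNV-1a instead of SHA-256, so collisions are far more likely than in the real design. All names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX 64

/* 64-bit FNV-1a: a toy stand-in for the SHA-256 the patches use. */
static uint64_t toy_hash(const unsigned char *buf, size_t len)
{
	uint64_t h = 1469598103934665603ULL;

	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 1099511628211ULL;
	}
	return h;
}

struct toy_extent {
	uint64_t hash;
	uint64_t bytenr;
	unsigned int refs;
};

struct toy_dedupe {
	struct toy_extent ext[TOY_MAX];
	size_t nr;
	uint64_t next_bytenr;	/* fake allocator cursor */
	size_t bs;		/* dedupe blocksize */
};

/*
 * Write one bs-sized block.  Hit: take another ref on the existing
 * extent.  Miss: "allocate" a new extent (the cow_file_range()
 * fallback) and remember its hash.  Returns the backing bytenr, or 0
 * if the toy table is full.
 */
static uint64_t toy_write_block(struct toy_dedupe *d, const unsigned char *blk)
{
	uint64_t h = toy_hash(blk, d->bs);

	for (size_t i = 0; i < d->nr; i++) {
		if (d->ext[i].hash == h) {
			d->ext[i].refs++;
			return d->ext[i].bytenr;
		}
	}
	if (d->nr == TOY_MAX)
		return 0;
	d->ext[d->nr] = (struct toy_extent){ h, d->next_bytenr, 1 };
	d->nr++;
	d->next_bytenr += d->bs;
	return d->ext[d->nr - 1].bytenr;
}
```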

The current implementation stores all dedupe hashes in an in-memory
rb-tree, with LRU behavior to enforce the limit.
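The LRU limit can be sketched with a plain doubly linked list. New hashes go to the head, and the tail is evicted while the count exceeds the limit, as the loop at the end of inmem_add() does. This is a hypothetical user-space model. It does not show whether a hash hit refreshes an entry's LRU position:

```c
#include <stdint.h>
#include <stdlib.h>

struct lru_entry {
	uint64_t bytenr;
	struct lru_entry *prev, *next;
};

struct lru {
	struct lru_entry *head, *tail;	/* head = most recently added */
	uint64_t nr, limit;
};

static void lru_unlink_free(struct lru *l, struct lru_entry *e)
{
	if (e->prev)
		e->prev->next = e->next;
	else
		l->head = e->next;
	if (e->next)
		e->next->prev = e->prev;
	else
		l->tail = e->prev;
	free(e);
	l->nr--;
}

/* Add at the head, then evict from the tail while over the limit. */
static int lru_add(struct lru *l, uint64_t bytenr)
{
	struct lru_entry *e = calloc(1, sizeof(*e));

	if (!e)
		return -1;
	e->bytenr = bytenr;
	e->next = l->head;
	if (l->head)
		l->head->prev = e;
	else
		l->tail = e;
	l->head = e;
	l->nr++;

	while (l->nr > l->limit)
		lru_unlink_free(l, l->tail);
	return 0;
}
```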

Signed-off-by: Wang Xiaoguang 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/extent-tree.c |  21 +
 fs/btrfs/inode.c   | 250 +++--
 fs/btrfs/relocation.c  |  16 
 3 files changed, 257 insertions(+), 30 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index adefa1e..69806ed 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -37,6 +37,7 @@
 #include "math.h"
 #include "sysfs.h"
 #include "qgroup.h"
+#include "dedupe.h"
 
 #undef SCRAMBLE_DELAYED_REFS
 
@@ -2415,6 +2416,8 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
 
if (btrfs_delayed_ref_is_head(node)) {
struct btrfs_delayed_ref_head *head;
+   struct btrfs_fs_info *fs_info = root->fs_info;
+
/*
 * we've hit the end of the chain and we were supposed
 * to insert this extent into the tree.  But, it got
@@ -2430,6 +2433,18 @@ static int run_one_delayed_ref(struct btrfs_trans_handle *trans,
btrfs_pin_extent(root, node->bytenr,
 node->num_bytes, 1);
if (head->is_data) {
+   /*
+* If insert_reserved is set, a new extent was
+* reserved and then deleted in the same transaction,
+* and its inc/dec refs got merged to 0.
+*
+* In this case, we need to remove its dedupe
+* hash.
+*/
+   ret = btrfs_dedupe_del(trans, fs_info,
+  node->bytenr);
+   if (ret < 0)
+   return ret;
ret = btrfs_del_csums(trans, root,
  node->bytenr,
  node->num_bytes);
@@ -7050,6 +7065,12 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
btrfs_release_path(path);
 
if (is_data) {
+   ret = btrfs_dedupe_del(trans, info, bytenr);
+   if (ret < 0) {
+   btrfs_abort_transaction(trans, extent_root,
+   ret);
+   goto out;
+   }
ret = btrfs_del_csums(trans, root, bytenr, num_bytes);
if (ret) {
btrfs_abort_transaction(trans, extent_root, ret);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ef2b27e..7bfbaec 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -337,6 +337,7 @@ struct async_extent {
struct page **pages;
unsigned long nr_pages;
int compress_type;
+   struct btrfs_dedupe_hash *hash;
struct list_head list;
 };
 
@@ -355,7 +356,8 @@ static noinline int add_async_extent(struct async_cow *cow,
 u64 compressed_size,
 struct page **pages,
 unsigned long nr_pages,
-int compress_type)
+int compress_type,
+struct btrfs_dedupe_hash *hash)
 {
struct async_extent *async_extent;
 
@@ -367,6 +369,7 @@ static noinline int add_async_extent(struct async_cow *cow,
async_extent->pages = pages;
async_extent->nr_pages = nr_pages;
async_extent->compress_type = compress_type;
+   async_extent->hash = hash;
list_add_tail(&async_extent->list, &cow->extents);
return 0;
 }
@@ -596,7 +599,7 @@ cont:
 */
add_async_extent(async_cow, start, num_bytes,
total_compressed, pages, nr_pages_ret,
-   compress_type);
+   compress_type, NULL);
 
if 

[PATCH v12.1 02/15] btrfs: dedupe: Introduce dedupe framework and its header

2016-07-10 Thread Qu Wenruo
From: Wang Xiaoguang 

Introduce the header for the btrfs in-band (write-time) de-duplication
framework.

The new de-duplication framework is going to support two different
dedupe backends and one dedupe hash algorithm.
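`struct btrfs_dedupe_hash` below ends in a flexible array member whose length depends on the hash algorithm. The following is a user-space sketch of that allocation pattern. It uses calloc() in place of kzalloc(), and the type and function names are hypothetical:

```c
#include <stdint.h>
#include <stdlib.h>

/* Digest length per algorithm id; 32 bytes = SHA-256, the only entry. */
static const int hash_sizes[] = { 32 };
#define NR_HASH_ALGOS (sizeof(hash_sizes) / sizeof(hash_sizes[0]))

struct dd_hash {
	uint64_t bytenr;	/* 0 until a matching extent is found */
	uint32_t num_bytes;
	uint8_t hash[];		/* flexible array, sized by the algorithm */
};

/* Zeroed allocation (kzalloc-like) so bytenr starts as "no hit". */
static struct dd_hash *dd_alloc_hash(uint16_t algo)
{
	if (algo >= NR_HASH_ALGOS)
		return NULL;
	return calloc(1, sizeof(struct dd_hash) + hash_sizes[algo]);
}

/* Same test as btrfs_dedupe_hash_hit(): non-NULL and bytenr set. */
static int dd_hash_hit(const struct dd_hash *h)
{
	return h && h->bytenr;
}
```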

Signed-off-by: Qu Wenruo 
Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/ctree.h   |   7 +++
 fs/btrfs/dedupe.h  | 137 -
 fs/btrfs/disk-io.c |   1 +
 include/uapi/linux/btrfs.h |  34 +++
 4 files changed, 177 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 83a6a93..412aad0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1093,6 +1093,13 @@ struct btrfs_fs_info {
struct list_head pinned_chunks;
 
int creating_free_space_tree;
+
+   /*
+* Inband de-duplication related structures
+*/
+   unsigned long dedupe_enabled:1;
+   struct btrfs_dedupe_info *dedupe_info;
+   struct mutex dedupe_ioctl_lock;
 };
 
 struct btrfs_subvolume_writers {
diff --git a/fs/btrfs/dedupe.h b/fs/btrfs/dedupe.h
index 83ebfe2..8f57661 100644
--- a/fs/btrfs/dedupe.h
+++ b/fs/btrfs/dedupe.h
@@ -19,6 +19,139 @@
 #ifndef __BTRFS_DEDUPE__
 #define __BTRFS_DEDUPE__
 
-/* later in-band dedupe will expand this struct */
-struct btrfs_dedupe_hash;
+#include 
+#include 
+#include 
+
+static int btrfs_hash_sizes[] = { 32 };
+
+/*
+ * For caller outside of dedupe.c
+ *
+ * Different dedupe backends should have their own hash structure
+ */
+struct btrfs_dedupe_hash {
+   u64 bytenr;
+   u32 num_bytes;
+
+   /* last field is a variable length array of dedupe hash */
+   u8 hash[];
+};
+
+struct btrfs_dedupe_info {
+   /* dedupe blocksize */
+   u64 blocksize;
+   u16 backend;
+   u16 hash_algo;
+
+   struct crypto_shash *dedupe_driver;
+
+   /*
+* Use a mutex to protect both backends.
+* Even for the in-memory backend, the rb-tree can be quite
+* large, so a mutex is better for this use case.
+*/
+   struct mutex lock;
+
+   /* following members are only used in in-memory backend */
+   struct rb_root hash_root;
+   struct rb_root bytenr_root;
+   struct list_head lru_list;
+   u64 limit_nr;
+   u64 current_nr;
+};
+
+struct btrfs_trans_handle;
+
+static inline int btrfs_dedupe_hash_hit(struct btrfs_dedupe_hash *hash)
+{
+   return (hash && hash->bytenr);
+}
+
+int btrfs_dedupe_hash_size(u16 algo);
+struct btrfs_dedupe_hash *btrfs_dedupe_alloc_hash(u16 algo);
+
+/*
+ * Initialize inband dedupe info.
+ * Called at dedupe enable time.
+ *
+ * Return 0 for success
+ * Return <0 for any error
+ * (from unsupported param to tree creation error for some backends)
+ */
+int btrfs_dedupe_enable(struct btrfs_fs_info *fs_info,
+   struct btrfs_ioctl_dedupe_args *dargs);
+
+/*
+ * Disable dedupe and invalidate all its dedupe data.
+ * Called at dedupe disable time.
+ *
+ * Return 0 for success
+ * Return <0 for any error
+ * (tree operation error for some backends)
+ */
+int btrfs_dedupe_disable(struct btrfs_fs_info *fs_info);
+
+/*
+ * Get current dedupe status.
+ * Return 0 for success
+ * No possible error yet
+ */
+void btrfs_dedupe_status(struct btrfs_fs_info *fs_info,
+struct btrfs_ioctl_dedupe_args *dargs);
+
+/*
+ * Calculate hash for dedupe.
+ * Caller must ensure [start, start + dedupe_bs) has valid data.
+ *
+ * Return 0 for success
+ * Return <0 for any error
+ * (error from hash codes)
+ */
+int btrfs_dedupe_calc_hash(struct btrfs_fs_info *fs_info,
+  struct inode *inode, u64 start,
+  struct btrfs_dedupe_hash *hash);
+
+/*
+ * Search for duplicated extents by calculated hash
+ * Caller must call btrfs_dedupe_calc_hash() first to get the hash.
+ *
+ * @inode: the inode we are writing to
+ * @file_pos: offset inside the inode
+ * As we will increase extent ref immediately after a hash match,
+ * we need @file_pos and @inode in this case.
+ *
+ * Return > 0 for a hash match, and the extent ref will be
+ * *INCREASED*, and hash->bytenr/num_bytes will record the existing
+ * extent data.
+ * Return 0 for a hash miss. Nothing is done
+ * Return <0 for any error
+ * (tree operation error for some backends)
+ */
+int btrfs_dedupe_search(struct btrfs_fs_info *fs_info,
+   struct inode *inode, u64 file_pos,
+   struct btrfs_dedupe_hash *hash);
+
+/*
+ * Add a dedupe hash into dedupe info
+ * Return 0 for success
+ * Return <0 for any error
+ * (tree operation error for some backends)
+ */
+int btrfs_dedupe_add(struct btrfs_trans_handle *trans,
+struct btrfs_fs_info *fs_info,
+struct btrfs_dedupe_hash *hash);
+
+/*
+ * Remove a dedupe hash from dedupe info
+ * Return 0 for success
+ * Return <0 for any error
+ * (tree operation error 

[PATCH v12.1 03/15] btrfs: dedupe: Introduce function to initialize dedupe info

2016-07-10 Thread Qu Wenruo
From: Wang Xiaoguang 

Add a generic function to initialize dedupe info.
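check_dedupe_parameter() in this patch reports the first invalid field by setting it to -1. The following is a reduced user-space sketch of that convention, covering only blocksize and hash_algo. The SK_BS_MIN/SK_BS_MAX values are placeholders, not the BTRFS_DEDUPE_BLOCKSIZE_MIN/MAX definitions:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds for illustration only. */
#define SK_BS_MIN	(16 * 1024)
#define SK_BS_MAX	(8 * 1024 * 1024)

struct sk_args {
	uint64_t blocksize;
	uint16_t hash_algo;
};

static bool is_pow2(uint64_t x)
{
	return x && !(x & (x - 1));
}

/*
 * Return 0 if valid; otherwise set the first offending field to -1 so
 * user space can see which one was rejected, and return -EINVAL.
 */
static int sk_check(struct sk_args *a, uint32_t sectorsize,
		    uint32_t page_size, uint16_t nr_algos)
{
	if (a->blocksize > SK_BS_MAX || a->blocksize < SK_BS_MIN ||
	    a->blocksize < sectorsize || a->blocksize < page_size ||
	    !is_pow2(a->blocksize)) {
		a->blocksize = (uint64_t)-1;
		return -EINVAL;
	}
	if (a->hash_algo >= nr_algos) {
		a->hash_algo = (uint16_t)-1;
		return -EINVAL;
	}
	return 0;
}
```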

Signed-off-by: Qu Wenruo 
Signed-off-by: Wang Xiaoguang 
Reviewed-by: Josef Bacik 
---
 fs/btrfs/Makefile  |   2 +-
 fs/btrfs/dedupe.c  | 185 +
 fs/btrfs/dedupe.h  |  13 +++-
 include/uapi/linux/btrfs.h |   4 +-
 4 files changed, 200 insertions(+), 4 deletions(-)
 create mode 100644 fs/btrfs/dedupe.c

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 128ce17..1b8c627 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -9,7 +9,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
-  uuid-tree.o props.o hash.o free-space-tree.o
+  uuid-tree.o props.o hash.o free-space-tree.o dedupe.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/dedupe.c b/fs/btrfs/dedupe.c
new file mode 100644
index 000..b14166a
--- /dev/null
+++ b/fs/btrfs/dedupe.c
@@ -0,0 +1,185 @@
+/*
+ * Copyright (C) 2016 Fujitsu.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+#include "ctree.h"
+#include "dedupe.h"
+#include "btrfs_inode.h"
+#include "transaction.h"
+#include "delayed-ref.h"
+
+struct inmem_hash {
+   struct rb_node hash_node;
+   struct rb_node bytenr_node;
+   struct list_head lru_list;
+
+   u64 bytenr;
+   u32 num_bytes;
+
+   u8 hash[];
+};
+
+static int init_dedupe_info(struct btrfs_dedupe_info **ret_info,
+   struct btrfs_ioctl_dedupe_args *dargs)
+{
+   struct btrfs_dedupe_info *dedupe_info;
+
+   dedupe_info = kzalloc(sizeof(*dedupe_info), GFP_NOFS);
+   if (!dedupe_info)
+   return -ENOMEM;
+
+   dedupe_info->hash_algo = dargs->hash_algo;
+   dedupe_info->backend = dargs->backend;
+   dedupe_info->blocksize = dargs->blocksize;
+   dedupe_info->limit_nr = dargs->limit_nr;
+
+   /* only support SHA256 yet */
+   dedupe_info->dedupe_driver = crypto_alloc_shash("sha256", 0, 0);
+   if (IS_ERR(dedupe_info->dedupe_driver)) {
+   int ret;
+
+   ret = PTR_ERR(dedupe_info->dedupe_driver);
+   kfree(dedupe_info);
+   return ret;
+   }
+
+   dedupe_info->hash_root = RB_ROOT;
+   dedupe_info->bytenr_root = RB_ROOT;
+   dedupe_info->current_nr = 0;
+   INIT_LIST_HEAD(&dedupe_info->lru_list);
+   mutex_init(&dedupe_info->lock);
+
+   *ret_info = dedupe_info;
+   return 0;
+}
+
+/*
+ * Helper to check if parameters are valid.
+ * The first invalid field will be set to (-1) to inform the user which
+ * parameter is invalid.
+ * The exception is dargs->limit_nr or dargs->limit_mem: for those, 0 is
+ * returned instead, since the user can specify any limit value except 0.
+ */
+static int check_dedupe_parameter(struct btrfs_fs_info *fs_info,
+ struct btrfs_ioctl_dedupe_args *dargs)
+{
+   u64 blocksize = dargs->blocksize;
+   u64 limit_nr = dargs->limit_nr;
+   u64 limit_mem = dargs->limit_mem;
+   u16 hash_algo = dargs->hash_algo;
+   u8 backend = dargs->backend;
+
+   /*
+* Set all reserved fields to -1, allow user to detect
+* unsupported optional parameters.
+*/
+   memset(dargs->__unused, -1, sizeof(dargs->__unused));
+   if (blocksize > BTRFS_DEDUPE_BLOCKSIZE_MAX ||
+   blocksize < BTRFS_DEDUPE_BLOCKSIZE_MIN ||
+   blocksize < fs_info->tree_root->sectorsize ||
+   !is_power_of_2(blocksize) ||
+   blocksize < PAGE_SIZE) {
+   dargs->blocksize = (u64)-1;
+   return -EINVAL;
+   }
+   if (hash_algo >= ARRAY_SIZE(btrfs_hash_sizes)) {
+   dargs->hash_algo = (u16)-1;
+   return -EINVAL;
+   }
+   if (backend >= BTRFS_DEDUPE_BACKEND_COUNT) {
+   dargs->backend = (u8)-1;
+   return -EINVAL;
+   }
+
+   /* Backend specific check */
+   if 

[PATCH v12.1 06/15] btrfs: delayed-ref: Add support for increasing data ref under spinlock

2016-07-10 Thread Qu Wenruo
For in-band dedupe, btrfs needs to increase a data ref with
delayed_refs->lock held, so add a new function,
btrfs_add_delayed_data_ref_locked(), to increase the extent ref with
delayed_refs already locked.
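A `_locked` variant lets a caller look something up and insert under a single lock hold, which is what the dedupe hit path needs. The following is a hypothetical user-space sketch of the convention. It uses a C11 atomic_flag as a toy spinlock in place of the kernel's spinlock_t:

```c
#include <stdatomic.h>

/* Toy spinlock built on C11 atomic_flag (the kernel uses spinlock_t). */
struct ref_table {
	atomic_flag lock;
	long refs;
};

static void toy_lock(struct ref_table *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;	/* spin */
}

static void toy_unlock(struct ref_table *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* "_locked" convention: caller already holds t->lock. */
static void add_ref_locked(struct ref_table *t, long n)
{
	t->refs += n;
}

/* Ordinary entry point: takes the lock around the core helper. */
static void add_ref(struct ref_table *t, long n)
{
	toy_lock(t);
	add_ref_locked(t, n);
	toy_unlock(t);
}

/*
 * Why the split is needed: a check-then-add must happen under one
 * lock hold, so this caller uses the _locked helper directly.
 */
static int add_ref_if_referenced(struct ref_table *t)
{
	int added = 0;

	toy_lock(t);
	if (t->refs > 0) {
		add_ref_locked(t, 1);
		added = 1;
	}
	toy_unlock(t);
	return added;
}
```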

Signed-off-by: Qu Wenruo 
Reviewed-by: Josef Bacik 
---
 fs/btrfs/delayed-ref.c | 30 +++---
 fs/btrfs/delayed-ref.h |  8 
 2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index e7b1ec0..bd97cf0 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -806,6 +806,26 @@ free_ref:
 }
 
 /*
+ * Do real delayed data ref insert.
+ * Caller must hold delayed_refs->lock and must have allocated memory
+ * for dref, head_ref and qrecord.
+ */
+void btrfs_add_delayed_data_ref_locked(struct btrfs_fs_info *fs_info,
+   struct btrfs_trans_handle *trans,
+   struct btrfs_delayed_data_ref *dref,
+   struct btrfs_delayed_ref_head *head_ref,
+   struct btrfs_qgroup_extent_record *qrecord,
+   u64 bytenr, u64 num_bytes, u64 parent, u64 ref_root,
+   u64 owner, u64 offset, u64 reserved, int action)
+{
+   head_ref = add_delayed_ref_head(fs_info, trans, &head_ref->node,
+   qrecord, bytenr, num_bytes, ref_root, reserved,
+   action, 1);
+   add_delayed_data_ref(fs_info, trans, head_ref, &dref->node, bytenr,
+   num_bytes, parent, ref_root, owner, offset, action);
+}
+
+/*
  * add a delayed data ref. it's similar to btrfs_add_delayed_tree_ref.
  */
 int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
@@ -850,13 +870,9 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
 * insert both the head node and the new ref without dropping
 * the spin lock
 */
-   head_ref = add_delayed_ref_head(fs_info, trans, &head_ref->node, record,
-   bytenr, num_bytes, ref_root, reserved,
-   action, 1);
-
-   add_delayed_data_ref(fs_info, trans, head_ref, &ref->node, bytenr,
-  num_bytes, parent, ref_root, owner, offset,
-  action);
+   btrfs_add_delayed_data_ref_locked(fs_info, trans, ref, head_ref, record,
+   bytenr, num_bytes, parent, ref_root, owner, offset,
+   reserved, action);
spin_unlock(&delayed_refs->lock);
 
return 0;
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index 5fca953..5830341 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -239,11 +239,19 @@ static inline void btrfs_put_delayed_ref(struct btrfs_delayed_ref_node *ref)
}
 }
 
+struct btrfs_qgroup_extent_record;
 int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
   struct btrfs_trans_handle *trans,
   u64 bytenr, u64 num_bytes, u64 parent,
   u64 ref_root, int level, int action,
   struct btrfs_delayed_extent_op *extent_op);
+void btrfs_add_delayed_data_ref_locked(struct btrfs_fs_info *fs_info,
+   struct btrfs_trans_handle *trans,
+   struct btrfs_delayed_data_ref *dref,
+   struct btrfs_delayed_ref_head *head_ref,
+   struct btrfs_qgroup_extent_record *qrecord,
+   u64 bytenr, u64 num_bytes, u64 parent, u64 ref_root,
+   u64 owner, u64 offset, u64 reserved, int action);
 int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
   struct btrfs_trans_handle *trans,
   u64 bytenr, u64 num_bytes,
-- 
2.9.0





[PATCH v12.1 04/15] btrfs: dedupe: Introduce function to add hash into in-memory tree

2016-07-10 Thread Qu Wenruo
From: Wang Xiaoguang 

Introduce the static function inmem_add() to add a hash into the
in-memory tree, which lets us implement the btrfs_dedupe_add() interface.
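inmem_insert_hash() and inmem_insert_bytenr() use the usual pointer-to-link descent before rb_link_node()/rb_insert_color(). The following sketch applies the same descent to a plain unbalanced binary tree. It does no rebalancing and uses hypothetical names and 4-byte toy digests:

```c
#include <stddef.h>
#include <string.h>

#define TOY_HASH_LEN 4	/* toy digest length; SHA-256 would be 32 */

struct hnode {
	unsigned char hash[TOY_HASH_LEN];
	struct hnode *left, *right;
};

/*
 * Walk down via a pointer-to-link, like the rb_node **p loop in
 * inmem_insert_hash().  Returns 1 (not linked) if an equal hash exists,
 * 0 after linking.  No rebalancing: the kernel's rb_insert_color()
 * step is deliberately omitted.
 */
static int insert_hash(struct hnode **root, struct hnode *n)
{
	struct hnode **p = root;

	while (*p) {
		int cmp = memcmp(n->hash, (*p)->hash, TOY_HASH_LEN);

		if (cmp < 0)
			p = &(*p)->left;
		else if (cmp > 0)
			p = &(*p)->right;
		else
			return 1;
	}
	n->left = n->right = NULL;
	*p = n;
	return 0;
}
```

Storing the memcmp() result once avoids the second comparison the patch performs on its `> 0` branch. The outcome is the same.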

Signed-off-by: Qu Wenruo 
Signed-off-by: Wang Xiaoguang 
Reviewed-by: Josef Bacik 
---
 fs/btrfs/dedupe.c | 151 ++
 1 file changed, 151 insertions(+)

diff --git a/fs/btrfs/dedupe.c b/fs/btrfs/dedupe.c
index b14166a..e51412b 100644
--- a/fs/btrfs/dedupe.c
+++ b/fs/btrfs/dedupe.c
@@ -32,6 +32,14 @@ struct inmem_hash {
u8 hash[];
 };
 
+static inline struct inmem_hash *inmem_alloc_hash(u16 algo)
+{
+   if (WARN_ON(algo >= ARRAY_SIZE(btrfs_hash_sizes)))
+   return NULL;
+   return kzalloc(sizeof(struct inmem_hash) + btrfs_hash_sizes[algo],
+   GFP_NOFS);
+}
+
 static int init_dedupe_info(struct btrfs_dedupe_info **ret_info,
struct btrfs_ioctl_dedupe_args *dargs)
 {
@@ -183,3 +191,146 @@ int btrfs_dedupe_disable(struct btrfs_fs_info *fs_info)
/* Place holder for bisect, will be implemented in later patches */
return 0;
 }
+
+static int inmem_insert_hash(struct rb_root *root,
+struct inmem_hash *hash, int hash_len)
+{
+   struct rb_node **p = &root->rb_node;
+   struct rb_node *parent = NULL;
+   struct inmem_hash *entry = NULL;
+
+   while (*p) {
+   parent = *p;
+   entry = rb_entry(parent, struct inmem_hash, hash_node);
+   if (memcmp(hash->hash, entry->hash, hash_len) < 0)
+   p = &(*p)->rb_left;
+   else if (memcmp(hash->hash, entry->hash, hash_len) > 0)
+   p = &(*p)->rb_right;
+   else
+   return 1;
+   }
+   rb_link_node(&hash->hash_node, parent, p);
+   rb_insert_color(&hash->hash_node, root);
+   return 0;
+}
+
+static int inmem_insert_bytenr(struct rb_root *root,
+  struct inmem_hash *hash)
+{
+   struct rb_node **p = &root->rb_node;
+   struct rb_node *parent = NULL;
+   struct inmem_hash *entry = NULL;
+
+   while (*p) {
+   parent = *p;
+   entry = rb_entry(parent, struct inmem_hash, bytenr_node);
+   if (hash->bytenr < entry->bytenr)
+   p = &(*p)->rb_left;
+   else if (hash->bytenr > entry->bytenr)
+   p = &(*p)->rb_right;
+   else
+   return 1;
+   }
+   rb_link_node(&hash->bytenr_node, parent, p);
+   rb_insert_color(&hash->bytenr_node, root);
+   return 0;
+}
+
+static void __inmem_del(struct btrfs_dedupe_info *dedupe_info,
+   struct inmem_hash *hash)
+{
+   list_del(&hash->lru_list);
+   rb_erase(&hash->hash_node, &dedupe_info->hash_root);
+   rb_erase(&hash->bytenr_node, &dedupe_info->bytenr_root);
+
+   if (!WARN_ON(dedupe_info->current_nr == 0))
+   dedupe_info->current_nr--;
+
+   kfree(hash);
+}
+
+/*
+ * Insert a hash into in-memory dedupe tree
+ * Evicts the least recently used hashes once the limit is exceeded.
+ *
+ * If the hash matches an existing one, we don't insert it, to
+ * save memory.
+ */
+static int inmem_add(struct btrfs_dedupe_info *dedupe_info,
+struct btrfs_dedupe_hash *hash)
+{
+   int ret = 0;
+   u16 algo = dedupe_info->hash_algo;
+   struct inmem_hash *ihash;
+
+   ihash = inmem_alloc_hash(algo);
+
+   if (!ihash)
+   return -ENOMEM;
+
+   /* Copy the data out */
+   ihash->bytenr = hash->bytenr;
+   ihash->num_bytes = hash->num_bytes;
+   memcpy(ihash->hash, hash->hash, btrfs_hash_sizes[algo]);
+
+   mutex_lock(&dedupe_info->lock);
+
+   ret = inmem_insert_bytenr(&dedupe_info->bytenr_root, ihash);
+   if (ret > 0) {
+   kfree(ihash);
+   ret = 0;
+   goto out;
+   }
+
+   ret = inmem_insert_hash(&dedupe_info->hash_root, ihash,
+   btrfs_hash_sizes[algo]);
+   if (ret > 0) {
+   /*
+* We only keep one hash in tree to save memory, so if
+* hash conflicts, free the one to insert.
+*/
+   rb_erase(&ihash->bytenr_node, &dedupe_info->bytenr_root);
+   kfree(ihash);
+   ret = 0;
+   goto out;
+   }
+
+   list_add(&ihash->lru_list, &dedupe_info->lru_list);
+   dedupe_info->current_nr++;
+
+   /* Remove the last dedupe hash if we exceed limit */
+   while (dedupe_info->current_nr > dedupe_info->limit_nr) {
+   struct inmem_hash *last;
+
+   last = list_entry(dedupe_info->lru_list.prev,
+ struct inmem_hash, lru_list);
+   __inmem_del(dedupe_info, last);
+   }
+out:
+   mutex_unlock(&dedupe_info->lock);
+   return 0;
+}
+
+int btrfs_dedupe_add(struct 

[PATCH v12.1 05/15] btrfs: dedupe: Introduce function to remove hash from in-memory tree

2016-07-10 Thread Qu Wenruo
From: Wang Xiaoguang 

Introduce the static function inmem_del() to remove a hash from the
in-memory dedupe tree, and implement the btrfs_dedupe_del() and
btrfs_dedupe_disable() interfaces.

For btrfs_dedupe_disable(), also add new functions to wait for existing
writers and block incoming writers, eliminating all possible races.
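The wait-and-block idea can be modelled without the freeze rw-semaphore. In this hypothetical sketch, a blocked writer reports failure instead of sleeping, and the disabler busy-waits for in-flight writers to drain. It relies on sequentially consistent C11 atomics, so either the writer sees `blocked` or the disabler sees the writer:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct write_gate {
	atomic_int writers;	/* writes currently in flight */
	atomic_bool blocked;	/* set while dedupe is being disabled */
};

static void gate_init(struct write_gate *g)
{
	atomic_init(&g->writers, 0);
	atomic_init(&g->blocked, false);
}

/*
 * Writer side.  Announce first, then check: with seq_cst atomics either
 * the writer sees blocked or gate_block() sees the writer.  A real
 * writer would sleep and retry; here it just reports failure.
 */
static bool gate_enter(struct write_gate *g)
{
	atomic_fetch_add(&g->writers, 1);
	if (atomic_load(&g->blocked)) {
		atomic_fetch_sub(&g->writers, 1);
		return false;
	}
	return true;
}

static void gate_exit(struct write_gate *g)
{
	atomic_fetch_sub(&g->writers, 1);
}

/* Disable side: stop new writers, then wait for in-flight ones. */
static void gate_block(struct write_gate *g)
{
	atomic_store(&g->blocked, true);
	while (atomic_load(&g->writers) > 0)
		;	/* busy-wait; the percpu rwsem sleeps instead */
}

static void gate_unblock(struct write_gate *g)
{
	atomic_store(&g->blocked, false);
}
```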

Cc: Mark Fasheh 
Signed-off-by: Qu Wenruo 
Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/dedupe.c | 132 +++---
 1 file changed, 126 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/dedupe.c b/fs/btrfs/dedupe.c
index e51412b..14c57fa 100644
--- a/fs/btrfs/dedupe.c
+++ b/fs/btrfs/dedupe.c
@@ -186,12 +186,6 @@ enable:
return ret;
 }
 
-int btrfs_dedupe_disable(struct btrfs_fs_info *fs_info)
-{
-   /* Place holder for bisect, will be implemented in later patches */
-   return 0;
-}
-
 static int inmem_insert_hash(struct rb_root *root,
 struct inmem_hash *hash, int hash_len)
 {
@@ -334,3 +328,129 @@ int btrfs_dedupe_add(struct btrfs_trans_handle *trans,
return inmem_add(dedupe_info, hash);
return -EINVAL;
 }
+
+static struct inmem_hash *
+inmem_search_bytenr(struct btrfs_dedupe_info *dedupe_info, u64 bytenr)
+{
+   struct rb_node **p = &dedupe_info->bytenr_root.rb_node;
+   struct rb_node *parent = NULL;
+   struct inmem_hash *entry = NULL;
+
+   while (*p) {
+   parent = *p;
+   entry = rb_entry(parent, struct inmem_hash, bytenr_node);
+
+   if (bytenr < entry->bytenr)
+   p = &(*p)->rb_left;
+   else if (bytenr > entry->bytenr)
+   p = &(*p)->rb_right;
+   else
+   return entry;
+   }
+
+   return NULL;
+}
+
+/* Delete a hash from in-memory dedupe tree */
+static int inmem_del(struct btrfs_dedupe_info *dedupe_info, u64 bytenr)
+{
+   struct inmem_hash *hash;
+
+   mutex_lock(&dedupe_info->lock);
+   hash = inmem_search_bytenr(dedupe_info, bytenr);
+   if (!hash) {
+   mutex_unlock(&dedupe_info->lock);
+   return 0;
+   }
+
+   __inmem_del(dedupe_info, hash);
+   mutex_unlock(&dedupe_info->lock);
+   return 0;
+}
+
+/* Remove a dedupe hash from dedupe tree */
+int btrfs_dedupe_del(struct btrfs_trans_handle *trans,
+struct btrfs_fs_info *fs_info, u64 bytenr)
+{
+   struct btrfs_dedupe_info *dedupe_info = fs_info->dedupe_info;
+
+   if (!fs_info->dedupe_enabled)
+   return 0;
+
+   if (WARN_ON(dedupe_info == NULL))
+   return -EINVAL;
+
+   if (dedupe_info->backend == BTRFS_DEDUPE_BACKEND_INMEMORY)
+   return inmem_del(dedupe_info, bytenr);
+   return -EINVAL;
+}
+
+static void inmem_destroy(struct btrfs_dedupe_info *dedupe_info)
+{
+   struct inmem_hash *entry, *tmp;
+
+   mutex_lock(&dedupe_info->lock);
+   list_for_each_entry_safe(entry, tmp, &dedupe_info->lru_list, lru_list)
+   __inmem_del(dedupe_info, entry);
+   mutex_unlock(&dedupe_info->lock);
+}
+
+/*
+ * Helper function to wait and block all incoming writers
+ *
+ * Use the rw_sem introduced for freeze to wait for/block writers.
+ * While writers are blocked no new write can happen, so we can
+ * safely change state. This is especially helpful for dedupe disable,
+ * as it affects buffered writes.
+ */
+static void block_all_writers(struct btrfs_fs_info *fs_info)
+{
+   struct super_block *sb = fs_info->sb;
+
+   percpu_down_write(sb->s_writers.rw_sem + SB_FREEZE_WRITE - 1);
+   down_write(&sb->s_umount);
+}
+
+static void unblock_all_writers(struct btrfs_fs_info *fs_info)
+{
+   struct super_block *sb = fs_info->sb;
+
+   up_write(&sb->s_umount);
+   percpu_up_write(sb->s_writers.rw_sem + SB_FREEZE_WRITE - 1);
+}
+
+int btrfs_dedupe_disable(struct btrfs_fs_info *fs_info)
+{
+   struct btrfs_dedupe_info *dedupe_info;
+   int ret;
+
+   dedupe_info = fs_info->dedupe_info;
+
+   if (!dedupe_info)
+   return 0;
+
+   /* Don't allow disable status change in RO mount */
+   if (fs_info->sb->s_flags & MS_RDONLY)
+   return -EROFS;
+
+   /*
+    * Wait for all unfinished writers and block further writers.
+    * Then sync the whole fs so that all current writes go through
+    * dedupe, and all later writes bypass dedupe.
+    */
+   block_all_writers(fs_info);
+   ret = sync_filesystem(fs_info->sb);
+   fs_info->dedupe_enabled = 0;
+   fs_info->dedupe_info = NULL;
+   unblock_all_writers(fs_info);
+   if (ret < 0)
+   return ret;
+
+   /* now we are OK to clean up everything */
+   if (dedupe_info->backend == BTRFS_DEDUPE_BACKEND_INMEMORY)
+   inmem_destroy(dedupe_info);
+
+   crypto_free_shash(dedupe_info->dedupe_driver);
+   kfree(dedupe_info);
+  

[PATCH v12.1 00/15] Btrfs in-band de-duplication

2016-07-10 Thread Qu Wenruo
This patchset can be fetched from github:
https://github.com/adam900710/linux.git wang_dedupe_20160711

This version is just a small update: it is rebased onto David's
for-next-20160704 branch and adds a small non-functional patch to
coordinate the dedupe and subpage-blocksize patchsets.

With this update, conflicts between the dedupe and subpage patchsets can
be resolved quite easily.

Qu Wenruo (4):
  btrfs: delayed-ref: Add support for increasing data ref under spinlock
  btrfs: dedupe: Inband in-memory only de-duplication implement
  btrfs: relocation: Enhance error handling to avoid BUG_ON
  btrfs: dedupe: Introduce new reconfigure ioctl

Wang Xiaoguang (11):
  btrfs: expand cow_file_range() to support in-band dedup and
subpage-blocksize
  btrfs: dedupe: Introduce dedupe framework and its header
  btrfs: dedupe: Introduce function to initialize dedupe info
  btrfs: dedupe: Introduce function to add hash into in-memory tree
  btrfs: dedupe: Introduce function to remove hash from in-memory tree
  btrfs: dedupe: Introduce function to search for an existing hash
  btrfs: dedupe: Implement btrfs_dedupe_calc_hash interface
  btrfs: ordered-extent: Add support for dedupe
  btrfs: dedupe: Add ioctl for inband dedupelication
  btrfs: improve inode's outstanding_extents computation
  btrfs: dedupe: fix false ENOSPC

 fs/btrfs/Makefile                |   2 +-
 fs/btrfs/ctree.h                 |  25 +-
 fs/btrfs/dedupe.c                | 820 +++
 fs/btrfs/dedupe.h                | 221 +++
 fs/btrfs/delayed-ref.c           |  30 +-
 fs/btrfs/delayed-ref.h           |   8 +
 fs/btrfs/disk-io.c               |   4 +
 fs/btrfs/extent-tree.c           |  83 +++-
 fs/btrfs/extent_io.c             |  63 ++-
 fs/btrfs/extent_io.h             |  15 +-
 fs/btrfs/file.c                  |  26 +-
 fs/btrfs/free-space-cache.c      |   5 +-
 fs/btrfs/inode-map.c             |   4 +-
 fs/btrfs/inode.c                 | 453 -
 fs/btrfs/ioctl.c                 |  93 -
 fs/btrfs/ordered-data.c          |  46 ++-
 fs/btrfs/ordered-data.h          |  14 +
 fs/btrfs/relocation.c            |  46 ++-
 fs/btrfs/sysfs.c                 |   2 +
 fs/btrfs/tests/extent-io-tests.c |   6 +-
 fs/btrfs/tests/inode-tests.c     |  12 +-
 include/uapi/linux/btrfs.h       |  55 +++
 22 files changed, 1872 insertions(+), 161 deletions(-)
 create mode 100644 fs/btrfs/dedupe.c
 create mode 100644 fs/btrfs/dedupe.h

-- 
2.9.0



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: add option to run balance as daemon

2016-07-10 Thread Satoru Takeuchi
On 2016/06/22 0:16, Austin S. Hemmelgarn wrote:
> Currently, balance operations are run synchronously in the foreground.
> This is nice for interactive management, but is kind of crappy when you
> start looking at automation and similar things.
> 
> This patch adds an option to `btrfs balance start` to tell it to
> daemonize prior to running the balance operation, thus allowing us to
> perform balances asynchronously.  The two biggest use cases I have for
> this are starting a balance on a remote server without establishing a
> full shell session, and being able to background the balance in a
> recovery shell (which usually has no job control) so I can still get
> progress information.
> 
> Because it simply daemonizes prior to calling the balance ioctl, this
> doesn't actually need any kernel support.
> 
> Signed-off-by: Austin S. Hemmelgarn 
> ---
> This works as is, but there are two specific things I would love to
> eventually fix but don't have the time to fix right now:
> * There is no way to get any feedback from the balance operation.
> * Because of how everything works, trying to start a new balance with
>   --background while one is already running won't return an error but
>   won't queue or start a new balance either.
> 
> The first one is more a utility item than anything else, and probably
> would not be hard to add.  Ideally, it should be output to a user
> specified file, and this should work even for a normal foreground balance.
> 
> The second is very much a UX issue, but can't be easily solved without
> doing some creative process monitoring from the parent processes.
> 
>  Documentation/btrfs-balance.asciidoc |  2 ++
>  cmds-balance.c                       | 43 +++-
>  2 files changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/btrfs-balance.asciidoc b/Documentation/btrfs-balance.asciidoc
> index 7df40b9..f487dbb 100644
> --- a/Documentation/btrfs-balance.asciidoc
> +++ b/Documentation/btrfs-balance.asciidoc
> @@ -85,6 +85,8 @@ act on system chunks (requires '-f'), see `FILTERS` section for details about 'f
>  be verbose and print balance filter arguments
>  -f
>  force reducing of metadata integrity, eg. when going from 'raid1' to 'single'
> +--background
> +run the balance operation asynchronously in the background
>  
>  *status* [-v] ::
>  Show status of running or paused balance.
> diff --git a/cmds-balance.c b/cmds-balance.c
> index 708bbf4..66169b7 100644
> --- a/cmds-balance.c
> +++ b/cmds-balance.c
> @@ -20,6 +20,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
>  #include 
>  
>  #include "kerncompat.h"
> @@ -510,6 +513,7 @@ static const char * const cmd_balance_start_usage[] = {
>   "-v be verbose",
>   "-f force reducing of metadata integrity",
>   "--full-balance do not print warning and do not delay start",
> + "--background   run the balance as a background process",
>   NULL
>  };
>  
> @@ -520,6 +524,7 @@ static int cmd_balance_start(int argc, char **argv)
>   , NULL };
>   int force = 0;
>   int verbose = 0;
> + int background = 0;
>   unsigned start_flags = 0;
>   int i;
>  
> @@ -527,7 +532,8 @@ static int cmd_balance_start(int argc, char **argv)
>  
>   optind = 1;
>   while (1) {
> - enum { GETOPT_VAL_FULL_BALANCE = 256 };
> + enum { GETOPT_VAL_FULL_BALANCE = 256,
> + GETOPT_VAL_BACKGROUND = 257 };
>   static const struct option longopts[] = {
>   { "data", optional_argument, NULL, 'd'},
>   { "metadata", optional_argument, NULL, 'm' },
> @@ -536,6 +542,8 @@ static int cmd_balance_start(int argc, char **argv)
>   { "verbose", no_argument, NULL, 'v' },
>   { "full-balance", no_argument, NULL,
>   GETOPT_VAL_FULL_BALANCE },
> + { "background", no_argument, NULL,
> + GETOPT_VAL_BACKGROUND },
>   { NULL, 0, NULL, 0 }
>   };
>  
> @@ -574,6 +582,9 @@ static int cmd_balance_start(int argc, char **argv)
>   case GETOPT_VAL_FULL_BALANCE:
>   start_flags |= BALANCE_START_NOWARN;
>   break;
> + case GETOPT_VAL_BACKGROUND:
> + background = 1;
> + break;
>   default:
>   usage(cmd_balance_start_usage);
>   }
> @@ -626,6 +637,36 @@ static int cmd_balance_start(int argc, char **argv)
>   args.flags |= BTRFS_BALANCE_FORCE;
>   if (verbose)
>   dump_ioctl_balance_args();
> + if (background) {
> + switch (fork()) {
> + case (-1):
> + error("Unable to fork to run balance in 

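Backgrounding the balance comes down to fork() followed by the child detaching from the terminal; the balance ioctl itself is unchanged, which is why the patch needs no kernel support. The sketch below assumes a hypothetical helper run_in_background() (not part of btrfs-progs) that runs a job callback in a detached child and hands the child's pid back to the caller, mirroring the switch (fork()) with a -1 error case in the patch. It skips redirecting stdin/stdout/stderr, which a long-lived daemon would normally do; glibc's daemon(3) bundles those steps.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run job(arg) in a child that starts a new session,
 * so it no longer depends on the caller's terminal or job control.
 * Returns the child's pid to the parent, or -1 if fork() fails.
 */
static pid_t run_in_background(int (*job)(void *), void *arg)
{
	pid_t pid = fork();

	if (pid != 0)
		return pid;          /* parent (pid > 0) or fork error (-1) */

	(void)setsid();              /* child: detach from controlling tty */
	_exit(job(arg) & 0xff);      /* never return into the caller's code */
}

/* Hypothetical job standing in for issuing the balance ioctl. */
static int demo_job(void *arg)
{
	return *(const int *)arg;
}
```

A real `--background` caller would simply return after a successful fork; waiting on the child, as in the check below, is only useful for confirming that the job ran and reported its status.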
[RESEND PATCH] btrfs: Fix slab accounting flags

2016-07-10 Thread Nikolay Borisov
BTRFS is using a variety of slab caches to satisfy internal needs.
Those slab caches are always allocated with the SLAB_RECLAIM_ACCOUNT,
meaning allocations from the caches are going to be accounted as
SReclaimable. At the same time, btrfs does not register any shrinkers
whatsoever, which prevents the memory in those slabs from being shrunk.
This means those caches are not in fact reclaimable.

To fix this, remove SLAB_RECLAIM_ACCOUNT from all caches apart from the
inode cache, since that one is freed by the generic VFS super_block
shrinker. Also mark the transaction-related caches as SLAB_TEMPORARY,
to better document the lifetime of the objects (the flag just translates
to SLAB_RECLAIM_ACCOUNT).

Signed-off-by: Nikolay Borisov 
Reviewed-by: David Sterba 
---
 fs/btrfs/backref.c   | 2 +-
 fs/btrfs/delayed-inode.c | 2 +-
 fs/btrfs/delayed-ref.c   | 8 
 fs/btrfs/disk-io.c   | 2 +-
 fs/btrfs/extent_io.c | 4 ++--
 fs/btrfs/extent_map.c| 2 +-
 fs/btrfs/file.c  | 2 +-
 fs/btrfs/inode.c | 8 
 fs/btrfs/ordered-data.c  | 2 +-
 9 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 8bb3509099e8..1ac8a3c7f68e 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -139,7 +139,7 @@ int __init btrfs_prelim_ref_init(void)
btrfs_prelim_ref_cache = kmem_cache_create("btrfs_prelim_ref",
sizeof(struct __prelim_ref),
0,
-   SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD,
+   SLAB_MEM_SPREAD,
NULL);
if (!btrfs_prelim_ref_cache)
return -ENOMEM;
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 61561c2a3f96..4e5cd7dde3c5 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -34,7 +34,7 @@ int __init btrfs_delayed_inode_init(void)
delayed_node_cache = kmem_cache_create("btrfs_delayed_node",
sizeof(struct btrfs_delayed_node),
0,
-   SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD,
+   SLAB_MEM_SPREAD,
NULL);
if (!delayed_node_cache)
return -ENOMEM;
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 430b3689b112..08e452599249 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -940,28 +940,28 @@ int btrfs_delayed_ref_init(void)
btrfs_delayed_ref_head_cachep = kmem_cache_create(
"btrfs_delayed_ref_head",
sizeof(struct btrfs_delayed_ref_head), 0,
-   SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD, NULL);
+   SLAB_MEM_SPREAD, NULL);
if (!btrfs_delayed_ref_head_cachep)
goto fail;
 
btrfs_delayed_tree_ref_cachep = kmem_cache_create(
"btrfs_delayed_tree_ref",
sizeof(struct btrfs_delayed_tree_ref), 0,
-   SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD, NULL);
+   SLAB_MEM_SPREAD, NULL);
if (!btrfs_delayed_tree_ref_cachep)
goto fail;
 
btrfs_delayed_data_ref_cachep = kmem_cache_create(
"btrfs_delayed_data_ref",
sizeof(struct btrfs_delayed_data_ref), 0,
-   SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD, NULL);
+   SLAB_MEM_SPREAD, NULL);
if (!btrfs_delayed_data_ref_cachep)
goto fail;
 
btrfs_delayed_extent_op_cachep = kmem_cache_create(
"btrfs_delayed_extent_op",
sizeof(struct btrfs_delayed_extent_op), 0,
-   SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD, NULL);
+   SLAB_MEM_SPREAD, NULL);
if (!btrfs_delayed_extent_op_cachep)
goto fail;
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 54cca7a1572b..cb1a05da5682 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -101,7 +101,7 @@ int __init btrfs_end_io_wq_init(void)
btrfs_end_io_wq_cache = kmem_cache_create("btrfs_end_io_wq",
sizeof(struct btrfs_end_io_wq),
0,
-   SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD,
+   SLAB_MEM_SPREAD,
NULL);
if (!btrfs_end_io_wq_cache)
return -ENOMEM;
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 

Re: Unable to mount degraded RAID5

2016-07-10 Thread Chris Murphy
On Sun, Jul 10, 2016 at 1:01 AM, Tomáš Hrdina  wrote:

> sudo btrfs check --repair /dev/sdc
> enabling repair mode
> parent transid verify failed on 7008807157760 wanted 70175 found 70133
> parent transid verify failed on 7008807157760 wanted 70175 found 70133
> checksum verify failed on 7008807157760 found F192848C wanted 1571393A
> checksum verify failed on 7008807157760 found F192848C wanted 1571393A
> bytenr mismatch, want=7008807157760, have=65536
> Checking filesystem on /dev/sdc
> UUID: 2dab74bb-fc73-4c47-a413-a55840f6f71e
> checking extents
> parent transid verify failed on 7009468874752 wanted 70180 found 70133
> parent transid verify failed on 7009468874752 wanted 70180 found 70133
> checksum verify failed on 7009468874752 found 2B10421A wanted CFF3FFAC
> checksum verify failed on 7009468874752 found 2B10421A wanted CFF3FFAC
> bytenr mismatch, want=7009468874752, have=65536
> parent transid verify failed on 7008859045888 wanted 70175 found 70133
> parent transid verify failed on 7008859045888 wanted 70175 found 70133
> checksum verify failed on 7008859045888 found 7313A127 wanted 97F01C91
> checksum verify failed on 7008859045888 found 7313A127 wanted 97F01C91
> bytenr mismatch, want=7008859045888, have=65536
> parent transid verify failed on 7008899547136 wanted 70175 found 70133
> parent transid verify failed on 7008899547136 wanted 70175 found 70133
> checksum verify failed on 7008899547136 found 2B6F9045 wanted CF8C2DF3
> parent transid verify failed on 7008899547136 wanted 70175 found 70133
> Ignoring transid failure
> leaf parent key incorrect 7008899547136
> bad block 7008899547136
> Errors found in extent allocation tree or chunk allocation
> parent transid verify failed on 7009074167808 wanted 70175 found 70133
> parent transid verify failed on 7009074167808 wanted 70175 found 70133
> checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
> checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
> bytenr mismatch, want=7009074167808, have=65536


OK, well, it was all a wild goose chase then. These are the same messages
as four days ago. The central problem appears to be checksum verification
failures on multiple blocks, which really doesn't make sense to me
because it should be able to reconstruct them from parity.

How is it possible to have four root trees, all of which point to
different leaf/nodes, all of which have some kind of checksum failure?
None of them are good? And none of them can be reconstructed? Sounds
fishy.

You could try plugging each of those bytenrs into

btrfs-debug-tree -b   and see if it'll show you what leaf
information is there that it doesn't like. But if there's a csum
mismatch, it may refuse to show anything, rather than show it and say
it's unreliable due to the csum mismatch. If it refuses to show it, you
could plug each of those failed bytenrs into

btrfs-map-logical -l  and get a device and physical sector,
then you can read the entire leaf, compute a new csum and overwrite the
current one. That way it passes the csum check, and you can see whether
that's the only problem or whether there's another brick wall later. Of
course, if the csum was correct and it's the metadata that's bad,
honoring bad metadata as valid might cause a bad fix and then the whole
thing implodes. But you're pretty much there already, I'd say.
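The "compute a new csum and overwrite the current one" step can be sketched as follows, assuming the filesystem uses the default crc32c checksum (the only type btrfs supported at the time), that each tree block begins with a 32-byte checksum area of which crc32c uses the first 4 bytes (little-endian), and that the checksum covers the rest of the block up to nodesize. rewrite_tree_block_csum() is a hypothetical helper; reading the leaf from, and writing it back to, the physical offset reported by btrfs-map-logical is left out. As noted above, this only makes the block pass verification; it does nothing to make bad metadata correct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed: size of the checksum area at the start of every tree block. */
#define CSUM_AREA 32

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/*
 * Hypothetical helper: recompute the checksum over block[32..nodesize)
 * and store it little-endian in block[0..4). Returns the new checksum.
 */
static uint32_t rewrite_tree_block_csum(uint8_t *block, size_t nodesize)
{
	uint32_t csum = crc32c(block + CSUM_AREA, nodesize - CSUM_AREA);

	block[0] = csum & 0xff;
	block[1] = (csum >> 8) & 0xff;
	block[2] = (csum >> 16) & 0xff;
	block[3] = (csum >> 24) & 0xff;
	return csum;
}
```

The CRC routine can be checked against the standard CRC-32C check value: crc32c("123456789", 9) is 0xE3069283.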

If I were to pick an address to start with, it'd be this one.

> leaf parent key incorrect 7008899547136
> bad block 7008899547136

But other than that, I'm out of ideas. It's completely reasonable to
just give up at this point.


-- 
Chris Murphy


Re: Unable to mount degraded RAID5

2016-07-10 Thread Tomáš Hrdina
After every step, I tried mounting the fs with ro, ro,recovery and
ro,degraded,recovery. If that failed, I moved on to the next step.


sudo btrfs check --repair /dev/sdc
enabling repair mode
parent transid verify failed on 7008807157760 wanted 70175 found 70133
parent transid verify failed on 7008807157760 wanted 70175 found 70133
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
bytenr mismatch, want=7008807157760, have=65536
Checking filesystem on /dev/sdc
UUID: 2dab74bb-fc73-4c47-a413-a55840f6f71e
checking extents
parent transid verify failed on 7009468874752 wanted 70180 found 70133
parent transid verify failed on 7009468874752 wanted 70180 found 70133
checksum verify failed on 7009468874752 found 2B10421A wanted CFF3FFAC
checksum verify failed on 7009468874752 found 2B10421A wanted CFF3FFAC
bytenr mismatch, want=7009468874752, have=65536
parent transid verify failed on 7008859045888 wanted 70175 found 70133
parent transid verify failed on 7008859045888 wanted 70175 found 70133
checksum verify failed on 7008859045888 found 7313A127 wanted 97F01C91
checksum verify failed on 7008859045888 found 7313A127 wanted 97F01C91
bytenr mismatch, want=7008859045888, have=65536
parent transid verify failed on 7008899547136 wanted 70175 found 70133
parent transid verify failed on 7008899547136 wanted 70175 found 70133
checksum verify failed on 7008899547136 found 2B6F9045 wanted CF8C2DF3
parent transid verify failed on 7008899547136 wanted 70175 found 70133
Ignoring transid failure
leaf parent key incorrect 7008899547136
bad block 7008899547136
Errors found in extent allocation tree or chunk allocation
parent transid verify failed on 7009074167808 wanted 70175 found 70133
parent transid verify failed on 7009074167808 wanted 70175 found 70133
checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
bytenr mismatch, want=7009074167808, have=65536


btrfs check -r  --repair

I didn't use any bytenr for recovery. Recovery worked without -t.

sudo btrfs-find-root /dev/sdc
parent transid verify failed on 7008807157760 wanted 70175 found 70133
parent transid verify failed on 7008807157760 wanted 70175 found 70133
Superblock thinks the generation is 70182
Superblock thinks the level is 1
Found tree root at 6062830010368 gen 70182 level 1
Well block 6062434418688(gen: 70181 level: 1) seems good, but
generation/level doesn't match, want gen: 70182 level: 1
Well block 6062497202176(gen: 69186 level: 0) seems good, but
generation/level doesn't match, want gen: 70182 level: 1
Well block 6062470332416(gen: 69186 level: 0) seems good, but
generation/level doesn't match, want gen: 70182 level: 1


sudo btrfs check -r 6062830010368 --repair /dev/sdc
enabling repair mode
parent transid verify failed on 7008807157760 wanted 70175 found 70133
parent transid verify failed on 7008807157760 wanted 70175 found 70133
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
checksum verify failed on 7008807157760 found F192848C wanted 1571393A
bytenr mismatch, want=7008807157760, have=65536
Checking filesystem on /dev/sdc
UUID: 2dab74bb-fc73-4c47-a413-a55840f6f71e
checking extents
parent transid verify failed on 7009468874752 wanted 70180 found 70133
parent transid verify failed on 7009468874752 wanted 70180 found 70133
checksum verify failed on 7009468874752 found 2B10421A wanted CFF3FFAC
checksum verify failed on 7009468874752 found 2B10421A wanted CFF3FFAC
bytenr mismatch, want=7009468874752, have=65536
parent transid verify failed on 7008859045888 wanted 70175 found 70133
parent transid verify failed on 7008859045888 wanted 70175 found 70133
checksum verify failed on 7008859045888 found 7313A127 wanted 97F01C91
checksum verify failed on 7008859045888 found 7313A127 wanted 97F01C91
bytenr mismatch, want=7008859045888, have=65536
parent transid verify failed on 7008899547136 wanted 70175 found 70133
parent transid verify failed on 7008899547136 wanted 70175 found 70133
checksum verify failed on 7008899547136 found 2B6F9045 wanted CF8C2DF3
parent transid verify failed on 7008899547136 wanted 70175 found 70133
Ignoring transid failure
leaf parent key incorrect 7008899547136
bad block 7008899547136
Errors found in extent allocation tree or chunk allocation
parent transid verify failed on 7009074167808 wanted 70175 found 70133
parent transid verify failed on 7009074167808 wanted 70175 found 70133
checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
checksum verify failed on 7009074167808 found FDA6D1F0 wanted 19456C46
bytenr mismatch, want=7009074167808, have=65536


sudo btrfs check -r 6062434418688 --repair /dev/sdc
enabling repair mode
parent transid verify failed on 6062434418688 wanted 70182 found 70181
parent transid verify failed on 6062434418688 wanted 70182 found 70181
checksum verify failed on 6062434418688 found F868085E wanted 1C8BB5E8
parent