Re: duperemove : some real world figures on BTRFS deduplication

2016-12-08 Thread Peter Becker
> 2016-12-08 16:11 GMT+01:00 Swâmi Petaramesh :
>
> Then it took another 48 hours just for "loading the hashes of duplicate
> extents".
>

I am currently addressing this issue with the following patches:
https://github.com/Floyddotnet/duperemove/commits/digest_trigger

Tested with a 3.9 TB directory containing 4723 objects:

old implementation of dbfile_load_hashes took 36593ms
new implementation of dbfile_load_hashes took 11ms

You can safely use this version, but I still have more work to do (for
example, a migration script for existing hashfiles).
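The speedup comes from loading only the duplicated hashes instead of scanning every hash row. As a rough illustration of the idea (this is a simplified stand-in, not duperemove's actual hashfile schema or query), an indexed GROUP BY over the digest column is the kind of change that turns a multi-second scan into milliseconds:

```python
import sqlite3

# Illustrative sketch only: the "hashes" table and "digest" column are
# invented stand-ins, not duperemove's actual hashfile schema.
def load_duplicate_hashes(conn):
    """Return only digests that occur more than once.

    An index on the digest column lets SQLite satisfy the GROUP BY
    without a full table sort.
    """
    conn.execute("CREATE INDEX IF NOT EXISTS idx_digest ON hashes(digest)")
    cur = conn.execute(
        "SELECT digest FROM hashes GROUP BY digest HAVING COUNT(*) > 1")
    return [row[0] for row in cur]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hashes (fileid INTEGER, digest BLOB)")
conn.executemany("INSERT INTO hashes VALUES (?, ?)",
                 [(1, b"aa"), (2, b"aa"), (1, b"bb"), (3, b"cc"), (4, b"cc")])
dupes = load_duplicate_hashes(conn)
```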
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] btrfs-convert: Fix migrate_super_block() to work with 64k sectorsize

2016-12-08 Thread Chandan Rajendra
On Friday, December 09, 2016 09:09:29 AM Qu Wenruo wrote:
> 
> At 12/08/2016 09:56 PM, Chandan Rajendra wrote:
> > migrate_super_block() uses sectorsize to refer to the size of the
> > superblock. Hence on 64k sectorsize filesystems, it ends up computing
> > checksum beyond the super block length (i.e.
> > BTRFS_SUPER_INFO_SIZE). This commit fixes the bug by using
> > BTRFS_SUPER_INFO_SIZE instead of sectorsize of the underlying
> > filesystem.
> >
> > Signed-off-by: Chandan Rajendra 
> 
> Reviewed-by: Qu Wenruo 
> 
> BTW would you please enhance the convert tests?
> Current convert tests only use 4K as the block size.
> So adding 64K blocksize would definitely improve the tests.
>

Thanks for the hint. I just executed btrfs/012 with 64k blocksize hardcoded
and found that 'btrfs rollback' failed. I will fix rollback first and then
work on getting btrfs/012 to support 64k blocksize.

-- 
chandan



Re: [PATCH 4/6] Btrfs: add DAX support for nocow btrfs

2016-12-08 Thread Dave Chinner
On Wed, Dec 07, 2016 at 01:45:08PM -0800, Liu Bo wrote:
> Since I haven't figured out how to map multiple devices to userspace without
> pagecache, this DAX support is only for single devices, and since I don't
> think DAX (Direct Access) can work with cow, this is limited to the nocow
> case.  I implemented this by setting nodatacow in the dax mount option.

DAX can be made to work with COW quite easily - it's already been
done, in fact. Go look up Nova for how it works with DAX:

https://github.com/Andiry/nova

Essentially, it has a set of "temporary pages" it links to the inode
where writes are done directly, and when a synchronisation event
occurs it pulls them from the per-inode list, does whatever
transformations are needed (e.g. CRC calculation, mirroring, etc)
and marks them as current in the inode extent list.

When a new overwrite comes along, it allocates a new block in the
temporary page list, copies the existing data into it, and then uses
that block for DAX until the next synchronisation event occurs.
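The per-inode temporary-page scheme described above can be modelled in a few lines. This is a hypothetical toy sketch of the mechanism (all names invented for illustration; the real Nova implementation is far more involved):

```python
# Toy model of the "temporary pages" CoW scheme described above.
class Inode:
    def __init__(self):
        self.extents = {}      # block number -> current (durable) data
        self.temp_pages = {}   # block number -> in-flight DAX writes

    def dax_write(self, blockno, data):
        # New overwrites go to a temporary block.  A real implementation
        # would first copy the existing block's data into the new block
        # (for partial overwrites); this toy only handles whole blocks.
        self.temp_pages[blockno] = data

    def sync(self):
        # On a synchronisation event, pull the temp pages off the
        # per-inode list, do any transformations (CRC, mirroring, ...),
        # and mark them as current in the extent list.
        for blockno, data in self.temp_pages.items():
            self.extents[blockno] = data   # transformations would go here
        self.temp_pages.clear()

inode = Inode()
inode.dax_write(0, b"hello")
inode.sync()
inode.dax_write(0, b"world")   # a new overwrite -> a new temp block
```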

For XFS, CoW for DAX through read/write isn't really any different
to the direct IO path we currently already have. And for page write
faults on shared extents, instead of zeroing the newly allocated
block we simply copy the original data into the new block before the
allocation returns. It does mean, however, that XFS does not have
the capability for data transformations in the IO path. This limits
us to atomic write devices (software raid 0 or hardware redundancy
such as DIMM mirroring), but we can still do out-of-band online data
transformations and movement (e.g. dedupe, defrag) with DAX.

Yes, I know these methods are very different to how btrfs uses COW.
However, my point is that DAX and CoW and/or multiple devices are
not incompatible if the architecture is correctly structured, i.e.
DAX should be able to work even with most of btrfs's special magic
still enabled.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH 1/6] Btrfs: add mount option for dax

2016-12-08 Thread Dave Chinner
On Wed, Dec 07, 2016 at 01:45:05PM -0800, Liu Bo wrote:
> Signed-off-by: Liu Bo 
> ---
>  fs/btrfs/ctree.h |  1 +
>  fs/btrfs/super.c | 40 +++-
>  2 files changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 0b8ce2b..e54c6e6 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1317,6 +1317,7 @@ static inline u32 BTRFS_MAX_XATTR_SIZE(const struct 
> btrfs_root *root)
>  #define BTRFS_MOUNT_FRAGMENT_METADATA(1 << 25)
>  #define BTRFS_MOUNT_FREE_SPACE_TREE  (1 << 26)
>  #define BTRFS_MOUNT_NOLOGREPLAY  (1 << 27)
> +#define BTRFS_MOUNT_DAX  (1 << 28)
>  
>  #define BTRFS_DEFAULT_COMMIT_INTERVAL(30)
>  #define BTRFS_DEFAULT_MAX_INLINE (2048)
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 74ed5aa..9b18f3d 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -323,7 +323,7 @@ enum {
>   Opt_commit_interval, Opt_barrier, Opt_nodefrag, Opt_nodiscard,
>   Opt_noenospc_debug, Opt_noflushoncommit, Opt_acl, Opt_datacow,
>   Opt_datasum, Opt_treelog, Opt_noinode_cache, Opt_usebackuproot,
> - Opt_nologreplay, Opt_norecovery,
> + Opt_nologreplay, Opt_norecovery, Opt_dax,

Can we please not create more filesystems with a DAX mount option?
This was only ever an enabler, and not meant to be a permanent
thing. The permanent functionality for DAX is supposed to be
per-inode inheritable DAX flags - not mount options - so that
applications can choose on a per file basis to enable/disable DAX
access as they see fit.

This also enables the filesystem to reject the attempt to turn on
DAX if the set of contexts for the file is not DAX compatible.
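The inheritable per-inode flag model can be sketched in a few lines. This is a toy illustration with invented names (it is not the kernel's actual inode-flag code): a new file inherits the DAX flag from its parent directory, and the filesystem refuses to set it when the file's context is not DAX compatible:

```python
# Toy sketch (invented names) of per-inode inheritable DAX flags.
S_DAX = 1 << 0

class Node:
    def __init__(self, parent=None, dax_compatible=True):
        self.dax_compatible = dax_compatible
        # Inherit flags from the parent directory, if any.
        self.flags = parent.flags if parent else 0
        if not dax_compatible:
            self.flags &= ~S_DAX

    def set_dax(self):
        # The filesystem can reject DAX for incompatible contexts
        # (e.g. files needing data transformations in the IO path).
        if not self.dax_compatible:
            raise PermissionError("file context is not DAX compatible")
        self.flags |= S_DAX

root = Node()
root.set_dax()
child = Node(parent=root)                       # inherits S_DAX
compressed = Node(parent=root, dax_compatible=False)  # flag stripped
```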

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH 1/2] btrfs-progs: btrfs-convert: Prevent accounting blocks beyond end of device

2016-12-08 Thread Chandan Rajendra
On Friday, December 09, 2016 09:03:57 AM Qu Wenruo wrote:
> Hi Chandan,
> 
> Thanks for the patch.
> 
> At 12/08/2016 09:56 PM, Chandan Rajendra wrote:
> > When looping across data block bitmap, __ext2_add_one_block() may add
> > blocks which do not exist on the underlying disk. This commit prevents
> > this from happening by checking the block index against the maximum
> > block count that was present in the ext4 filesystem instance that is
> > being converted.
> 
> The patch looks good to me.
> 
> Reviewed-by: Qu Wenruo 
> 
> Just curious whether such an image can pass e2fsck.
> And would you please upload a minimal image as btrfs-progs test case?
> 

Hi Qu,

Such an ext4 filesystem can be consistently created on ppc64 with 64k as the
blocksize of the filesystem. Also, the filesystem thus created passes e2fsck.

-- 
chandan



Re: duperemove : some real world figures on BTRFS deduplication

2016-12-08 Thread Chris Murphy
On Thu, Dec 8, 2016 at 8:11 AM, Swâmi Petaramesh  wrote:

> Well, the damn thing has been running for 15 days uninterrupted !
> ...Until I [Ctrl]-C it this morning as I had to move with the machine (I
> wasn't expecting it to last THAT long...).

Can you check some bigger files and see if they've become fragmented?
I'm seeing 1.4 GiB files with 2-3 extents reported by filefrag go to
over 5000 fragments during dedupe. This is not something I recall
happening some months ago.

I inadvertently replied to the wrong dedupe thread about my test and
what I'm finding; it's here:
https://www.spinics.net/lists/linux-btrfs/msg61304.html

But if you're seeing something similar, then it would explain why it's
so slow in your case.

-- 
Chris Murphy


Re: out-of-band dedup status?

2016-12-08 Thread Chris Murphy
On Thu, Dec 8, 2016 at 7:26 PM, Darrick J. Wong  wrote:
> On Thu, Dec 08, 2016 at 05:45:40PM -0700, Chris Murphy wrote:
>> OK something's wrong.
>>
>> Kernel 4.8.12 and duperemove v0.11.beta4. Brand new file system
>> (mkfs.btrfs -dsingle -msingle, default mount options) and two
>> identical files separately copied.
>>
>> [chris@f25s]$ ls -li /mnt/test
>> total 2811904
>> 260 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26
>> Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
>> 259 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26
>> Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
>>
>> [chris@f25s]$ filefrag /mnt/test/*
>> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso: 3 extents found
>> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2: 2 extents found
>>
>>
>> [chris@f25s duperemove]$ sudo ./duperemove -dv /mnt/test/*
>> Using 128K blocks
>> Using hash: murmur3
>> Gathering file list...
>> Using 4 threads for file hashing phase
>> [1/2] (50.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
>> [2/2] (100.00%) csum: 
>> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
>> Total files:  2
>> Total hashes: 21968
>> Loading only duplicated hashes from hashfile.
>> Using 4 threads for dedupe phase
>> [0xba8400] (1/10947) Try to dedupe extents with id e47862ea
>> [0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2
>> [0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd
>> [0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64
>> [0xba8540] Add extent for file
>> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
>> 1182924800 (4)
>> [0xba8540] Add extent for file
>> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
>> 1182924800 (5)
>> [0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800,
>> 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
>
> Ew, it's deduping these two 1.4GB files 128K at a time, which results in
> 12000 ioctl calls.  Each of those 12000 calls has to lock the two
> inodes, read the file contents, remap the blocks, etc.  instead of
> finding the maximal identical range and making a single call for the
> whole range.
>
> That's probably why it's taking forever to dedupe.

Yes, but it looks like it's also heavily fragmenting the files as a
result.

-- 
Chris Murphy


[PATCH 00/10] Qgroup fixes for kdave/for-next-20161125 branch

2016-12-08 Thread Qu Wenruo
The branch can be fetched from github:
https://github.com/adam900710/linux.git for-david-next-qgroup-fixes

If David wants to push these fixes to 4.10, then I can rebase these patches to
Chris' for-linus branch.

Recent qgroup fixes for several problems:
1) Qgroup reserved space underflow
   Caused by several issues, ranging from buffered writes happening before
   qgroup is enabled, to freeing qgroup space that was not reserved by the caller.

2) inode_cache mount option corruption

3) Enhance qgroup trace point
   Used for debugging above problems.

All patches have already been submitted to the mailing list.
The 1st patch is the diff between the V4 and V5 patches, which adds WARN_ON()
for underflowing qgroup reserved space.
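The underflow the WARN_ON() is meant to catch can be illustrated with a toy model (invented names; this is not the kernel code): freeing more qgroup reserved space than was reserved would wrap the counter, so the guard warns and clamps the free, returning the bytes actually freed:

```python
import warnings

# Toy sketch (invented names) of guarding qgroup reserved space
# against underflow from a buggy caller.
class QgroupReserve:
    def __init__(self):
        self.reserved = 0

    def reserve(self, nbytes):
        self.reserved += nbytes

    def free(self, nbytes):
        if nbytes > self.reserved:
            # Kernel-side this would be a WARN_ON() plus a trace point.
            warnings.warn("qgroup reserved space underflow")
            nbytes = self.reserved
        self.reserved -= nbytes
        return nbytes   # report the bytes actually freed

q = QgroupReserve()
q.reserve(4096)
freed = q.free(8192)   # buggy caller frees space it never reserved
```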

Qu Wenruo (10):
  btrfs: qgroup: Fix wrong qgroup passed to reserved space error report
  btrfs: qgroup: Add trace point for qgroup reserved space
  btrfs: qgroup: Re-arrange tracepoint timing to co-operate with
reserved space tracepoint
  btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount
option
  btrfs: qgroup: Add quick exit for non-fs extents
  btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function
  btrfs: qgroup: Return actually freed bytes for qgroup release or free
data
  btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered
write and quota enable
  btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  btrfs: qgroup: Fix qgroup reserved space underflow by only freeing
reserved ranges

 fs/btrfs/ctree.h |  14 +--
 fs/btrfs/extent-tree.c   |  32 ---
 fs/btrfs/extent_io.h |  24 +-
 fs/btrfs/file.c  |  45 ++
 fs/btrfs/inode-map.c |   4 +-
 fs/btrfs/inode.c |  69 +--
 fs/btrfs/ioctl.c |   9 +-
 fs/btrfs/qgroup.c| 197 +++
 fs/btrfs/qgroup.h|  14 ++-
 fs/btrfs/relocation.c|  12 +--
 fs/btrfs/transaction.c   |  20 ++---
 include/trace/events/btrfs.h |  43 ++
 12 files changed, 341 insertions(+), 142 deletions(-)

-- 
2.10.2





Re: out-of-band dedup status?

2016-12-08 Thread Darrick J. Wong
On Thu, Dec 08, 2016 at 05:45:40PM -0700, Chris Murphy wrote:
> OK something's wrong.
> 
> Kernel 4.8.12 and duperemove v0.11.beta4. Brand new file system
> (mkfs.btrfs -dsingle -msingle, default mount options) and two
> identical files separately copied.
> 
> [chris@f25s]$ ls -li /mnt/test
> total 2811904
> 260 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26
> Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
> 259 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26
> Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
> 
> [chris@f25s]$ filefrag /mnt/test/*
> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso: 3 extents found
> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2: 2 extents found
> 
> 
> [chris@f25s duperemove]$ sudo ./duperemove -dv /mnt/test/*
> Using 128K blocks
> Using hash: murmur3
> Gathering file list...
> Using 4 threads for file hashing phase
> [1/2] (50.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
> [2/2] (100.00%) csum: 
> /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
> Total files:  2
> Total hashes: 21968
> Loading only duplicated hashes from hashfile.
> Using 4 threads for dedupe phase
> [0xba8400] (1/10947) Try to dedupe extents with id e47862ea
> [0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2
> [0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd
> [0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64
> [0xba8540] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
> 1182924800 (4)
> [0xba8540] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> 1182924800 (5)
> [0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800,
> 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"

Ew, it's deduping these two 1.4GB files 128K at a time, which results in
12000 ioctl calls.  Each of those 12000 calls has to lock the two
inodes, read the file contents, remap the blocks, etc.  instead of
finding the maximal identical range and making a single call for the
whole range.

That's probably why it's taking forever to dedupe.
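The batching suggested above — collapsing runs of adjacent duplicate 128K blocks into one maximal range so a single dedupe ioctl (e.g. FIDEDUPERANGE) covers each run — can be sketched as follows. This is a pure illustration of the idea, not duperemove's actual code:

```python
BLOCK = 128 * 1024  # duperemove's 128K hashing block size

def merge_ranges(offsets, block=BLOCK):
    """Collapse sorted duplicate-block offsets into maximal
    (offset, length) ranges, so one call can cover each contiguous
    run instead of one ioctl per 128K block."""
    ranges = []
    for off in sorted(offsets):
        if ranges and ranges[-1][0] + ranges[-1][1] == off:
            ranges[-1][1] += block       # extend the current run
        else:
            ranges.append([off, block])  # start a new run
    return [tuple(r) for r in ranges]

# Four contiguous blocks plus one isolated block collapse to 2 ranges;
# two fully identical files would collapse to a single range/call.
ranges = merge_ranges([i * BLOCK for i in range(4)] + [10 * BLOCK])
```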

--D

> [0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64
> [0xba84a0] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
> 543293440 (4)
> [0xba84a0] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> 543293440 (5)
> [0xba84a0] Dedupe 1 extents (id: ffed44f2) with target: (543293440,
> 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
> [0xba8540] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> 1182924800 (5)
> [0xba8540] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
> 1182924800 (4)
> [0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800,
> 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2"
> [0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2
> [0xba84a0] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> 543293440 (5)
> [0xba84a0] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
> 543293440 (4)
> [0xba84a0] Dedupe 1 extents (id: ffed44f2) with target: (543293440,
> 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2"
> [0xba84f0] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
> 101580800 (4)
> [0xba84f0] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> 101580800 (5)
> [0xba84f0] Dedupe 1 extents (id: ffeefcdd) with target: (101580800,
> 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
> [0xba84a0] (5/10947) Try to dedupe extents with id ffe24eaf
> [0xba84a0] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
> 171835392 (4)
> [0xba84a0] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> 171835392 (5)
> [0xba84a0] Dedupe 1 extents (id: ffe24eaf) with target: (171835392,
> 131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
> [0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd
> [0xba8540] (6/10947) Try to dedupe extents with id ffe116c8
> [0xba8400] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
> 52035584 (4)
> [0xba8400] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> 52035584 (5)
> [0xba8400] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> 52166656 (5)
> [0xba8400] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> 60030976 (5)
> [0xba8400] Add extent for file
> "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
> 

[PATCH v3 2/2] fstests: btrfs: Use _require_btrfs_qgroup_report to replace open code

2016-12-08 Thread Qu Wenruo
Introduce a new _require_btrfs_qgroup_report function, which will check
the availability of "btrfs check --qgroup-report", then set a global
flag to tell _check_scratch_fs() to do an extra qgroup check.

Signed-off-by: Qu Wenruo 
---
v2:
  Use "${RESULT_DIR}/require_scratch.require_qgroup_report" instead of
  global variant
  Rebased to latest master
  Replace btrfsck with $BTRFS_UTIL_PROG check.
v3:
  Fix a rebase error, which removes a necessary check of btrfs/042
---
 check   |  4 ++--
 common/btrfs| 22 +-
 common/rc   |  5 +++--
 tests/btrfs/022 |  4 
 tests/btrfs/028 |  5 ++---
 tests/btrfs/042 |  1 +
 tests/btrfs/099 |  1 +
 tests/btrfs/104 | 20 +---
 tests/btrfs/122 |  9 +++--
 tests/btrfs/123 |  5 ++---
 tests/btrfs/126 |  1 +
 11 files changed, 45 insertions(+), 32 deletions(-)

diff --git a/check b/check
index 8f2a1bb..76eb054 100755
--- a/check
+++ b/check
@@ -405,11 +405,11 @@ _check_filesystems()
 {
if [ -f ${RESULT_DIR}/require_test ]; then
_check_test_fs || err=true
-   rm -f ${RESULT_DIR}/require_test
+   rm -f ${RESULT_DIR}/require_test*
fi
if [ -f ${RESULT_DIR}/require_scratch ]; then
_check_scratch_fs || err=true
-   rm -f ${RESULT_DIR}/require_scratch
+   rm -f ${RESULT_DIR}/require_scratch*
fi
 }
 
diff --git a/common/btrfs b/common/btrfs
index 302edc6..a860228 100644
--- a/common/btrfs
+++ b/common/btrfs
@@ -40,6 +40,13 @@ _require_btrfs_command()
[ $? -eq 0 ] || _notrun "$BTRFS_UTIL_PROG too old (must support $cmd 
$param)"
 }
 
+# Require extra check on btrfs qgroup numbers
+_require_btrfs_qgroup_report()
+{
+   _require_btrfs_command check --qgroup-report
+   touch ${RESULT_DIR}/require_scratch.require_qgroup_report
+}
+
 _run_btrfs_util_prog()
 {
run_check $BTRFS_UTIL_PROG $*
@@ -97,7 +104,20 @@ _check_btrfs_filesystem()
mountpoint=`_umount_or_remount_ro $device`
fi
 
-   btrfsck $device >$tmp.fsck 2>&1
+   if [ -f ${RESULT_DIR}/require_scratch.require_qgroup_report ]; then
+   $BTRFS_UTIL_PROG check $device --qgroup-report > 
$tmp.qgroup_report 2>&1
+   if grep -qE "Counts for qgroup.*are different" 
$tmp.qgroup_report ; then
+   echo "_check_btrfs_filesystem: filesystem on $device 
has wrong qgroup numbers (see $seqres.full)"
+   echo "_check_btrfs_filesystem: filesystem on $device 
has wrong qgroup numbers" \
+   >> $seqres.full
+   echo "*** qgroup_report.$FSTYP output ***"  
>>$seqres.full
+   cat $tmp.qgroup_report  
>>$seqres.full
+   echo "*** qgroup_report.$FSTYP output ***"  
>>$seqres.full
+   fi
+   rm -f $tmp.qgroup_report
+   fi
+
+   $BTRFS_UTIL_PROG check $device >$tmp.fsck 2>&1
if [ $? -ne 0 ]; then
echo "_check_btrfs_filesystem: filesystem on $device is 
inconsistent (see $seqres.full)"
 
diff --git a/common/rc b/common/rc
index 2719b23..b50283c 100644
--- a/common/rc
+++ b/common/rc
@@ -1222,8 +1222,9 @@ _notrun()
 {
 echo "$*" > $seqres.notrun
 echo "$seq not run: $*"
-rm -f ${RESULT_DIR}/require_test
-rm -f ${RESULT_DIR}/require_scratch
+rm -f ${RESULT_DIR}/require_test*
+rm -f ${RESULT_DIR}/require_scratch*
+
 status=0
 exit
 }
diff --git a/tests/btrfs/022 b/tests/btrfs/022
index 56d4f3d..9abad6c 100755
--- a/tests/btrfs/022
+++ b/tests/btrfs/022
@@ -43,6 +43,7 @@ _cleanup()
 _supported_fs btrfs
 _supported_os Linux
 _require_scratch
+_require_btrfs_qgroup_report
 
 rm -f $seqres.full
 
@@ -125,16 +126,19 @@ _scratch_mkfs > /dev/null 2>&1
 _scratch_mount
 _basic_test
 _scratch_unmount
+_check_scratch_fs
 
 _scratch_mkfs > /dev/null 2>&1
 _scratch_mount
 _rescan_test
 _scratch_unmount
+_check_scratch_fs
 
 _scratch_mkfs > /dev/null 2>&1
 _scratch_mount
 _limit_test_exceed
 _scratch_unmount
+_check_scratch_fs
 
 _scratch_mkfs > /dev/null 2>&1
 _scratch_mount
diff --git a/tests/btrfs/028 b/tests/btrfs/028
index 1425609..a3d9a27 100755
--- a/tests/btrfs/028
+++ b/tests/btrfs/028
@@ -51,6 +51,7 @@ rm -f $seqres.full
 _supported_fs btrfs
 _supported_os Linux
 _require_scratch
+_require_btrfs_qgroup_report
 
 _scratch_mkfs
 _scratch_mount
@@ -86,9 +87,7 @@ _run_btrfs_util_prog filesystem sync $SCRATCH_MNT
 
 _scratch_unmount
 
-# generate a qgroup report and look for inconsistent groups
-$BTRFS_UTIL_PROG check --qgroup-report $SCRATCH_DEV 2>&1 | \
-   grep -E "Counts for qgroup.*are different"
+# qgroup will be checked at _check_scratch_fs() by fstest.
 echo "Silence is golden"
 status=0
 
diff --git a/tests/btrfs/042 b/tests/btrfs/042
index 498ccc9..cf3eac2 100755
--- a/tests/btrfs/042
+++ b/tests/btrfs/042
@@ -43,6 +43,7 @@ _cleanup()
 

WARNING at extent-tree.c:1621 lookup_inline_extent_backref and extent-tree.c:2960 btrfs_run_delayed_refs (Linux 4.8.11)

2016-12-08 Thread Dāvis Mosāns
Basically, some sectors on a multi-disk btrfs filesystem got corrupted
due to an HDD fault.
Now, after scrub has finished (with some uncorrectable errors), the
following happens when doing a metadata balance:

$ btrfs balance start -m /mnt/Data

[  612.683979] WARNING: CPU: 0 PID: 6 at
/mnt/linux/fs/btrfs/extent-tree.c:1621
lookup_inline_extent_backref+0x45a/0x5a0 [btrfs]
[  612.683980] Modules linked in: xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 tun cfg80211 rfkill nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_CT xt_tcpudp ip6t_rpfilter ipt_REJECT
nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set
nfnetlink ebtable_broute ebtable_nat ip6table_mangle ip6table_security
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw
iptable_mangle iptable_security iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw
ebtable_filter ebtables ip6table_filter bridge ip6_tables stp
iptable_filter llc it87 hwmon_vid kvm_amd kvm irqbypass
crct10dif_pclmul saa7134_alsa crc32_pclmul snd_hda_codec_hdmi
snd_hda_codec_realtek ghash_clmulni_intel aesni_intel
snd_hda_codec_generic aes_x86_64 saa7134 lrw gf128mul snd_hda_intel
[  612.684023]  tveeprom snd_hda_codec glue_helper rc_core ablk_helper
snd_hda_core v4l2_common cryptd videobuf2_dma_sg videobuf2_memops
videobuf2_v4l2 videobuf2_core videodev mac_hid evdev mxm_wmi pcspkr
k10temp fam15h_power media r8169 mii snd_hwdep snd_pcm snd_timer snd
sp5100_tco acpi_cpufreq shpchp tpm_infineon tpm_tis tpm_tis_core wmi
button tpm i2c_piix4 soundcore nfsd auth_rpcgss oid_registry nfs_acl
lockd sch_fq_codel grace sunrpc nvidia_uvm(PO) btrfs xor zlib_deflate
raid6_pq sd_mod ata_generic pata_acpi serio_raw atkbd libps2
nvidia_drm(PO) nvidia_modeset(PO) ohci_pci nvidia(PO) drm_kms_helper
syscopyarea sysfillrect sysimgblt fb_sys_fops xhci_pci drm xhci_hcd
firewire_ohci ohci_hcd ehci_pci ehci_hcd firewire_core crc_itu_t
pata_atiixp usbcore ahci libahci usb_common i2c_core i8042 serio
[  612.684077]  mvsas libsas scsi_transport_sas libata scsi_mod
crc32c_generic crc32c_intel vfat fat ip_tables x_tables
[  612.684088] CPU: 0 PID: 6 Comm: kworker/u16:0 Tainted: P
O4.8.11-ARCH-dirty #1
[  612.684089] Hardware name: Gigabyte Technology Co., Ltd.
GA-990FXA-UD3/GA-990FXA-UD3, BIOS FFe 11/08/2013
[  612.684107] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[  612.684110]  0286 defb6477 880612f27a10
812e9a00
[  612.684113]    880612f27a50
8107bcab
[  612.684116]  06550053 880477ed6e00 009626bbf000

[  612.684119] Call Trace:
[  612.684125]  [] dump_stack+0x63/0x83
[  612.684128]  [] __warn+0xcb/0xf0
[  612.684131]  [] warn_slowpath_null+0x1d/0x20
[  612.684145]  []
lookup_inline_extent_backref+0x45a/0x5a0 [btrfs]
[  612.684159]  [] ? leaf_space_used+0xeb/0x120 [btrfs]
[  612.684174]  []
insert_inline_extent_backref+0x68/0xf0 [btrfs]
[  612.684188]  [] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[  612.684202]  []
__btrfs_inc_extent_ref.isra.22+0x9c/0x260 [btrfs]
[  612.684218]  [] ? find_ref_head+0x6f/0x80 [btrfs]
[  612.684233]  []
__btrfs_run_delayed_refs+0x768/0x12e0 [btrfs]
[  612.684249]  [] btrfs_run_delayed_refs+0xa1/0x2a0 [btrfs]
[  612.684264]  [] delayed_ref_async_start+0x94/0xb0 [btrfs]
[  612.684281]  [] btrfs_scrubparity_helper+0x7d/0x340 [btrfs]
[  612.684298]  [] btrfs_extent_refs_helper+0xe/0x10 [btrfs]
[  612.684302]  [] process_one_work+0x1e5/0x470
[  612.684304]  [] worker_thread+0x48/0x4e0
[  612.684307]  [] ? process_one_work+0x470/0x470
[  612.684310]  [] kthread+0xd8/0xf0
[  612.684312]  [] ? __switch_to+0x300/0x720
[  612.684316]  [] ret_from_fork+0x1f/0x40
[  612.684319]  [] ? kthread_worker_fn+0x170/0x170
[  612.684320] ---[ end trace 89b3284677d454e0 ]---
[  612.684323] [ cut here ]
[  612.684339] WARNING: CPU: 0 PID: 6 at
/mnt/linux/fs/btrfs/extent-tree.c:2960
btrfs_run_delayed_refs+0x277/0x2a0 [btrfs]
[  612.684340] BTRFS: Transaction aborted (error -5)
[  612.684340] Modules linked in: xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 tun cfg80211 rfkill nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_CT xt_tcpudp ip6t_rpfilter ipt_REJECT
nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set
nfnetlink ebtable_broute ebtable_nat ip6table_mangle ip6table_security
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw
iptable_mangle iptable_security iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw
ebtable_filter ebtables ip6table_filter bridge ip6_tables stp
iptable_filter llc it87 hwmon_vid kvm_amd kvm irqbypass
crct10dif_pclmul saa7134_alsa crc32_pclmul snd_hda_codec_hdmi
snd_hda_codec_realtek ghash_clmulni_intel aesni_intel
snd_hda_codec_generic aes_x86_64 saa7134 lrw gf128mul snd_hda_intel
[  612.684375]  tveeprom snd_hda_codec glue_helper rc_core ablk_helper
snd_hda_core v4l2_common cryptd videobuf2_dma_sg videobuf2_memops

Re: [PATCH 2/2] btrfs-convert: Fix migrate_super_block() to work with 64k sectorsize

2016-12-08 Thread Qu Wenruo



At 12/08/2016 09:56 PM, Chandan Rajendra wrote:

migrate_super_block() uses sectorsize to refer to the size of the
superblock. Hence on 64k sectorsize filesystems, it ends up computing
checksum beyond the super block length (i.e.
BTRFS_SUPER_INFO_SIZE). This commit fixes the bug by using
BTRFS_SUPER_INFO_SIZE instead of sectorsize of the underlying
filesystem.

Signed-off-by: Chandan Rajendra 


Reviewed-by: Qu Wenruo 

BTW would you please enhance the convert tests?
Current convert tests only use 4K as the block size.
So adding 64K blocksize would definitely improve the tests.

Thanks,
Qu


---
 convert/main.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/convert/main.c b/convert/main.c
index 1148a36..fd6f77b 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -1360,7 +1360,7 @@ err:
 /*
  * Migrate super block to its default position and zero 0 ~ 16k
  */
-static int migrate_super_block(int fd, u64 old_bytenr, u32 sectorsize)
+static int migrate_super_block(int fd, u64 old_bytenr)
 {
int ret;
struct extent_buffer *buf;
@@ -1368,13 +1368,13 @@ static int migrate_super_block(int fd, u64 old_bytenr, 
u32 sectorsize)
u32 len;
u32 bytenr;

-   buf = malloc(sizeof(*buf) + sectorsize);
+   buf = malloc(sizeof(*buf) + BTRFS_SUPER_INFO_SIZE);
if (!buf)
return -ENOMEM;

-   buf->len = sectorsize;
-   ret = pread(fd, buf->data, sectorsize, old_bytenr);
-   if (ret != sectorsize)
+   buf->len = BTRFS_SUPER_INFO_SIZE;
+   ret = pread(fd, buf->data, BTRFS_SUPER_INFO_SIZE, old_bytenr);
+   if (ret != BTRFS_SUPER_INFO_SIZE)
goto fail;

super = (struct btrfs_super_block *)buf->data;
@@ -1382,19 +1382,20 @@ static int migrate_super_block(int fd, u64 old_bytenr, 
u32 sectorsize)
btrfs_set_super_bytenr(super, BTRFS_SUPER_INFO_OFFSET);

csum_tree_block_size(buf, BTRFS_CRC32_SIZE, 0);
-   ret = pwrite(fd, buf->data, sectorsize, BTRFS_SUPER_INFO_OFFSET);
-   if (ret != sectorsize)
+   ret = pwrite(fd, buf->data, BTRFS_SUPER_INFO_SIZE,
+   BTRFS_SUPER_INFO_OFFSET);
+   if (ret != BTRFS_SUPER_INFO_SIZE)
goto fail;

ret = fsync(fd);
if (ret)
goto fail;

-   memset(buf->data, 0, sectorsize);
+   memset(buf->data, 0, BTRFS_SUPER_INFO_SIZE);
for (bytenr = 0; bytenr < BTRFS_SUPER_INFO_OFFSET; ) {
len = BTRFS_SUPER_INFO_OFFSET - bytenr;
-   if (len > sectorsize)
-   len = sectorsize;
+   if (len > BTRFS_SUPER_INFO_SIZE)
+   len = BTRFS_SUPER_INFO_SIZE;
ret = pwrite(fd, buf->data, len, bytenr);
if (ret != len) {
fprintf(stderr, "unable to zero fill device\n");
@@ -2519,7 +2520,7 @@ static int do_convert(const char *devname, int datacsum, 
int packing,
 * If this step succeed, we get a mountable btrfs. Otherwise
 * the source fs is left unchanged.
 */
-   ret = migrate_super_block(fd, mkfs_cfg.super_bytenr, blocksize);
+   ret = migrate_super_block(fd, mkfs_cfg.super_bytenr);
if (ret) {
error("unable to migrate super block: %d", ret);
goto fail;






Re: [PATCH 1/2] btrfs-progs: btrfs-convert: Prevent accounting blocks beyond end of device

2016-12-08 Thread Qu Wenruo

Hi Chandan,

Thanks for the patch.

At 12/08/2016 09:56 PM, Chandan Rajendra wrote:

When looping across data block bitmap, __ext2_add_one_block() may add
blocks which do not exist on the underlying disk. This commit prevents
this from happening by checking the block index against the maximum
block count that was present in the ext4 filesystem instance that is
being converted.


The patch looks good to me.

Reviewed-by: Qu Wenruo 

Just curious whether such an image can pass e2fsck.
And would you please upload a minimal image as btrfs-progs test case?

Thanks,
Qu



Signed-off-by: Chandan Rajendra 
---
 convert/main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/convert/main.c b/convert/main.c
index 4b4cea4..1148a36 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -1525,6 +1525,9 @@ static int __ext2_add_one_block(ext2_filsys fs, char *bitmap,
offset /= EXT2FS_CLUSTER_RATIO(fs);
offset += group_nr * EXT2_CLUSTERS_PER_GROUP(fs->super);
for (i = 0; i < EXT2_CLUSTERS_PER_GROUP(fs->super); i++) {
+   if ((i + offset) >= ext2fs_blocks_count(fs->super))
+   break;
+
if (ext2fs_test_bit(i, bitmap)) {
u64 start;
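
As an illustrative aside (not part of the patch): the bounds check added above can be modeled in a few lines. The function name and arguments below are hypothetical simplifications of __ext2_add_one_block's loop.

```python
def blocks_in_group(bitmap, group_nr, clusters_per_group, total_blocks):
    """Yield absolute block numbers set in one group's bitmap, stopping at
    the end of the device -- the new check prevents accounting blocks that
    do not exist on disk."""
    offset = group_nr * clusters_per_group
    for i in range(clusters_per_group):
        if i + offset >= total_blocks:   # the check the patch introduces
            break
        if bitmap[i]:
            yield i + offset

# A truncated device: only 28 blocks exist, so group 3 (blocks 24..31)
# is cut short after block 27.
found = list(blocks_in_group([1] * 8, group_nr=3, clusters_per_group=8,
                             total_blocks=28))
```

Without the break, the loop would also report blocks 28..31, which is exactly the overrun the patch prevents.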









Re: out-of-band dedup status?

2016-12-08 Thread Chris Murphy
OK something's wrong.

Kernel 4.8.12 and duperemove v0.11.beta4. Brand new file system
(mkfs.btrfs -dsingle -msingle, default mount options) and two
identical files separately copied.

[chris@f25s]$ ls -li /mnt/test
total 2811904
260 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26
Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
259 -rw-r--r--. 1 root root 1439694848 Dec  8 17:26
Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2

[chris@f25s]$ filefrag /mnt/test/*
/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso: 3 extents found
/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2: 2 extents found


[chris@f25s duperemove]$ sudo ./duperemove -dv /mnt/test/*
Using 128K blocks
Using hash: murmur3
Gathering file list...
Using 4 threads for file hashing phase
[1/2] (50.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso
[2/2] (100.00%) csum: /mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2
Total files:  2
Total hashes: 21968
Loading only duplicated hashes from hashfile.
Using 4 threads for dedupe phase
[0xba8400] (1/10947) Try to dedupe extents with id e47862ea
[0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2
[0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd
[0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64
[0xba8540] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
1182924800 (4)
[0xba8540] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
1182924800 (5)
[0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800,
131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
[0xba8540] (4/10947) Try to dedupe extents with id ffe4cf64
[0xba84a0] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
543293440 (4)
[0xba84a0] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
543293440 (5)
[0xba84a0] Dedupe 1 extents (id: ffed44f2) with target: (543293440,
131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
[0xba8540] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
1182924800 (5)
[0xba8540] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
1182924800 (4)
[0xba8540] Dedupe 1 extents (id: ffe4cf64) with target: (1182924800,
131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2"
[0xba84a0] (3/10947) Try to dedupe extents with id ffed44f2
[0xba84a0] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
543293440 (5)
[0xba84a0] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
543293440 (4)
[0xba84a0] Dedupe 1 extents (id: ffed44f2) with target: (543293440,
131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2"
[0xba84f0] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
101580800 (4)
[0xba84f0] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
101580800 (5)
[0xba84f0] Dedupe 1 extents (id: ffeefcdd) with target: (101580800,
131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
[0xba84a0] (5/10947) Try to dedupe extents with id ffe24eaf
[0xba84a0] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
171835392 (4)
[0xba84a0] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
171835392 (5)
[0xba84a0] Dedupe 1 extents (id: ffe24eaf) with target: (171835392,
131072), "/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso"
[0xba84f0] (2/10947) Try to dedupe extents with id ffeefcdd
[0xba8540] (6/10947) Try to dedupe extents with id ffe116c8
[0xba8400] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso" at offset
52035584 (4)
[0xba8400] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
52035584 (5)
[0xba8400] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
52166656 (5)
[0xba8400] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
60030976 (5)
[0xba8400] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
60162048 (5)
[0xba8400] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
60293120 (5)
[0xba8400] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
60424192 (5)
[0xba8400] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
60555264 (5)
[0xba8400] Add extent for file
"/mnt/test/Fedora-Workstation-Live-x86_64-25_Beta-1.1.iso2" at offset
60686336 (5)

[...snip...]

10 minutes later...

[0xba84f0] (06233/10947) Try to dedupe extents with id 703ebf5c
[0xba8400] (06234/10947) Try to dedupe extents with 

Re: [PATCH 1/2] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions

2016-12-08 Thread Qu Wenruo



At 12/08/2016 10:29 PM, Chandan Rajendra wrote:

On Friday, December 02, 2016 10:03:06 AM Qu Wenruo wrote:

Introduce a new parameter, struct extent_changeset for
btrfs_qgroup_reserved_data() and its callers.

Such extent_changeset was used in btrfs_qgroup_reserve_data() to record
which range it reserved in current reserve, so it can free it at error
path.

The reason we need to export it to callers is, at buffered write error
path, without knowing what exactly which range we reserved in current
allocation, we can free space which is not reserved by us.

This will lead to qgroup reserved space underflow.



Hi Qu,

On which git tree are these patches based off? This patch fails to apply
cleanly on kdave/for-next branch that is available as of today.

It's based on David's for-next-20161125 branch, but with several other 
qgroup fixes as prerequisite.
btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered 
write and quota enable

btrfs: qgroup: Return actually freed bytes for qgroup release or free data
btrfs: Add trace point for qgroup reserved space
btrfs: Add WARN_ON for qgroup reserved underflow

Or you can get the branch from my github repo:
https://github.com/adam900710/linux.git qgroup_fixes

Thanks,
Qu




Re: out-of-band dedup status?

2016-12-08 Thread Marc Joliet
On Thursday 08 December 2016 13:41:36 Chris Murphy wrote:
> Pretty sure it will not dedupe extents that are referenced in a read
> only subvolume.

I've used duperemove to de-duplicate files in read-only snapshots (of 
different systems) on my backup drive, so unless you're referencing some 
specific issue, I'm pretty sure you're wrong about that.  Maybe you're 
thinking of the occasionally mentioned old dedup kernel implementation?

-- 
Marc Joliet
--
"People who think they know everything really annoy those of us who know we
don't" - Bjarne Stroustrup


signature.asc
Description: This is a digitally signed message part.


Re: out-of-band dedup status?

2016-12-08 Thread Christoph Anton Mitterer
On Thu, 2016-12-08 at 13:41 -0700, Chris Murphy wrote:
> Pretty sure it will not dedupe extents that are referenced in a read
> only subvolume.

Oh... hm.. well that would be quite some limitation, because as soon as
one has a snapshot of the full fs (which is probably not so unlikely) it
won't work anymore, since everything is referenced by the backup
ro-snapshots...

:(


Cheers,
Chris.

smime.p7s
Description: S/MIME cryptographic signature


Re: duperemove : some real world figures on BTRFS deduplication

2016-12-08 Thread Austin S. Hemmelgarn

On 2016-12-08 15:07, Jeff Mahoney wrote:

On 12/8/16 10:42 AM, Austin S. Hemmelgarn wrote:

On 2016-12-08 10:11, Swâmi Petaramesh wrote:

Hi, Some real world figures about running duperemove deduplication on
BTRFS :

I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the
BTRFS backups (full rsync) of 5 PCs, using 2 different distros,
typically at the same update level, and all of them more of less sharing
the entirety or part of the same set of user files.

For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots
for having complete backups at different points in time.

The HD was full to 93% and made a good testbed for deduplicating.

So I ran duperemove on this HD, on a machine doing "only this", using a
hashfile. The machine being an Intel i5 with 6 GB of RAM.

Well, the damn thing has been running for 15 days uninterrupted !
...Until I [Ctrl]-C it this morning as I had to move with the machine (I
wasn't expecting it to last THAT long...).

It took about 48 hours just for calculating the files hashes.

Then it took another 48 hours just for "loading the hashes of duplicate
extents".

Then it took 11 days deduplicating until I killed it.

At the end, the disk that was 93% full is now 76% full, so I saved 17%
of 1 TB (170 GB) by deduplicating for 15 days.

Well the thing "works" and my disk isn't full anymore, so that's a very
partial success, but still I wonder if the gain is worth the effort...

So, some general explanation here:
Duperemove hashes data in blocks of (by default) 128kB, which means for
~930GB, you've got about 7618560 blocks to hash, which partly explains
why it took so long to hash.  Once that's done, it then has to compare
hashes for all combinations of those blocks, which totals to
58042456473600 comparisons (hence that taking a long time).  The block
size thus becomes a trade-off between performance when hashing and
actual space savings (smaller block size makes hashing take longer, but
gives overall slightly better results for deduplication).
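
The arithmetic above checks out; assuming "~930GB" means 930 GiB and duperemove's default 128 KiB block size, both quoted figures can be reproduced (the comparison count is the naive all-ordered-pairs n*n, before any filtering):

```python
GiB = 1 << 30
block_size = 128 * 1024            # duperemove default block size
data_size = 930 * GiB              # rough size of the deduplicated data set

blocks = data_size // block_size   # number of hashed blocks
comparisons = blocks * blocks      # naive all-pairs hash comparisons

print(blocks, comparisons)
```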


IIRC, the core of the duperemove duplicate matcher isn't an O(n^2)
algorithm.  I think Mark used a bloom filter to reduce the data set
prior to matching, but I haven't looked at the code in a while.


You're right, I had completely forgotten about that.

Regardless of that though, it's still a lot of processing that needs to be done.
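
The Bloom-filter pre-pass Jeff refers to can be sketched roughly as follows. This is a toy model, not duperemove's actual code: a single pass inserts every block hash into the filter, and any hash the filter claims to have seen already becomes a candidate for the expensive exact-matching pass. False positives only cost extra work, never correctness.

```python
import hashlib

class Bloom:
    """Tiny Bloom filter: k bit positions per item, derived from SHA-256."""
    def __init__(self, nbits=1 << 20, nhashes=3):
        self.nbits, self.nhashes = nbits, nhashes
        self.bits = bytearray(nbits // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item).digest()
        for i in range(self.nhashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "little") % self.nbits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(item))

# One pass over the block hashes: anything "seen before" is a dedupe candidate.
bloom, candidates = Bloom(), set()
for block_hash in [b"aa", b"bb", b"aa", b"cc", b"bb"]:
    if bloom.might_contain(block_hash):
        candidates.add(block_hash)
    bloom.add(block_hash)
```

Normally only the genuinely repeated hashes survive into the matching phase, so the exact comparisons run on a far smaller set than all n*n pairs.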



Re: out-of-band dedup status?

2016-12-08 Thread Chris Murphy
Pretty sure it will not dedupe extents that are referenced in a read
only subvolume.


Chris Murphy


Re: out-of-band dedup status?

2016-12-08 Thread Jeff Mahoney
On 12/8/16 1:36 PM, Christoph Anton Mitterer wrote:
> Hey.
> 
> I just wondered whether out-of-band/"offline" dedup is safe for general
> use... https://btrfs.wiki.kernel.org/index.php/Status kinda implies so
> (it tells about unspecified performance issues), but this seems again
> already outdated (kernel 4.7)...
> :-(

SUSE supports it in SLE12 using our 3.12- and 4.4-based kernels.  There
haven't been a lot of changes to the kernel component of it.  It's
pretty simple: check to see if the ranges are identical between two
files and then reflink between them.

> My intention was to use it with duperemove, but AFAIU, the kernel
> itself will anyway do a byte-by-byte comparison before any
> deduplication, so in principle it should be totally safe regardless of
> the stability of the userland tool, right?
> Especially I wouldn't want that "identity" is only assumed because of
> some checksum identity (or collision ;) ).

Yep.  It does a full check in the kernel for precisely that reason.
It's not even enough to do it in userspace because we don't want dedupe
to be race prone.  It's either atomically identical or it's not, and we
don't dedupe if it's not.  If it changes immediately after the ioctl
returns, that's fine -- the cloned range will be CoW'd properly.
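
For concreteness, the kernel interface being discussed is the FIDEDUPERANGE ioctl (the VFS-generalized form of the older BTRFS_IOC_FILE_EXTENT_SAME). The sketch below packs struct file_dedupe_range by hand, so treat it as an assumption-laden illustration rather than a reference implementation; on a filesystem without dedupe support it simply reports "unsupported".

```python
import fcntl, os, struct, tempfile

FIDEDUPERANGE = 0xC0189436   # _IOWR(0x94, 54, struct file_dedupe_range)

def dedupe_one(src_fd, dst_fd, offset, length):
    """Ask the kernel to share one range; it byte-compares before reflinking."""
    # struct file_dedupe_range: src_offset, src_length, dest_count, reserved1/2
    hdr = struct.pack("QQHHI", offset, length, 1, 0, 0)
    # one file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped, status, reserved
    info = struct.pack("qQQiI", dst_fd, offset, 0, 0, 0)
    buf = bytearray(hdr + info)
    fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)
    bytes_deduped, status = struct.unpack_from("Qi", buf, len(hdr) + 16)
    return bytes_deduped, status  # status: 0 = SAME, 1 = DIFFERS, <0 = -errno

data = os.urandom(128 * 1024)
with tempfile.NamedTemporaryFile() as f1, tempfile.NamedTemporaryFile() as f2:
    f1.write(data); f2.write(data)
    f1.flush(); f2.flush()
    try:
        deduped, status = dedupe_one(f1.fileno(), f2.fileno(), 0, len(data))
        outcome = {0: "same", 1: "differs"}.get(status, "unsupported")
    except OSError:
        outcome = "unsupported"   # e.g. ENOTTY on old kernels
print(outcome)
```

Note that even when the caller lies about the ranges being identical, the kernel's internal byte comparison reports FILE_DEDUPE_RANGE_DIFFERS and leaves the data untouched, which is the safety property described above.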

> Also, is there anything to take note of when this is used with
> compression and snapshots?

I don't believe so.  IIRC dedupe maps the file to see if it's already
cloned, so it's safe for snapshots (or could relink extents in a
snapshot that diverged and then were restored to their original
contents).  Dedupe works with the uncompressed data, so compression
shouldn't matter here.  I haven't tested it, though.

> What when I use it with incremental send/receive... i.e. I dedupe the
> "master" and then send/receive this to another btrfs... will it work
> (that is will the copy be also deduplicated, with no longer needed
> extents properly being freed)... or at least not cause any corruptions?

It should.  IIRC send also maps the file (using a different mechanism)
and receive will clone those ranges on the other end.

> Any other things in terms of possible issues, data corruption, etc.
> that one should know when using deduplication?

There shouldn't be.  We haven't had any bug reports at SUSE.

-Jeff

-- 
Jeff Mahoney
SUSE Labs



signature.asc
Description: OpenPGP digital signature


Re: duperemove : some real world figures on BTRFS deduplication

2016-12-08 Thread Jeff Mahoney
On 12/8/16 10:42 AM, Austin S. Hemmelgarn wrote:
> On 2016-12-08 10:11, Swâmi Petaramesh wrote:
>> Hi, Some real world figures about running duperemove deduplication on
>> BTRFS :
>>
>> I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the
>> BTRFS backups (full rsync) of 5 PCs, using 2 different distros,
> typically at the same update level, and all of them more or less sharing
>> the entirety or part of the same set of user files.
>>
>> For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots
>> for having complete backups at different points in time.
>>
>> The HD was full to 93% and made a good testbed for deduplicating.
>>
>> So I ran duperemove on this HD, on a machine doing "only this", using a
>> hashfile. The machine being an Intel i5 with 6 GB of RAM.
>>
>> Well, the damn thing has been running for 15 days uninterrupted !
>> ...Until I [Ctrl]-C it this morning as I had to move with the machine (I
>> wasn't expecting it to last THAT long...).
>>
>> It took about 48 hours just for calculating the files hashes.
>>
>> Then it took another 48 hours just for "loading the hashes of duplicate
>> extents".
>>
>> Then it took 11 days deduplicating until I killed it.
>>
>> At the end, the disk that was 93% full is now 76% full, so I saved 17%
>> of 1 TB (170 GB) by deduplicating for 15 days.
>>
>> Well the thing "works" and my disk isn't full anymore, so that's a very
>> partial success, but still I wonder if the gain is worth the effort...
> So, some general explanation here:
> Duperemove hashes data in blocks of (by default) 128kB, which means for
> ~930GB, you've got about 7618560 blocks to hash, which partly explains
> why it took so long to hash.  Once that's done, it then has to compare
> hashes for all combinations of those blocks, which totals to
> 58042456473600 comparisons (hence that taking a long time).  The block
> size thus becomes a trade-off between performance when hashing and
> actual space savings (smaller block size makes hashing take longer, but
> gives overall slightly better results for deduplication).

IIRC, the core of the duperemove duplicate matcher isn't an O(n^2)
algorithm.  I think Mark used a bloom filter to reduce the data set
prior to matching, but I haven't looked at the code in a while.

-Jeff

-- 
Jeff Mahoney
SUSE Labs



signature.asc
Description: OpenPGP digital signature


Re: duperemove : some real world figures on BTRFS deduplication

2016-12-08 Thread Jeff Mahoney
On 12/8/16 10:11 AM, Swâmi Petaramesh wrote:
> Hi, Some real world figures about running duperemove deduplication on
> BTRFS :
> 
> I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the
> BTRFS backups (full rsync) of 5 PCs, using 2 different distros,
> typically at the same update level, and all of them more or less sharing
> the entirety or part of the same set of user files.
> 
> For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots
> for having complete backups at different points in time.
> 
> The HD was full to 93% and made a good testbed for deduplicating.
> 
> So I ran duperemove on this HD, on a machine doing "only this", using a
> hashfile. The machine being an Intel i5 with 6 GB of RAM.
> 
> Well, the damn thing has been running for 15 days uninterrupted !
> ...Until I [Ctrl]-C it this morning as I had to move with the machine (I
> wasn't expecting it to last THAT long...).
> 
> It took about 48 hours just for calculating the files hashes.
> 
> Then it took another 48 hours just for "loading the hashes of duplicate
> extents".
> 
> Then it took 11 days deduplicating until I killed it.
> 
> At the end, the disk that was 93% full is now 76% full, so I saved 17%
> of 1 TB (170 GB) by deduplicating for 15 days.
> 
> Well the thing "works" and my disk isn't full anymore, so that's a very
> partial success, but still I wonder if the gain is worth the effort...

What version were you using?  I know Mark had put a bunch of effort into
reducing the memory footprint and runtime.  The earlier versions were
"can we get this thing working" while the newer versions are more efficient.

What throughput are you getting to that disk?  I get that it's USB3, but
reading 1TB doesn't take a terribly long time so 15 days is pretty
ridiculous.

At any rate, the good news is that when you run it again, assuming you
used the hash file, it will not have to rescan most of your data set.

-Jeff

-- 
Jeff Mahoney
SUSE Labs



signature.asc
Description: OpenPGP digital signature


out-of-band dedup status?

2016-12-08 Thread Christoph Anton Mitterer
Hey.

I just wondered whether out-of-band/"offline" dedup is safe for general
use... https://btrfs.wiki.kernel.org/index.php/Status kinda implies so
(it tells about unspecified performance issues), but this seems again
already outdated (kernel 4.7)...
:-(

My intention was to use it with duperemove, but AFAIU, the kernel
itself will anyway do a byte-by-byte comparison before any
deduplication, so in principle it should be totally safe regardless of
the stability of the userland tool, right?
Especially I wouldn't want that "identity" is only assumed because of
some checksum identity (or collision ;) ).

Also, is there anything to take note of when this is used with
compression and snapshots?

What when I use it with incremental send/receive... i.e. I dedupe the
"master" and then send/receive this to another btrfs... will it work
(that is will the copy be also deduplicated, with no longer needed
extents properly being freed)... or at least not cause any corruptions?

Any other things in terms of possible issues, data corruption, etc.
that one should know when using deduplication?


Thanks :)

Chris.

smime.p7s
Description: S/MIME cryptographic signature


Re: duperemove : some real world figures on BTRFS deduplication

2016-12-08 Thread Timofey Titovets
2016-12-08 18:42 GMT+03:00 Austin S. Hemmelgarn :
> On 2016-12-08 10:11, Swâmi Petaramesh wrote:
>>
>> Hi, Some real world figures about running duperemove deduplication on
>> BTRFS :
>>
>> I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the
>> BTRFS backups (full rsync) of 5 PCs, using 2 different distros,
>> typically at the same update level, and all of them more or less sharing
>> the entirety or part of the same set of user files.
>>
>> For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots
>> for having complete backups at different points in time.
>>
>> The HD was full to 93% and made a good testbed for deduplicating.
>>
>> So I ran duperemove on this HD, on a machine doing "only this", using a
>> hashfile. The machine being an Intel i5 with 6 GB of RAM.
>>
>> Well, the damn thing has been running for 15 days uninterrupted !
>> ...Until I [Ctrl]-C it this morning as I had to move with the machine (I
>> wasn't expecting it to last THAT long...).
>>
>> It took about 48 hours just for calculating the files hashes.
>>
>> Then it took another 48 hours just for "loading the hashes of duplicate
>> extents".
>>
>> Then it took 11 days deduplicating until I killed it.
>>
>> At the end, the disk that was 93% full is now 76% full, so I saved 17%
>> of 1 TB (170 GB) by deduplicating for 15 days.
>>
>> Well the thing "works" and my disk isn't full anymore, so that's a very
>> partial success, but still I wonder if the gain is worth the effort...
>
> So, some general explanation here:
> Duperemove hashes data in blocks of (by default) 128kB, which means for
> ~930GB, you've got about 7618560 blocks to hash, which partly explains why
> it took so long to hash.  Once that's done, it then has to compare hashes
> for all combinations of those blocks, which totals to 58042456473600
> comparisons (hence that taking a long time).  The block size thus becomes a
> trade-off between performance when hashing and actual space savings (smaller
> block size makes hashing take longer, but gives overall slightly better
> results for deduplication).
>
> As far as the rest, given your hashing performance (which is not
> particularly good I might add, roughly 5.6MB/s), the amount of time it was
> taking to do the actual deduplication is reasonable since the deduplication
> ioctl does a byte-wise comparison of the extents to be deduplicated prior to
> actually ref-linking them to ensure you don't lose data.
>
> Because of this, generic batch deduplication is not all that great on BTRFS.
> There are cases where it can work, but usually they're pretty specific
> cases.  In most cases though, you're better off doing a custom tool that
> knows about how your data is laid out and what's likely to be duplicated
> (I've actually got two tools for this for the two cases where I use
> deduplication, they use knowledge of the data-set itself to figure out
> what's duplicated, then just call the ioctl through a wrapper (previously
> the one included in duperemove, currently xfs_io)).
>
>

Zygo did a good job on this too.
Try:
https://github.com/Zygo/bees

It's cool and can work better on a large mass of data, because it
dedupes at the same time as the scanning phase.
-- 
Have a nice day,
Timofey.


Re: [RESEND][PATCH v2] btrfs-progs: add dev stats returncode option

2016-12-08 Thread Austin S. Hemmelgarn

On 2016-12-08 12:20, David Sterba wrote:

On Mon, Dec 05, 2016 at 01:35:20PM -0500, Austin S. Hemmelgarn wrote:

Currently, `btrfs device stats` returns non-zero only when there was an
error getting the counter values.  This is fine for when it gets run by a
user directly, but is a serious pain when trying to use it in a script or
for monitoring since you need to parse the (not at all machine friendly)
output to check the counter values.

This patch adds an option ('-s') which causes `btrfs device stats`
to set bit 6 in the return code if any of the counters are non-zero.
This greatly simplifies checking from a script or monitoring software if
any errors have been recorded.  In the event that this switch is passed
and an error occurs reading the stats, the return code will have bit
0 set (so if there are errors reading counters, and the counters which
were read were non-zero, the return value will be 65).


So a typical check in a script would look for either 64 or 65 returned
from the command, I don't think we can do it simpler. The option naming
is a bit confusing to me, as it duplicates the 'stats' from the command
itself. I'd suggest to use '--check' instead, does it sound OK to you?

I'll apply the patch as-is for now (and maybe do some cleanups in the
surrounding code).

Yeah, --check is fine.  Like I said, I'm not too picky about the name as 
long as it works.



Re: [RESEND][PATCH v2] btrfs-progs: add dev stats returncode option

2016-12-08 Thread David Sterba
On Mon, Dec 05, 2016 at 01:35:20PM -0500, Austin S. Hemmelgarn wrote:
> Currently, `btrfs device stats` returns non-zero only when there was an
> error getting the counter values.  This is fine for when it gets run by a
> user directly, but is a serious pain when trying to use it in a script or
> for monitoring since you need to parse the (not at all machine friendly)
> output to check the counter values.
> 
> This patch adds an option ('-s') which causes `btrfs device stats`
> to set bit 6 in the return code if any of the counters are non-zero.
> This greatly simplifies checking from a script or monitoring software if
> any errors have been recorded.  In the event that this switch is passed
> and an error occurs reading the stats, the return code will have bit
> 0 set (so if there are errors reading counters, and the counters which
> were read were non-zero, the return value will be 65).

So a typical check in a script would look for either 64 or 65 returned
from the command, I don't think we can do it simpler. The option naming
is a bit confusing to me, as it duplicates the 'stats' from the command
itself. I'd suggest to use '--check' instead, does it sound OK to you?

I'll apply the patch as-is for now (and maybe do some cleanups in the
surrounding code).
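
For a monitoring script consuming this, the bit checks are straightforward. The sketch below is hypothetical glue (the constant names are invented here), mirroring the semantics in the patch description: bit 6 (value 64) means non-zero counters, bit 0 (value 1) means some counters could not be read.

```python
STATS_NONZERO = 1 << 6   # bit 6: at least one error counter was non-zero
STATS_READERR = 1 << 0   # bit 0: reading some counters failed

def classify(rc):
    """Translate a `btrfs device stats`-style return code into findings."""
    findings = []
    if rc & STATS_NONZERO:
        findings.append("device errors recorded")
    if rc & STATS_READERR:
        findings.append("failed to read some counters")
    return findings or ["ok"]

print(classify(0), classify(64), classify(65))
```

So a script only needs `rc & 64` to decide whether any device has recorded errors, regardless of whether the read itself also failed.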


Re: [PATCH 0/6] btrfs dax IO

2016-12-08 Thread Liu Bo
On Thu, Dec 08, 2016 at 07:48:18AM +0100, Xin Zhou wrote:
> Hi Liu,
>  
> From the patch, is the snapshot disabled by disabling the COW in the mounting 
> path?
> It seems the create_snapshot() in ioctl.c does not get changed.

Well, I think I made a mistake in this cover letter, snapshot still
works while mounting with dax, but if a snapshot is taken, then we'll
get -EIO while writing to dax inodes that belong to either snapshot tree
or its source tree.  So in fact, only a readonly snapshot makes sense in
practice, I'll update the patch so that we only allow a readonly
snapshot to be taken.

COW is disabled by letting the dax mount option imply the "nodatacow"
option.

Thanks for spotting this!

Thanks,

-liubo

> 
> I have experience with some similar systems but am a bit new to the btrfs code.
>   
> Thanks, 
> Xin
>  
>  
> 
> Subject: [PATCH 0/6] btrfs dax IO
> From: Liu Bo
> Date: Wed, 7 Dec 2016 13:45:04 -0800
> Cc: Chris Mason, Jan Kara, David Sterba
> This is a prelimanary patch set to add dax support for btrfs, with
> this we can do normal read/write to dax files and can mmap dax files
> to userspace so that applications have the ability to access
> persistent memory directly.
> 
> Please note that currently this is limited to nocow, i.e. all dax
> inodes do not have COW behaviour.
> 
> COW:  no
> multiple device:  no
> clone/reflink:no
> snapshot: no
> compression:  no
> checksum: no
> 
> Right now snapshot is disabled while mounting with -odax, but snapshot
> can be created without -odax, and writing to a dax file in snapshot
> will get -EIO.
> 
> Clone/reflink is dealt with as same as snapshot, -EIO will be returned
> when writing to shared extents.
> 
> This has adopted the latest iomap framework for dax read/write
> and dax mmap.
> 
>  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6] Btrfs: add DAX support for nocow btrfs

2016-12-08 Thread Liu Bo
On Thu, Dec 08, 2016 at 11:47:41AM +0100, Jan Kara wrote:
> On Wed 07-12-16 17:15:42, Chris Mason wrote:
> > On 12/07/2016 04:45 PM, Liu Bo wrote:
> > >This has implemented DAX support for btrfs with nocow and single-device.
> > >
> > >DAX is developed for block devices that are memory-like in order to avoid
> > >double buffer in both page cache and the storage, so DAX can perform 
> > >reads and
> > >writes directly to the storage device, and for those who prefer to using
> > >filesystem, filesystem dax support can help to map the storage into 
> > >userspace
> > >for file-mapping.
> > >
> > >Since I haven't figured out how to map multiple devices to userspace without
> > >pagecache, this DAX support is only for single-device, and I don't think
> > >DAX (Direct Access) can work with cow, this is limited to the nocow case.  I 
> > >made
> > >this by setting nodatacow in dax mount option.
> > 
> > Interesting, this is a nice small start.  It might make more sense to limit
> > snapshots to readonly in DAX mode until we can figure out how to cow
> > properly.  I think it can be done, I just need to sit down with the dax code
> > to do a good review.
> > 
> > But bigger picture, if we can't cow and we can't crc and we can't
> > multi-device, I'd rather let XFS/ext4 sort out the dax space until we pull
> > in more of the btrfs features too.
> 
> So normal DAX IO (via read(2) and write(2)) is very similar to direct IO so
> I don't think there would be any obstacle to support all the features with
> that.

For DAX IO via read(2)/write(2), cow is OK, while multiple devices are
a problem, as currently iomap_dax_actor only takes one 
pair:

- raid 0, one device is written once a time
- raid 1/10 and others, 2 or more devices need to be written each time

> For mmap(2) things get more difficult but still: The filesystem gets
> normal ->fault notifications when the page is first faulted in. So you
> can COW if you need to at that moment.

Right.

> Also DAX PTEs can be write-protected (well, as of the coming merge
> window) as normal PTEs and then you'll get ->pfn_mkwrite /
> ->page_mkwrite notification when someone tries to write via mmap and
> you can do your stuff at that point.

That's right, but I think the problem comes from the fact that only
->fault with FAULT_FLAG_WRITE gets to space allocation, where we could
cow to a new location.

For page_mkwrite, btrfs does cow while writing back a dirty page, but
dax doesn't do delayed allocation, so dax_writeback_one has no place to
do cow.

Also, thank you for the great write-protection patch; another reason
I decided to disable cow is that there is no write protection on DAX
PTEs, so without that, even if we can do cow, we don't have a way to
update every pte pointing to our cow'd dax pfn.

> So DAX mappings are not that
> different from filesystem point of view. There are some differences wrt.
> locking (you don't have page lock, but you use a lock bit in radix tree
> entry instead for that) but that's about it. So I don't see a principled
> reason why we cannot support all btrfs features for DAX... But if you see
> some problem, let me know and we can talk if we could somehow help from the
> DAX side.

Yeah, looks like we have at least two problems: one is dax_writeback_one
and the other is iomap.

> 
> BTW, I also don't see how the multiple devices are a problem. Actually XFS
> supports that (with its real-time devices) just fine - your ->iomap_begin()
> returns a (device, offset) pair and that should be all that's needed,
> no?

xfs is a bit different: it only writes to one device at a time, sort of
like raid0.

Thanks,

-liubo


Re: duperemove : some real world figures on BTRFS deduplication

2016-12-08 Thread Austin S. Hemmelgarn

On 2016-12-08 10:11, Swâmi Petaramesh wrote:

Hi, Some real world figures about running duperemove deduplication on
BTRFS :

I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the
BTRFS backups (full rsync) of 5 PCs, using 2 different distros,
typically at the same update level, and all of them more or less sharing
the entirety or part of the same set of user files.

For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots
for having complete backups at different points in time.

The HD was full to 93% and made a good testbed for deduplicating.

So I ran duperemove on this HD, on a machine doing "only this", using a
hashfile. The machine being an Intel i5 with 6 GB of RAM.

Well, the damn thing has been running for 15 days uninterrupted!
...Until I [Ctrl]-C'd it this morning, as I had to move with the machine (I
wasn't expecting it to last THAT long...).

It took about 48 hours just for calculating the file hashes.

Then it took another 48 hours just for "loading the hashes of duplicate
extents".

Then it took 11 days deduplicating until I killed it.

At the end, the disk that was 93% full is now 76% full, so I saved 17%
of 1 TB (170 GB) by deduplicating for 15 days.

Well, the thing "works" and my disk isn't full anymore, so that's a very
partial success, but still I wonder if the gain is worth the effort...

So, some general explanation here:
Duperemove hashes data in blocks of (by default) 128kB, which means for 
~930GB, you've got about 7618560 blocks to hash, which partly explains 
why it took so long to hash.  Once that's done, it then has to compare 
hashes for all combinations of those blocks, which totals to 
58042456473600 comparisons (hence that taking a long time).  The block 
size thus becomes a trade-off between performance when hashing and 
actual space savings (smaller block size makes hashing take longer, but 
gives overall slightly better results for deduplication).
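The figures above can be checked with a few lines (a sketch; the ~930 GB is read as GiB here, which is what makes the numbers in the mail come out exactly):

```python
# Sketch of the block-count arithmetic from the paragraph above.
data_bytes = 930 * 1024**3          # ~930 GiB of data to hash
block_size = 128 * 1024             # duperemove's default 128 KiB block

blocks = data_bytes // block_size
print(blocks)                        # 7618560 blocks to hash

# The mail's comparison figure corresponds to the full n x n grid;
# distinct unordered pairs would be n*(n-1)/2, about half that.
all_pairs = blocks * blocks
distinct_pairs = blocks * (blocks - 1) // 2
print(all_pairs)                     # 58042456473600
print(distinct_pairs)
```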


As far as the rest, given your hashing performance (which is not 
particularly good, I might add, roughly 5.6MB/s), the amount of time it 
was taking to do the actual deduplication is reasonable since the 
deduplication ioctl does a byte-wise comparison of the extents to be 
deduplicated prior to actually ref-linking them to ensure you don't lose 
data.


Because of this, generic batch deduplication is not all that great on 
BTRFS.  There are cases where it can work, but usually they're pretty 
specific cases.  In most cases though, you're better off doing a custom 
tool that knows about how your data is laid out and what's likely to be 
duplicated (I've actually got two tools for this for the two cases where 
I use deduplication, they use knowledge of the data-set itself to figure 
out what's duplicated, then just call the ioctl through a wrapper 
(previously the one included in duperemove, currently xfs_io)).
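For reference, the ioctl in question is FIDEDUPERANGE, the VFS name for what started as BTRFS_IOC_FILE_EXTENT_SAME. A minimal sketch of its argument layout follows — the struct packing mirrors the documented kernel ABI, but this is an illustration, not duperemove's or xfs_io's actual code:

```python
import struct

# struct file_dedupe_range header: src_offset (u64), src_length (u64),
# dest_count (u16), reserved1 (u16), reserved2 (u32) -> 24 bytes
RANGE_HDR = struct.Struct("=QQHHI")

# struct file_dedupe_range_info: dest_fd (s64), dest_offset (u64),
# bytes_deduped (u64), status (s32), reserved (u32) -> 32 bytes
RANGE_INFO = struct.Struct("=qQQiI")

# FIDEDUPERANGE = _IOWR(0x94, 54, struct file_dedupe_range)
FIDEDUPERANGE = (3 << 30) | (RANGE_HDR.size << 16) | (0x94 << 8) | 54

def pack_dedupe_request(src_offset, length, dest_fd, dest_offset):
    """Build the ioctl argument buffer for one destination range."""
    hdr = RANGE_HDR.pack(src_offset, length, 1, 0, 0)
    info = RANGE_INFO.pack(dest_fd, dest_offset, 0, 0, 0)
    return hdr + info

buf = pack_dedupe_request(0, 128 * 1024, 5, 0)
# The real call would be: fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)
# The kernel does the byte-wise comparison mentioned above before linking.
```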




Re: LSF/MM 2017: Call for Proposals

2016-12-08 Thread Michal Hocko
On Thu 08-12-16 07:30:43, James Bottomley wrote:
> On Thu, 2016-12-08 at 13:26 +0100, Michal Hocko wrote:
> > On Wed 07-12-16 06:57:06, James Bottomley wrote:
> > [...]
> > > Just on this point, since there seems to be a lot of confusion: lsf
> > > -pc
> > > is the list for contacting the programme committee, so you cannot
> > > subscribe to it.
> > > 
> > > There is no -discuss equivalent, like kernel summit has, because we
> > > expect you to cc the relevant existing mailing list and have the
> > > discussion there instead rather than expecting people to subscribe
> > > to a
> > > new list.
> > 
> > There used to be l...@lists.linux-foundation.org. Is it not active
> > anymore?
> 
> Not until we get the list of invitees sorted out.  And then it's only
> for stuff actually to do with lsf (like logistics, bof or session
> requests or other meetups).  Any technical stuff should still be cc'd
> to the relevant mailing list.

OK, I tend to forget about that. Thanks for the clarification!
-- 
Michal Hocko
SUSE Labs


Re: LSF/MM 2017: Call for Proposals

2016-12-08 Thread James Bottomley
On Thu, 2016-12-08 at 13:26 +0100, Michal Hocko wrote:
> On Wed 07-12-16 06:57:06, James Bottomley wrote:
> [...]
> > Just on this point, since there seems to be a lot of confusion: lsf
> > -pc
> > is the list for contacting the programme committee, so you cannot
> > subscribe to it.
> > 
> > There is no -discuss equivalent, like kernel summit has, because we
> > expect you to cc the relevant existing mailing list and have the
> > discussion there instead rather than expecting people to subscribe
> > to a
> > new list.
> 
> There used to be l...@lists.linux-foundation.org. Is it not active
> anymore?

Not until we get the list of invitees sorted out.  And then it's only
for stuff actually to do with lsf (like logistics, bof or session
requests or other meetups).  Any technical stuff should still be cc'd
to the relevant mailing list.

James



Re: [PATCH] btrfs-progs: mkfs, balance convert: warn about RAID5/6 in fiery letters

2016-12-08 Thread Adam Borowski
On Wed, Nov 30, 2016 at 04:12:46PM +0100, David Sterba wrote:
> On Mon, Nov 28, 2016 at 07:51:53PM +0100, Adam Borowski wrote:
> > People who don't frequent IRC nor the mailing list tend to believe RAID 5/6
> > are stable; this leads to data loss.  Thus, let's do warn them.
> > 
> > At this point, I think fiery letters that won't be missed are warranted.
> > 
> > Kernel 4.9 and its -progs will be a part of LTS of multiple distributions,
> > so leaving experimental features without a warning is inappropriate.
> 
> I'm ok with adding the warning about raid56 feature, but I have some
> comments to how it's implemented.
> 
> Special case warning for the raid56 is ok, as it corresponds to the
> 'mkfs_features' table where the missing value for 'safe' should lead to
> a similar warning. This is planned to be more generic, so I just want to
> make sure we can adjust it later without problems.

There are at least two tools that can make a raid56: mkfs and balance
convert.  mkfs_features is used for the former but not for the latter.  It
doesn't appear to be machine-parseable either (but that's a matter of
editing the #define).

Of course, making this generic is doable, I just don't know how you plan
to do that.

Should I try to use mkfs_features somehow, or do it as a variant of the
current patch for now?

> The warning should go last, after the final summary (and respect verbosity
> level).

and for balance convert, before the output.

> The colors seem a bit too much to me, red text or just emphasize
> 'warning' would IMHO suffice.

(in the paragraph before:)
> If the message were not colored, I'd completely miss the warning.

... which means the color works!  It _is_ intended to be offensive, so
people can't possibly miss it.

XFS uses something far more tame: black text on red background (\e[0;30;41m)
but I want something that conveys the message appropriately.

Obviously I'm not married to a particular color for this bikeshed, I just
want a color that makes the neighbours complain :p


English lacks a proper word for this, something akin to
https://en.wiktionary.org/wiki/oczojebny
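For comparison, the two styles under discussion can be tried out with plain escape sequences (a standalone sketch using standard ANSI SGR codes, not code from btrfs-progs or xfsprogs):

```python
# Black text on a red background -- the XFS style quoted above.
XFS_STYLE = "\033[0;30;41m"
# Bold red text -- the tamer variant suggested in the review.
BOLD_RED = "\033[1;31m"
RESET = "\033[0m"

def colorize(msg, style):
    """Wrap a message in an SGR style and reset it afterwards."""
    return f"{style}{msg}{RESET}"

print(colorize("WARNING: RAID5/6 support is experimental", XFS_STYLE))
print(colorize("WARNING: RAID5/6 support is experimental", BOLD_RED))
```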

-- 
u-boot problems can be solved with the help of your old SCSI manuals, the
parts that deal with goat termination.  You need a black-handled knife, and
an appropriate set of candles (number and color matters).  Or was it a
silver-handled knife?  Crap, need to look that up.


Re: [PATCH 2/6] Btrfs: set single device limit for dax usecase

2016-12-08 Thread Liu Bo
On Thu, Dec 08, 2016 at 02:35:59PM +0100, David Sterba wrote:
> On Wed, Dec 07, 2016 at 01:45:06PM -0800, Liu Bo wrote:
> > Dax on btrfs is not ready for multiple device.
> 
> How about DUP? Technically it's not multi-device but still stores
> multiple copies, so I don't know if the implementation is ok with that.

Good question.

For this patch set, metadata DUP is OK while data DUP is not; the main
obstacle is that the actual copy-to-device operation happens in iomap
code (iomap_dax_actor), not inside the filesystem itself, and it only
takes one (device, offset) pair to copy to the device.

Thanks,

-liubo


Re: [PATCH] btrfs-progs: Initialize ret to suppress compiler warning

2016-12-08 Thread David Sterba
On Thu, Dec 01, 2016 at 11:28:20AM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues 

The path that leaves ret uninitialized goes through the second if block
and requires dback->found_ref to be 0. A quick search leads to several
places where it's set according to found items, so we won't reach the for
loop with found_ref 0.

Cleaning up such compiler warnings is desired, but the fix description
should also say why it is ok to just set the value. Patch applied with
updated description.


duperemove : some real world figures on BTRFS deduplication

2016-12-08 Thread Swâmi Petaramesh
Hi, Some real world figures about running duperemove deduplication on
BTRFS :

I have an external 2,5", 5400 RPM, 1 TB HD, USB3, on which I store the
BTRFS backups (full rsync) of 5 PCs, using 2 different distros,
typically at the same update level, and all of them more or less sharing
the entirety or part of the same set of user files.

For each of these PCs I keep a series of 4-5 BTRFS subvolume snapshots
for having complete backups at different points in time.

The HD was full to 93% and made a good testbed for deduplicating.

So I ran duperemove on this HD, on a machine doing "only this", using a
hashfile. The machine being an Intel i5 with 6 GB of RAM.

Well, the damn thing has been running for 15 days uninterrupted!
...Until I [Ctrl]-C'd it this morning, as I had to move with the machine (I
wasn't expecting it to last THAT long...).

It took about 48 hours just for calculating the file hashes.

Then it took another 48 hours just for "loading the hashes of duplicate
extents".

Then it took 11 days deduplicating until I killed it.

At the end, the disk that was 93% full is now 76% full, so I saved 17%
of 1 TB (170 GB) by deduplicating for 15 days.

Well, the thing "works" and my disk isn't full anymore, so that's a very
partial success, but still I wonder if the gain is worth the effort...

Best regards.

ॐ

-- 
Swâmi Petaramesh  PGP 9076E32E



Re: [PATCH] btrfs-progs: Fix extents after finding all errors

2016-12-08 Thread David Sterba
On Thu, Dec 01, 2016 at 11:28:08AM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues 
> 
> Simplifying the logic of fixing.
> 
> Calling fixup_extent_ref() after encountering every error causes
> more error messages after the extent is fixed. In case of multiple errors,
> this is confusing because the error message is displayed after the fix
> message and it works on stale data. It is best to show all errors and
> then fix the extents.
> 
> Set a variable and call fixup_extent_ref() if it is set. err is not used,
> so cleared it.
> 
> Changes since v1:
>  + assign fix from ret for a correct repair_abort code path
> 
> Signed-off-by: Goldwyn Rodrigues 

Applied, thanks.


Re: [PATCH] btrfs-progs: man mkfs: warn about RAID5/6 being experimental

2016-12-08 Thread David Sterba
On Mon, Nov 28, 2016 at 07:52:43PM +0100, Adam Borowski wrote:
> Signed-off-by: Adam Borowski 

Applied, thanks.


Re: [PATCH 1/2] btrfs: qgroup: Introduce extent changeset for qgroup reserve functions

2016-12-08 Thread Chandan Rajendra
On Friday, December 02, 2016 10:03:06 AM Qu Wenruo wrote:
> Introduce a new parameter, struct extent_changeset for
> btrfs_qgroup_reserved_data() and its callers.
> 
> Such extent_changeset was used in btrfs_qgroup_reserve_data() to record
> which range it reserved in current reserve, so it can free it at error
> path.
> 
> The reason we need to export it to callers is that, at the buffered write
> error path, without knowing exactly which range we reserved in the current
> allocation, we can free space which was not reserved by us.
> 
> This will lead to qgroup reserved space underflow.
>

Hi Qu,

On which git tree are these patches based? This patch fails to apply
cleanly on the kdave/for-next branch that is available as of today.

-- 
chandan



[PATCH 2/2] btrfs-convert: Fix migrate_super_block() to work with 64k sectorsize

2016-12-08 Thread Chandan Rajendra
migrate_super_block() uses sectorsize to refer to the size of the
superblock. Hence on 64k sectorsize filesystems, it ends up computing
checksum beyond the super block length (i.e.
BTRFS_SUPER_INFO_SIZE). This commit fixes the bug by using
BTRFS_SUPER_INFO_SIZE instead of sectorsize of the underlying
filesystem.

Signed-off-by: Chandan Rajendra 
---
 convert/main.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/convert/main.c b/convert/main.c
index 1148a36..fd6f77b 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -1360,7 +1360,7 @@ err:
 /*
  * Migrate super block to its default position and zero 0 ~ 16k
  */
-static int migrate_super_block(int fd, u64 old_bytenr, u32 sectorsize)
+static int migrate_super_block(int fd, u64 old_bytenr)
 {
int ret;
struct extent_buffer *buf;
@@ -1368,13 +1368,13 @@ static int migrate_super_block(int fd, u64 old_bytenr, u32 sectorsize)
u32 len;
u32 bytenr;
 
-   buf = malloc(sizeof(*buf) + sectorsize);
+   buf = malloc(sizeof(*buf) + BTRFS_SUPER_INFO_SIZE);
if (!buf)
return -ENOMEM;
 
-   buf->len = sectorsize;
-   ret = pread(fd, buf->data, sectorsize, old_bytenr);
-   if (ret != sectorsize)
+   buf->len = BTRFS_SUPER_INFO_SIZE;
+   ret = pread(fd, buf->data, BTRFS_SUPER_INFO_SIZE, old_bytenr);
+   if (ret != BTRFS_SUPER_INFO_SIZE)
goto fail;
 
super = (struct btrfs_super_block *)buf->data;
@@ -1382,19 +1382,20 @@ static int migrate_super_block(int fd, u64 old_bytenr, u32 sectorsize)
btrfs_set_super_bytenr(super, BTRFS_SUPER_INFO_OFFSET);
 
csum_tree_block_size(buf, BTRFS_CRC32_SIZE, 0);
-   ret = pwrite(fd, buf->data, sectorsize, BTRFS_SUPER_INFO_OFFSET);
-   if (ret != sectorsize)
+   ret = pwrite(fd, buf->data, BTRFS_SUPER_INFO_SIZE,
+   BTRFS_SUPER_INFO_OFFSET);
+   if (ret != BTRFS_SUPER_INFO_SIZE)
goto fail;
 
ret = fsync(fd);
if (ret)
goto fail;
 
-   memset(buf->data, 0, sectorsize);
+   memset(buf->data, 0, BTRFS_SUPER_INFO_SIZE);
for (bytenr = 0; bytenr < BTRFS_SUPER_INFO_OFFSET; ) {
len = BTRFS_SUPER_INFO_OFFSET - bytenr;
-   if (len > sectorsize)
-   len = sectorsize;
+   if (len > BTRFS_SUPER_INFO_SIZE)
+   len = BTRFS_SUPER_INFO_SIZE;
ret = pwrite(fd, buf->data, len, bytenr);
if (ret != len) {
fprintf(stderr, "unable to zero fill device\n");
@@ -2519,7 +2520,7 @@ static int do_convert(const char *devname, int datacsum, int packing,
 * If this step succeed, we get a mountable btrfs. Otherwise
 * the source fs is left unchanged.
 */
-   ret = migrate_super_block(fd, mkfs_cfg.super_bytenr, blocksize);
+   ret = migrate_super_block(fd, mkfs_cfg.super_bytenr);
if (ret) {
error("unable to migrate super block: %d", ret);
goto fail;
-- 
2.5.5
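A quick sketch of the arithmetic behind this fix (BTRFS_SUPER_INFO_SIZE is the fixed 4096-byte on-disk superblock size; the helper below is illustrative, not btrfs-progs code):

```python
BTRFS_SUPER_INFO_SIZE = 4096   # fixed superblock length on disk

def checksum_overrun(sectorsize):
    """Bytes read past the superblock when sectorsize is (wrongly)
    used as the superblock length for the checksum."""
    return max(0, sectorsize - BTRFS_SUPER_INFO_SIZE)

print(checksum_overrun(4 * 1024))    # 0 -> why 4k-sectorsize filesystems were fine
print(checksum_overrun(64 * 1024))   # 61440 unrelated bytes checksummed on 64k
```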



[PATCH 1/2] btrfs-progs: btrfs-convert: Prevent accounting blocks beyond end of device

2016-12-08 Thread Chandan Rajendra
When looping across data block bitmap, __ext2_add_one_block() may add
blocks which do not exist on the underlying disk. This commit prevents
this from happening by checking the block index against the maximum
block count that was present in the ext4 filesystem instance that is
being converted.

Signed-off-by: Chandan Rajendra 
---
 convert/main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/convert/main.c b/convert/main.c
index 4b4cea4..1148a36 100644
--- a/convert/main.c
+++ b/convert/main.c
@@ -1525,6 +1525,9 @@ static int __ext2_add_one_block(ext2_filsys fs, char *bitmap,
offset /= EXT2FS_CLUSTER_RATIO(fs);
offset += group_nr * EXT2_CLUSTERS_PER_GROUP(fs->super);
for (i = 0; i < EXT2_CLUSTERS_PER_GROUP(fs->super); i++) {
+   if ((i + offset) >= ext2fs_blocks_count(fs->super))
+   break;
+
if (ext2fs_test_bit(i, bitmap)) {
u64 start;
 
-- 
2.5.5
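The effect of the added guard can be sketched in a few lines (hypothetical names mirroring the patch, not the actual libext2fs API):

```python
def blocks_in_group(bitmap, group_nr, clusters_per_group, total_blocks):
    """Yield absolute block numbers set in one block group's bitmap,
    stopping at the filesystem's real end -- the check the patch adds."""
    offset = group_nr * clusters_per_group
    for i in range(clusters_per_group):
        if i + offset >= total_blocks:   # don't account blocks past the device
            break
        if bitmap[i]:
            yield i + offset

# Last group of a 10-block filesystem with 8 clusters per group:
# bits for blocks 10..15 exist in the bitmap but not on disk.
bitmap = [1, 1, 0, 1, 1, 1, 1, 1]
print(list(blocks_in_group(bitmap, 1, 8, 10)))  # [8, 9]
```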



Re: [PATCH 2/6] Btrfs: set single device limit for dax usecase

2016-12-08 Thread David Sterba
On Wed, Dec 07, 2016 at 01:45:06PM -0800, Liu Bo wrote:
> Dax on btrfs is not ready for multiple device.

How about DUP? Technically it's not multi-device but still stores
multiple copies, so I don't know if the implementation is ok with that.


[PULL] Btrfs cleanups for 4.10

2016-12-08 Thread David Sterba
Hi,

now that the regular patches are on the way to 4.10, it's a good time to
squeeze the cleanups before we start merging further patches. This pull request
contains the series from Jeff; the code changes are all over the map. I think
we won't find a better time to merge it, so be it now.

The branch is based on my previous pull request head, and with a signed tag.

The following changes since commit 515bdc479097ec9d5f389202842345af3162f71c:

  Merge branch 'misc-4.10' into for-chris-4.10-20161130 (2016-11-30 14:02:20 
+0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git 
for-chris-4.10-20161206

for you to fetch changes up to 34441361c4f52a5f6e41d8de8e5debbeb415dbf0:

  btrfs: opencode chunk locking, remove helpers (2016-12-06 16:07:00 +0100)


David Sterba (1):
  btrfs: opencode chunk locking, remove helpers

Jeff Mahoney (19):
  btrfs: call functions that overwrite their root parameter with fs_info
  btrfs: call functions that always use the same root with fs_info instead
  btrfs: btrfs_init_new_device should use fs_info->dev_root
  btrfs: alloc_reserved_file_extent trace point should use extent_root
  btrfs: struct btrfsic_state->root should be an fs_info
  btrfs: struct reada_control.root -> reada_control.fs_info
  btrfs: root->fs_info cleanup, use fs_info->dev_root everywhere
  btrfs: root->fs_info cleanup, io_ctl_init
  btrfs: pull node/sector/stripe sizes out of root and into fs_info
  btrfs: root->fs_info cleanup, btrfs_calc_{trans,trunc}_metadata_size
  btrfs: root->fs_info cleanup, lock/unlock_chunks
  btrfs: root->fs_info cleanup, update_block_group{,flags}
  btrfs: root->fs_info cleanup, add fs_info convenience variables
  btrfs: root->fs_info cleanup, access fs_info->delayed_root directly
  btrfs: convert extent-tree tracepoints to use fs_info
  btrfs: simplify btrfs_wait_cache_io prototype
  btrfs: take an fs_info directly when the root is not used otherwise
  btrfs: split btrfs_wait_marked_extents into normal and tree log functions
  btrfs: remove root parameter from transaction commit/end routines

 fs/btrfs/backref.c |   10 +-
 fs/btrfs/check-integrity.c |   73 +-
 fs/btrfs/check-integrity.h |5 +-
 fs/btrfs/compression.c |   54 +-
 fs/btrfs/ctree.c   |  468 ++--
 fs/btrfs/ctree.h   |  216 +++---
 fs/btrfs/delayed-inode.c   |  138 ++--
 fs/btrfs/delayed-inode.h   |   19 +-
 fs/btrfs/dev-replace.c |   68 +-
 fs/btrfs/dev-replace.h |4 +-
 fs/btrfs/dir-item.c|   45 +-
 fs/btrfs/disk-io.c |  546 +++---
 fs/btrfs/disk-io.h |   30 +-
 fs/btrfs/export.c  |   10 +-
 fs/btrfs/extent-tree.c | 1300 
 fs/btrfs/extent_io.c   |   63 +-
 fs/btrfs/extent_io.h   |8 +-
 fs/btrfs/file-item.c   |  152 ++--
 fs/btrfs/file.c|  214 +++---
 fs/btrfs/free-space-cache.c|  154 ++--
 fs/btrfs/free-space-cache.h|   12 +-
 fs/btrfs/free-space-tree.c |   44 +-
 fs/btrfs/inode-item.c  |   11 +-
 fs/btrfs/inode-map.c   |   22 +-
 fs/btrfs/inode.c   |  765 ++-
 fs/btrfs/ioctl.c   |  577 +++---
 fs/btrfs/ordered-data.c|   38 +-
 fs/btrfs/ordered-data.h|4 +-
 fs/btrfs/print-tree.c  |   19 +-
 fs/btrfs/print-tree.h  |4 +-
 fs/btrfs/props.c   |5 +-
 fs/btrfs/qgroup.c  |   64 +-
 fs/btrfs/qgroup.h  |2 +-
 fs/btrfs/raid56.c  |   62 +-
 fs/btrfs/raid56.h  |8 +-
 fs/btrfs/reada.c   |   34 +-
 fs/btrfs/relocation.c  |  257 ---
 fs/btrfs/root-tree.c   |   28 +-
 fs/btrfs/scrub.c   |  166 ++--
 fs/btrfs/send.c|   33 +-
 fs/btrfs/super.c   |  134 ++--
 fs/btrfs/tests/btrfs-tests.c   |   13 +-
 fs/btrfs/tests/btrfs-tests.h   |4 +-
 fs/btrfs/tests/extent-buffer-tests.c   |7 +-
 fs/btrfs/tests/extent-io-tests.c   |5 +-
 fs/btrfs/tests/free-space-tests.c  |   18 +-
 fs/btrfs/tests/free-space-tree-tests.c |9 +-
 fs/btrfs/tests/inode-tests.c   |   16 +-
 fs/btrfs/tests/qgroup-tests.c  |   11 +-
 fs/btrfs/transaction.c |  615 +++
 fs/btrfs/transaction.h |   29 +-
 fs/btrfs/tree-log.c|  195 ++---
 fs/btrfs/uuid-tree.c 

Re: LSF/MM 2017: Call for Proposals

2016-12-08 Thread Michal Hocko
On Wed 07-12-16 06:57:06, James Bottomley wrote:
[...]
> Just on this point, since there seems to be a lot of confusion: lsf-pc
> is the list for contacting the programme committee, so you cannot
> subscribe to it.
> 
> There is no -discuss equivalent, like kernel summit has, because we
> expect you to cc the relevant existing mailing list and have the
> discussion there instead rather than expecting people to subscribe to a
> new list.

There used to be l...@lists.linux-foundation.org. Is it not active
anymore?

-- 
Michal Hocko
SUSE Labs


Re: [Lsf-pc] LSF/MM 2017: Call for Proposals

2016-12-08 Thread Jan Kara
On Thu 08-12-16 13:26:19, Michal Hocko wrote:
> On Wed 07-12-16 06:57:06, James Bottomley wrote:
> [...]
> > Just on this point, since there seems to be a lot of confusion: lsf-pc
> > is the list for contacting the programme committee, so you cannot
> > subscribe to it.
> > 
> > There is no -discuss equivalent, like kernel summit has, because we
> > expect you to cc the relevant existing mailing list and have the
> > discussion there instead rather than expecting people to subscribe to a
> > new list.
> 
> There used to be l...@lists.linux-foundation.org. Is it not active
> anymore?

Not sure about the current state, but usually it gets populated by the
selected attendees each year.

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH 4/6] Btrfs: add DAX support for nocow btrfs

2016-12-08 Thread Jan Kara
On Wed 07-12-16 17:15:42, Chris Mason wrote:
> On 12/07/2016 04:45 PM, Liu Bo wrote:
> >This has implemented DAX support for btrfs with nocow and single-device.
> >
> >DAX is developed for block devices that are memory-like in order to avoid
> >double buffering in both the page cache and the storage, so DAX can perform
> >reads and writes directly to the storage device, and for those who prefer
> >using a filesystem, filesystem DAX support can help to map the storage into
> >userspace for file-mapping.
> >
> >Since I haven't figured out how to map multiple devices to userspace without
> >the page cache, this DAX support is only for single-device, and I don't think
> >DAX (Direct Access) can work with cow, so this is limited to the nocow case.
> >I made this by setting nodatacow in the dax mount option.
> 
> Interesting, this is a nice small start.  It might make more sense to limit
> snapshots to readonly in DAX mode until we can figure out how to cow
> properly.  I think it can be done, I just need to sit down with the dax code
> to do a good review.
> 
> But bigger picture, if we can't cow and we can't crc and we can't
> multi-device, I'd rather let XFS/ext4 sort out the dax space until we pull
> in more of the btrfs features too.

So normal DAX IO (via read(2) and write(2)) is very similar to direct IO so
I don't think there would be any obstacle to support all the features with
that. For mmap(2) things get more difficult but still: The filesystem gets
normal ->fault notifications when the page is first faulted in. So you can
COW if you need to at that moment. Also DAX PTEs can be write-protected
(well, as of the coming merge window) as normal PTEs and then you'll get
->pfn_mkwrite / ->page_mkwrite notification when someone tries to write via
mmap and you can do your stuff at that point. So DAX mappings are not that
different from filesystem point of view. There are some differences wrt.
locking (you don't have page lock, but you use a lock bit in radix tree
entry instead for that) but that's about it. So I don't see a principled
reason why we cannot support all btrfs features for DAX... But if you see
some problem, let me know and we can talk if we could somehow help from the
DAX side.

BTW, I also don't see how the multiple devices are a problem. Actually XFS
supports that (with its real-time devices) just fine - your ->iomap_begin()
returns a (device, offset) pair and that should be all that's needed,
no?

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH v2 2/2] fstests: btrfs: Use _require_btrfs_qgroup_report to replace open code

2016-12-08 Thread Eryu Guan
On Thu, Dec 08, 2016 at 05:20:49PM +0800, Qu Wenruo wrote:
> 
> 
> At 12/08/2016 04:47 PM, Eryu Guan wrote:
> > On Thu, Dec 08, 2016 at 04:12:13PM +0800, Qu Wenruo wrote:
> > > Introduce the new _require_btrfs_qgroup_report function, which will check
> > > for the availability of "btrfs check --qgroup-report", then set a global
> > > flag to tell _check_scratch_fs() to do an extra qgroup check.
> > > 
> > > Signed-off-by: Qu Wenruo 
> > > ---
> > > v2:
> > >   Use "${RESULT_DIR}/require_scratch.require_qgroup_report" instead of
> > >   global variant
> > >   Rebased to latest master
> > >   Replace btrfsck with $BTRFS_UTIL_PROG check.
> > [snip]
> > > diff --git a/tests/btrfs/042 b/tests/btrfs/042
> > > index 498ccc9..dc9b762 100755
> > > --- a/tests/btrfs/042
> > > +++ b/tests/btrfs/042
> > > @@ -43,6 +43,7 @@ _cleanup()
> > >  _supported_fs btrfs
> > >  _supported_os Linux
> > >  _require_scratch
> > > +_require_btrfs_qgroup_report
> > > 
> > >  rm -f $seqres.full
> > > 
> > > @@ -84,10 +85,7 @@ for i in `seq 10 -1 1`; do
> > >   total_written=$(($total_written+$filesize))
> > >  done
> > > 
> > > -#check if total written exceeds limit
> > > -if [ $total_written -gt $LIMIT_SIZE ];then
> > > - _fail "total written should be less than $LIMIT_SIZE"
> > > -fi
> > > +# qgroup will be checked automatically at _check_scratch_fs() by fstest
> > 
> > This doesn't look like an equivalent replacement, and btrfs/042 fails
> > for me after this update (wrong qgroup numbers) on 4.9-rc4 kernel. Is
> > this change intentional?
> > 
> > Thanks,
> > Eryu
> > 
> > 
> Sorry, just a typo. For 042, only the _require_btrfs_qgroup_report line is
> needed.
> 
> I also ran the test with v4.9-rc6 plus some new btrfs patches (David's
> for-next branch), and it ran without problems.
> 
> So the error seems to be an already-fixed btrfs qgroup problem.

Ok, good to know. Can you please send an updated patch?

Thanks,
Eryu


Re: [PATCH v2 2/2] fstests: btrfs: Use _require_btrfs_qgroup_report to replace open code

2016-12-08 Thread Qu Wenruo



At 12/08/2016 04:47 PM, Eryu Guan wrote:

On Thu, Dec 08, 2016 at 04:12:13PM +0800, Qu Wenruo wrote:

Introduce the new _require_btrfs_qgroup_report function, which will check
for the availability of "btrfs check --qgroup-report", then set a global
flag to tell _check_scratch_fs() to do an extra qgroup check.

Signed-off-by: Qu Wenruo 
---
v2:
  Use "${RESULT_DIR}/require_scratch.require_qgroup_report" instead of
  global variant
  Rebased to latest master
  Replace btrfsck with $BTRFS_UTIL_PROG check.

[snip]

diff --git a/tests/btrfs/042 b/tests/btrfs/042
index 498ccc9..dc9b762 100755
--- a/tests/btrfs/042
+++ b/tests/btrfs/042
@@ -43,6 +43,7 @@ _cleanup()
 _supported_fs btrfs
 _supported_os Linux
 _require_scratch
+_require_btrfs_qgroup_report

 rm -f $seqres.full

@@ -84,10 +85,7 @@ for i in `seq 10 -1 1`; do
total_written=$(($total_written+$filesize))
 done

-#check if total written exceeds limit
-if [ $total_written -gt $LIMIT_SIZE ];then
-   _fail "total written should be less than $LIMIT_SIZE"
-fi
+# qgroup will be checked automatically at _check_scratch_fs() by fstest


This doesn't look like an equivalent replacement, and btrfs/042 fails
for me after this update (wrong qgroup numbers) on 4.9-rc4 kernel. Is
this change intentional?

Thanks,
Eryu


Sorry, just a typo. For 042 only the _require_btrfs_qgroup_report line is
needed.

I also ran the test with v4.9-rc6 with some new btrfs patches (David's
for-next branch), and it ran without problems.

So the error seems to be a btrfs qgroup problem that has since been fixed.

Thanks,
Qu




Re: [PATCH v2 2/2] fstests: btrfs: Use _require_btrfs_qgroup_report to replace open code

2016-12-08 Thread Eryu Guan
On Thu, Dec 08, 2016 at 04:12:13PM +0800, Qu Wenruo wrote:
> Introduce a new _require_btrfs_qgroup_report function, which checks for
> the availability of "btrfs check --qgroup-report", then sets a global
> flag to tell _check_scratch_fs() to do an extra qgroup check.
> 
> Signed-off-by: Qu Wenruo 
> ---
> v2:
>   Use "${RESULT_DIR}/require_scratch.require_qgroup_report" instead of
>   global variant
>   Rebased to latest master
>   Replace btrfsck with $BTRFS_UTIL_PROG check.
[snip]
> diff --git a/tests/btrfs/042 b/tests/btrfs/042
> index 498ccc9..dc9b762 100755
> --- a/tests/btrfs/042
> +++ b/tests/btrfs/042
> @@ -43,6 +43,7 @@ _cleanup()
>  _supported_fs btrfs
>  _supported_os Linux
>  _require_scratch
> +_require_btrfs_qgroup_report
>  
>  rm -f $seqres.full
>  
> @@ -84,10 +85,7 @@ for i in `seq 10 -1 1`; do
>   total_written=$(($total_written+$filesize))
>  done
>  
> -#check if total written exceeds limit
> -if [ $total_written -gt $LIMIT_SIZE ];then
> - _fail "total written should be less than $LIMIT_SIZE"
> -fi
> +# qgroup will be checked automatically at _check_scratch_fs() by fstest

This doesn't look like an equivalent replacement, and btrfs/042 fails
for me after this update (wrong qgroup numbers) on 4.9-rc4 kernel. Is
this change intentional?

Thanks,
Eryu


[PATCH v2 2/2] fstests: btrfs: Use _require_btrfs_qgroup_report to replace open code

2016-12-08 Thread Qu Wenruo
Introduce a new _require_btrfs_qgroup_report function, which checks for
the availability of "btrfs check --qgroup-report", then sets a global
flag to tell _check_scratch_fs() to do an extra qgroup check.

Signed-off-by: Qu Wenruo 
---
v2:
  Use "${RESULT_DIR}/require_scratch.require_qgroup_report" instead of
  global variant
  Rebased to latest master
  Replace btrfsck with $BTRFS_UTIL_PROG check.
---
 check   |  4 ++--
 common/btrfs| 22 +-
 common/rc   |  5 +++--
 tests/btrfs/022 |  4 
 tests/btrfs/028 |  5 ++---
 tests/btrfs/042 |  6 ++
 tests/btrfs/099 |  1 +
 tests/btrfs/104 | 20 +---
 tests/btrfs/122 |  9 +++--
 tests/btrfs/123 |  5 ++---
 tests/btrfs/126 |  1 +
 11 files changed, 46 insertions(+), 36 deletions(-)

diff --git a/check b/check
index 8f2a1bb..76eb054 100755
--- a/check
+++ b/check
@@ -405,11 +405,11 @@ _check_filesystems()
 {
if [ -f ${RESULT_DIR}/require_test ]; then
_check_test_fs || err=true
-   rm -f ${RESULT_DIR}/require_test
+   rm -f ${RESULT_DIR}/require_test*
fi
if [ -f ${RESULT_DIR}/require_scratch ]; then
_check_scratch_fs || err=true
-   rm -f ${RESULT_DIR}/require_scratch
+   rm -f ${RESULT_DIR}/require_scratch*
fi
 }
 
diff --git a/common/btrfs b/common/btrfs
index 302edc6..a860228 100644
--- a/common/btrfs
+++ b/common/btrfs
@@ -40,6 +40,13 @@ _require_btrfs_command()
	[ $? -eq 0 ] || _notrun "$BTRFS_UTIL_PROG too old (must support $cmd $param)"
 }
 
+# Require extra check on btrfs qgroup numbers
+_require_btrfs_qgroup_report()
+{
+   _require_btrfs_command check --qgroup-report
+   touch ${RESULT_DIR}/require_scratch.require_qgroup_report
+}
+
 _run_btrfs_util_prog()
 {
run_check $BTRFS_UTIL_PROG $*
@@ -97,7 +104,20 @@ _check_btrfs_filesystem()
mountpoint=`_umount_or_remount_ro $device`
fi
 
-   btrfsck $device >$tmp.fsck 2>&1
+   if [ -f ${RESULT_DIR}/require_scratch.require_qgroup_report ]; then
+       $BTRFS_UTIL_PROG check $device --qgroup-report > $tmp.qgroup_report 2>&1
+       if grep -qE "Counts for qgroup.*are different" $tmp.qgroup_report ; then
+           echo "_check_btrfs_filesystem: filesystem on $device has wrong qgroup numbers (see $seqres.full)"
+           echo "_check_btrfs_filesystem: filesystem on $device has wrong qgroup numbers" \
+               >> $seqres.full
+           echo "*** qgroup_report.$FSTYP output ***" >>$seqres.full
+           cat $tmp.qgroup_report >>$seqres.full
+           echo "*** qgroup_report.$FSTYP output ***" >>$seqres.full
+       fi
+       rm -f $tmp.qgroup_report
+   fi
+
+   $BTRFS_UTIL_PROG check $device >$tmp.fsck 2>&1
    if [ $? -ne 0 ]; then
        echo "_check_btrfs_filesystem: filesystem on $device is inconsistent (see $seqres.full)"
 
diff --git a/common/rc b/common/rc
index 2719b23..b50283c 100644
--- a/common/rc
+++ b/common/rc
@@ -1222,8 +1222,9 @@ _notrun()
 {
 echo "$*" > $seqres.notrun
 echo "$seq not run: $*"
-rm -f ${RESULT_DIR}/require_test
-rm -f ${RESULT_DIR}/require_scratch
+rm -f ${RESULT_DIR}/require_test*
+rm -f ${RESULT_DIR}/require_scratch*
+
 status=0
 exit
 }
diff --git a/tests/btrfs/022 b/tests/btrfs/022
index 56d4f3d..9abad6c 100755
--- a/tests/btrfs/022
+++ b/tests/btrfs/022
@@ -43,6 +43,7 @@ _cleanup()
 _supported_fs btrfs
 _supported_os Linux
 _require_scratch
+_require_btrfs_qgroup_report
 
 rm -f $seqres.full
 
@@ -125,16 +126,19 @@ _scratch_mkfs > /dev/null 2>&1
 _scratch_mount
 _basic_test
 _scratch_unmount
+_check_scratch_fs
 
 _scratch_mkfs > /dev/null 2>&1
 _scratch_mount
 _rescan_test
 _scratch_unmount
+_check_scratch_fs
 
 _scratch_mkfs > /dev/null 2>&1
 _scratch_mount
 _limit_test_exceed
 _scratch_unmount
+_check_scratch_fs
 
 _scratch_mkfs > /dev/null 2>&1
 _scratch_mount
diff --git a/tests/btrfs/028 b/tests/btrfs/028
index 1425609..a3d9a27 100755
--- a/tests/btrfs/028
+++ b/tests/btrfs/028
@@ -51,6 +51,7 @@ rm -f $seqres.full
 _supported_fs btrfs
 _supported_os Linux
 _require_scratch
+_require_btrfs_qgroup_report
 
 _scratch_mkfs
 _scratch_mount
@@ -86,9 +87,7 @@ _run_btrfs_util_prog filesystem sync $SCRATCH_MNT
 
 _scratch_unmount
 
-# generate a qgroup report and look for inconsistent groups
-$BTRFS_UTIL_PROG check --qgroup-report $SCRATCH_DEV 2>&1 | \
-   grep -E "Counts for qgroup.*are different"
+# qgroup will be checked at _check_scratch_fs() by fstest.
 echo "Silence is golden"
 status=0
 
diff --git a/tests/btrfs/042 b/tests/btrfs/042
index 498ccc9..dc9b762 100755
--- a/tests/btrfs/042
+++ b/tests/btrfs/042
@@ -43,6 +43,7 @@ _cleanup()
 _supported_fs btrfs
 _supported_os Linux
 _require_scratch
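The extra verification that _check_btrfs_filesystem() gains in this patch boils down to grepping the qgroup report for mismatch lines. A standalone sketch of just that detection step (the sample report text below is illustrative, not captured from a real "btrfs check --qgroup-report" run):

```shell
# Hypothetical sample of a qgroup-report complaint; the wording of the
# first line mirrors the pattern the patch matches, the rest is made up.
report='Counts for qgroup id: 0/5 are different
our:  referenced 16384 exclusive 16384
disk: referenced 32768 exclusive 32768'

# Same regex the patch runs against $tmp.qgroup_report:
if echo "$report" | grep -qE "Counts for qgroup.*are different"; then
    echo "qgroup numbers are inconsistent"
fi
```

Only the first line of the report matters for the check; anything else in the output is ignored, which is why the patch dumps the full report to $seqres.full for later inspection.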

[PATCH v2 1/2] fstests: common: rename and enhance _require_btrfs to _require_btrfs_command

2016-12-08 Thread Qu Wenruo
Rename _require_btrfs() to _require_btrfs_command() to avoid
confusion, as all the other _require_btrfs_* helpers have a clear suffix,
like _require_btrfs_mkfs_feature() or _require_btrfs_fs_feature().

Also enhance _require_btrfs_command() to accept second-level commands or
options.
Options are detected by a leading "-" character.
This is quite useful for cases like "btrfs inspect-internal dump-tree"
and "btrfs check --qgroup-report".
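The option-vs-subcommand dispatch described above can be sketched in isolation (the function name classify_param is illustrative, not from the patch; the real helper additionally runs "--help" probes against the btrfs binary):

```shell
# A leading "-" marks the parameter as an option, to be grepped for in
# the command's help text after stripping the dashes (as the patch does
# with sed before grep -wq); anything else is a second-level subcommand.
classify_param() {
    local param="$1"
    if [ "${param:0:1}" = "-" ]; then
        echo "option:$(echo "$param" | sed 's/^-*//')"
    else
        echo "subcommand:$param"
    fi
}

classify_param --qgroup-report   # -> option:qgroup-report
classify_param dump-tree         # -> subcommand:dump-tree
```

This keeps a single helper usable for both "btrfs check --qgroup-report" (option probed via help-text grep) and "btrfs inspect-internal dump-tree" (subcommand probed via its own --help exit status).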

Signed-off-by: Qu Wenruo 
---
v2:
  Replace _subcommand with _command.
  Rebase to latest master
  Use grep -w and -q to make it safer and less noisy
  Update the _notrun string
  Fix typo _not_run
---
 common/btrfs| 22 --
 tests/btrfs/004 |  3 ++-
 tests/btrfs/048 |  2 +-
 tests/btrfs/059 |  2 +-
 tests/btrfs/131 |  2 +-
 5 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/common/btrfs b/common/btrfs
index ab6497d..302edc6 100644
--- a/common/btrfs
+++ b/common/btrfs
@@ -10,16 +10,34 @@ _btrfs_get_subvolid()
$BTRFS_UTIL_PROG sub list $mnt | grep $name | awk '{ print $2 }'
 }
 
+# _require_btrfs_command <command> [<sub-command>|<option>]
 # We check for btrfs and (optionally) features of the btrfs command
-_require_btrfs()
+# It can handle both subcommands like "inspect-internal dump-tree" and
+# options like "check --qgroup-report"
+_require_btrfs_command()
 {
cmd=$1
+   param=$2
+
_require_command "$BTRFS_UTIL_PROG" btrfs
if [ -z "$1" ]; then
return 1;
fi
-   $BTRFS_UTIL_PROG $cmd --help >/dev/null 2>&1
+   $BTRFS_UTIL_PROG $cmd --help &> /dev/null
[ $? -eq 0 ] || _notrun "$BTRFS_UTIL_PROG too old (must support $cmd)"
+
+   test -z "$param" && return
+
+   # If $param is an option, replace leading "-"s for grep
+   if [ ${param:0:1} == "-" ]; then
+   safe_param=$(echo $param | sed 's/^-*//')
+   $BTRFS_UTIL_PROG $cmd --help | grep -wq $safe_param || \
+       _notrun "$BTRFS_UTIL_PROG too old (must support $cmd $param)"
+   return
+   fi
+
+   $BTRFS_UTIL_PROG $cmd $param --help &> /dev/null
+   [ $? -eq 0 ] || _notrun "$BTRFS_UTIL_PROG too old (must support $cmd $param)"
 }
 
 _run_btrfs_util_prog()
diff --git a/tests/btrfs/004 b/tests/btrfs/004
index 905770a..3f8330f 100755
--- a/tests/btrfs/004
+++ b/tests/btrfs/004
@@ -51,7 +51,8 @@ _supported_fs btrfs
 _supported_os Linux
 _require_scratch
 _require_no_large_scratch_dev
-_require_btrfs inspect-internal
+_require_btrfs_command inspect-internal logical-resolve
+_require_btrfs_command inspect-internal inode-resolve
 _require_command "/usr/sbin/filefrag" filefrag
 
 rm -f $seqres.full
diff --git a/tests/btrfs/048 b/tests/btrfs/048
index 0b907b0..e03b3c5 100755
--- a/tests/btrfs/048
+++ b/tests/btrfs/048
@@ -48,7 +48,7 @@ _supported_fs btrfs
 _supported_os Linux
 _require_test
 _require_scratch
-_require_btrfs "property"
+_require_btrfs_command "property"
 
 send_files_dir=$TEST_DIR/btrfs-test-$seq
 
diff --git a/tests/btrfs/059 b/tests/btrfs/059
index 8f106d2..2d1ec23 100755
--- a/tests/btrfs/059
+++ b/tests/btrfs/059
@@ -51,7 +51,7 @@ _supported_fs btrfs
 _supported_os Linux
 _require_test
 _require_scratch
-_require_btrfs "property"
+_require_btrfs_command "property"
 
 rm -f $seqres.full
 
diff --git a/tests/btrfs/131 b/tests/btrfs/131
index d1a11d2..ce486e6 100755
--- a/tests/btrfs/131
+++ b/tests/btrfs/131
@@ -48,7 +48,7 @@ rm -f $seqres.full
 _supported_fs btrfs
 _supported_os Linux
 _require_scratch
-_require_btrfs inspect-internal
+_require_btrfs_command inspect-internal dump-super
 
 mkfs_v1()
 {
-- 
2.7.4


