Re: [patch] btrfs: array overflow in btrfs_ioctl_rm_dev_v2()

2016-02-17 Thread Anand Jain



Thanks Dan.
 Chris pointed this out as well. We are working on it.
 Just one concern: when a device is added, the maximum device path length
 is BTRFS_PATH_NAME_MAX. However, the fix below is correct from the
 vol_args perspective.

Thanks,  Anand


On 02/18/2016 01:01 PM, Dan Carpenter wrote:

We were putting the NUL terminator at BTRFS_PATH_NAME_MAX (4087) bytes
instead of BTRFS_SUBVOL_NAME_MAX (4039) so it corrupted memory.

Fixes: 22af1a869288 ('btrfs: introduce device delete by devid')
Signed-off-by: Dan Carpenter 

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5224fc8..77c61b4 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2700,7 +2700,7 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID) {
ret = btrfs_rm_device(root, NULL, vol_args->devid);
} else {
-   vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
+   vol_args->name[BTRFS_SUBVOL_NAME_MAX] = '\0';
ret = btrfs_rm_device(root, vol_args->name, 0);
}
mutex_unlock(&root->fs_info->volume_mutex);
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
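
The out-of-bounds write described in the patch above can be illustrated with a small standalone sketch. It is not kernel code: example_vol_args is a hypothetical, simplified stand-in that models only the name buffer, and the helper names are made up for illustration. Only the two constants come from the thread. The sketch shows that the last valid subscript is BTRFS_SUBVOL_NAME_MAX, so writing at BTRFS_PATH_NAME_MAX lands 48 bytes beyond the last element.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BTRFS_PATH_NAME_MAX   4087
#define BTRFS_SUBVOL_NAME_MAX 4039

/* Hypothetical, simplified stand-in: only the name buffer is modelled. */
struct example_vol_args {
	char name[BTRFS_SUBVOL_NAME_MAX + 1];
};

/* Return 1 if idx is a valid subscript for the name buffer, 0 otherwise. */
static int name_index_in_bounds(size_t idx)
{
	return idx < sizeof(((struct example_vol_args *)0)->name);
}

/*
 * Terminate the buffer at its own last element. Deriving the index from
 * sizeof() keeps it correct no matter which constant sized the array.
 */
static void terminate_name(struct example_vol_args *args)
{
	assert(args != NULL);
	args->name[sizeof(args->name) - 1] = '\0';
}
```

Deriving the index from sizeof() is only one possible way to avoid mixing up the two constants; the actual fix in the thread simply uses BTRFS_SUBVOL_NAME_MAX.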




Re: [PATCH v4 12/13] btrfs: introduce device delete by devid

2016-02-17 Thread Anand Jain



On 02/17/2016 06:49 PM, David Sterba wrote:

On Sat, Feb 13, 2016 at 10:01:39AM +0800, Anand Jain wrote:

+   if (vol_args->flags & BTRFS_DEVICE_BY_ID) {
+   ret = btrfs_rm_device(root, NULL, vol_args->devid);
+   } else {
+   vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';


   BTRFS_SUBVOL_NAME_MAX

Spotted by Chris,

fs/btrfs/ioctl.c:2703: warning: array subscript is above array bounds

my gcc version does not report that. Fixed and for-next pushed.


 mine either. Sorry about that, thanks for the catch.

#define BTRFS_PATH_NAME_MAX 4087
#define BTRFS_SUBVOL_NAME_MAX 4039

 I am fine with using BTRFS_SUBVOL_NAME_MAX for now. The theoretical
 anomaly is that the add-device code path will use BTRFS_PATH_NAME_MAX
 while delete-device will use BTRFS_SUBVOL_NAME_MAX. It's only theoretical,
 as most device paths are well below 4k IMO. So it's a better trade-off
 than the other solutions, such as (just for understanding):

   - Update the add-device code as well to use btrfs_ioctl_vol_args_v2,
     which means we need to introduce BTRFS_IOC_ADD_DEV_V2 (system
     PATH_MAX is 4096).

   OR

   - Create new btrfs_ioctl_vol_args_v3 with name[BTRFS_PATH_NAME_MAX+1]
   (instead of name[BTRFS_SUBVOL_NAME_MAX+1]) and BTRFS_IOC_RM_DEV_V2
   will be the only consumer of btrfs_ioctl_vol_args_v3 as of now.


Thanks, Anand
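
The size trade-off discussed above can be worked through with a short sketch. The structs below are simplified stand-ins, not the kernel definitions: the field names are illustrative, and the 56-byte v2 header (fd, transid, flags, plus a 32-byte union) is an assumption about the layout. example_args_v3 is the hypothetical "v3" from the second option. Under those assumptions, v1 and v2 both fill exactly 4096 bytes, while a v2-style header combined with a PATH_NAME_MAX-sized name would grow to 4144 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BTRFS_PATH_NAME_MAX   4087
#define BTRFS_SUBVOL_NAME_MAX 4039

/* Simplified stand-in for the v1 argument struct. */
struct example_args_v1 {
	int64_t fd;
	char name[BTRFS_PATH_NAME_MAX + 1];
};

/* Simplified stand-in for the v2 argument struct (assumed 56-byte header). */
struct example_args_v2 {
	int64_t fd;
	uint64_t transid;
	uint64_t flags;
	uint64_t unused[4];
	char name[BTRFS_SUBVOL_NAME_MAX + 1];
};

/* Hypothetical "v3": the v2 header with a PATH_NAME_MAX-sized name. */
struct example_args_v3 {
	int64_t fd;
	uint64_t transid;
	uint64_t flags;
	uint64_t unused[4];
	char name[BTRFS_PATH_NAME_MAX + 1];
};

/* Bytes left for the name once the header is accounted for. */
static size_t name_capacity(size_t struct_size, size_t header_size)
{
	assert(struct_size >= header_size);
	return struct_size - header_size;
}
```

For example, name_capacity(4096, offsetof(struct example_args_v2, name)) gives 4040, which is BTRFS_SUBVOL_NAME_MAX + 1.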






[RFC PATCH v2 5/5] fstests: btrfs: Test inband dedup with balance.

2016-02-17 Thread Qu Wenruo
Btrfs balance will relocate data extents, but the extent's hash is removed
too late, at run_delayed_ref() time. This can cause the extent ref to be
increased during balance, so that either find_data_references() triggers a
WARN_ON() or run_delayed_refs() fails and causes a transaction abort.

Add such concurrency test for inband dedup and balance.

Signed-off-by: Qu Wenruo 
---
 tests/btrfs/203 | 92 +
 tests/btrfs/203.out |  3 ++
 tests/btrfs/group   |  1 +
 3 files changed, 96 insertions(+)
 create mode 100755 tests/btrfs/203
 create mode 100644 tests/btrfs/203.out

diff --git a/tests/btrfs/203 b/tests/btrfs/203
new file mode 100755
index 000..f09af27
--- /dev/null
+++ b/tests/btrfs/203
@@ -0,0 +1,92 @@
+#! /bin/bash
+# FS QA Test 203
+#
+# Btrfs reflink with balance concurrency test
+#
+#---
+# Copyright (c) 2016 Fujitsu.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   kill $balance_pid &> /dev/null
+   wait
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/reflink
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_cp_reflink
+_need_to_be_root
+_require_btrfs_subcommand dedup
+_require_btrfs_kernel_feature dedup
+_require_btrfs_mkfs_feature dedup
+
+dedup_bs=$(( 128 * 1024 ))
+file=$SCRATCH_MNT/foo
+nr=2
+
+_scratch_mkfs "-O dedup" >> $seqres.full 2>&1
+_scratch_mount
+
+_run_btrfs_util_prog dedup enable -b $dedup_bs $SCRATCH_MNT
+
+# create the initial file
+$XFS_IO_PROG -f -c "pwrite -b $dedup_bs 0 $dedup_bs" $file | _filter_xfs_io
+
+# make sure hash is added into hash pool
+sync
+
+_btrfs_stress_balance $SCRATCH_MNT >/dev/null 2>&1 &
+balance_pid=$!
+
+for n in $(seq 1 $nr); do
+   $XFS_IO_PROG -f -c "pwrite -b $dedup_bs 0 $dedup_bs" \
+   ${file}_${n} > /dev/null 2>&1
+done
+
+kill $balance_pid &> /dev/null
+wait
+
+# Sometimes even after we killed $balance_pid and wait returned,
+# balance may still be running; use balance cancel to wait for it.
+_run_btrfs_util_prog balance cancel $SCRATCH_MNT &> /dev/null
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/203.out b/tests/btrfs/203.out
new file mode 100644
index 000..404394c
--- /dev/null
+++ b/tests/btrfs/203.out
@@ -0,0 +1,3 @@
+QA output created by 203
+wrote 131072/131072 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 34f49b8..ccd4cc7 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -121,3 +121,4 @@
 200 auto dedup
 201 auto dedup
 202 auto dedup
+203 auto dedup balance
-- 
2.7.1





[RFC PATCH v2 2/5] fstests: btrfs: Add basic test for btrfs in-band de-duplication

2016-02-17 Thread Qu Wenruo
Add basic test for btrfs in-band de-duplication, including:
1) Enable
2) Re-enable
3) On-disk extents referring to the same bytenr
4) Disable
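
The check for item 3 compares the number of unique extent start addresses with the total extent count and tolerates a small miss rate. A minimal C sketch of that arithmetic follows; it mirrors the shell expression used in do_dedup_test further down in this patch, and the function name is hypothetical. Note the integer division: if, say, every 64K block of the 256 MiB file became its own extent, there would be 4096 extents and the threshold would be 204.

```c
#include <assert.h>

/*
 * Mirrors the shell test: [ $uniq -ge $(( $total * 5 / 100 )) ]
 * Returns 1 when the share of unique extents is considered too high.
 */
static int dedup_miss_too_high(unsigned long uniq, unsigned long total)
{
	unsigned long threshold;

	assert(uniq <= total);
	threshold = total * 5 / 100;	/* integer division, rounds down */
	return uniq >= threshold;
}
```

One consequence of the rounding is that for fewer than 20 total extents the threshold is 0, so the comparison is always true.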

Signed-off-by: Qu Wenruo 
---
 common/defrag   |   8 
 tests/btrfs/200 | 112 
 tests/btrfs/200.out |  17 
 tests/btrfs/group   |   1 +
 4 files changed, 138 insertions(+)
 create mode 100755 tests/btrfs/200
 create mode 100644 tests/btrfs/200.out

diff --git a/common/defrag b/common/defrag
index d2b137e..46175f5 100644
--- a/common/defrag
+++ b/common/defrag
@@ -47,6 +47,14 @@ _extent_count()
$XFS_IO_PROG -c "fiemap" $1 | tail -n +2 | grep -v hole | wc -l| $AWK_PROG '{print $1}'
 }
 
+_uniq_extent_count()
+{
+   file=$1
+   $XFS_IO_PROG -c "fiemap" $file >> $seqres.full 2>&1
+   $XFS_IO_PROG -c "fiemap" $file | tail -n +2 | grep -v hole |\
+   $AWK_PROG '{print $3}' | sort | uniq | wc -l
+}
+
 _check_extent_count()
 {
min=$1
diff --git a/tests/btrfs/200 b/tests/btrfs/200
new file mode 100755
index 000..856e1fb
--- /dev/null
+++ b/tests/btrfs/200
@@ -0,0 +1,112 @@
+#! /bin/bash
+# FS QA Test 200
+#
+# Basic btrfs inband dedup test, including:
+# 1) Enable
+# 2) Uniq file extent number
+# 3) Re-enable
+# 4) Disable
+#
+#---
+# Copyright (c) 2016 Fujitsu.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/defrag
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_need_to_be_root
+_require_btrfs_subcommand dedup
+_require_btrfs_kernel_feature dedup
+_require_btrfs_mkfs_feature dedup
+
+# File size is twice the maximum file extent of btrfs
+# So even fallbacked to non-dedup, it will have at least 2 extents
+file_size=$(( 256 * 1024 * 1024 ))
+
+_scratch_mkfs "-O dedup" >> $seqres.full 2>&1
+_scratch_mount
+
+do_dedup_test()
+{
+   backend=$1
+   dedup_bs=$2
+   _run_btrfs_util_prog dedup enable -s $backend -b $dedup_bs $SCRATCH_MNT
+   $XFS_IO_PROG -f -c "pwrite -b $dedup_bs 0 $dedup_bs" \
+   $SCRATCH_MNT/initial_block | _filter_xfs_io
+
+   # sync to ensure dedup hash is added into dedup pool
+   sync
+   $XFS_IO_PROG -f -c "pwrite -b $dedup_bs 0 $file_size" \
+   $SCRATCH_MNT/real_file | _filter_xfs_io
+   # sync again to ensure data are all rewritten to disk
+   sync
+
+   # Test if real_file is de-duplicated
+   nr_uniq_extents=$(_uniq_extent_count $SCRATCH_MNT/real_file)
+   nr_total_extents=$(_extent_count $SCRATCH_MNT/real_file)
+
+   echo "uniq/total: $nr_uniq_extents/$nr_total_extents" >> $seqres.full
+   # Allow a small amount of dedup miss, as commit interval or
+   # memory pressure may break a dedup_bs block and cause
+   # small extents which won't go through the dedup routine
+   if [ $nr_uniq_extents -ge $(( $nr_total_extents * 5 / 100 )) ]; then
+   echo "Too high dedup failure rate"
+   fi
+}
+
+# Test inmemory dedup first, use 64K dedup bs to keep compatibility
+# with 64K page size
+do_dedup_test inmemory 64K
+
+# Test ondisk backend, and re-enable function
+do_dedup_test ondisk 64K
+
+# Test 128K(default) dedup bs
+do_dedup_test inmemory 128K
+do_dedup_test ondisk 128K
+
+# Check dedup disable
+_run_btrfs_util_prog dedup disable $SCRATCH_MNT
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/200.out b/tests/btrfs/200.out
new file mode 100644
index 000..5197dbc
--- /dev/null
+++ b/tests/btrfs/200.out
@@ -0,0 +1,17 @@
+QA output created by 200
+wrote 65536/65536 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 268435456/268435456 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 65536/65536 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X 

[RFC PATCH v2 4/5] fstests: btrfs: Add per inode dedup flag test

2016-02-17 Thread Qu Wenruo
This test will check per inode dedup flag.

Signed-off-by: Qu Wenruo 
---
 tests/btrfs/202 | 117 
 tests/btrfs/202.out |  15 +++
 tests/btrfs/group   |   1 +
 3 files changed, 133 insertions(+)
 create mode 100755 tests/btrfs/202
 create mode 100644 tests/btrfs/202.out

diff --git a/tests/btrfs/202 b/tests/btrfs/202
new file mode 100755
index 000..95ccb04
--- /dev/null
+++ b/tests/btrfs/202
@@ -0,0 +1,117 @@
+#! /bin/bash
+# FS QA Test 202
+#
+# Btrfs per inode dedup flag test
+#
+#---
+# Copyright (c) 2016 Fujitsu.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/defrag
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_need_to_be_root
+_require_btrfs_subcommand dedup
+_require_btrfs_subcommand property
+_require_btrfs_kernel_feature dedup
+_require_btrfs_mkfs_feature dedup
+
+# File size is twice the maximum file extent of btrfs
+# So even fallbacked to non-dedup, it will have at least 2 extents
+file_size=$(( 256 * 1024 * 1024 ))
+dedup_bs=$(( 64 * 1024 ))
+
+_scratch_mkfs "-O dedup" >> $seqres.full 2>&1
+_scratch_mount
+
+# Return 0 for not deduped at all , return 1 for part or full deduped
+test_file_deduped () {
+   file=$1
+
+   nr_uniq_extents=$(_uniq_extent_count $file)
+   nr_total_extents=$(_extent_count $file)
+
+   if [ $nr_uniq_extents -eq $nr_total_extents ]; then
+   echo "not de-duplicated"
+   else
+   echo "de-duplicated"
+   fi
+}
+
+dedup_write_file () {
+   file=$1
+   size=$2
+
+   $XFS_IO_PROG -f -c "pwrite -b $dedup_bs 0 $size" $file | _filter_xfs_io
+}
+
+print_result () {
+   file=$1
+
+   echo "$(basename $file): $(test_file_deduped $file)"
+}
+_run_btrfs_util_prog dedup enable -b $dedup_bs $SCRATCH_MNT
+touch $SCRATCH_MNT/dedup_file
+touch $SCRATCH_MNT/no_dedup_file
+mkdir $SCRATCH_MNT/dedup_dir
+mkdir $SCRATCH_MNT/no_dedup_dir
+
+_run_btrfs_util_prog property set $SCRATCH_MNT/no_dedup_file dedup disable
+_run_btrfs_util_prog property set $SCRATCH_MNT/no_dedup_dir dedup disable
+
+dedup_write_file $SCRATCH_MNT/tmp $dedup_bs
+# sync to ensure hash is added to dedup tree
+sync
+
+dedup_write_file $SCRATCH_MNT/dedup_file $file_size
+dedup_write_file $SCRATCH_MNT/no_dedup_file $file_size
+dedup_write_file $SCRATCH_MNT/dedup_dir/dedup_dir_default_file $file_size
+dedup_write_file $SCRATCH_MNT/no_dedup_dir/no_dedup_dir_default_file $file_size
+
+print_result $SCRATCH_MNT/dedup_file
+print_result $SCRATCH_MNT/no_dedup_file
+print_result $SCRATCH_MNT/dedup_dir/dedup_dir_default_file
+print_result $SCRATCH_MNT/no_dedup_dir/no_dedup_dir_default_file
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/202.out b/tests/btrfs/202.out
new file mode 100644
index 000..ced9e88
--- /dev/null
+++ b/tests/btrfs/202.out
@@ -0,0 +1,15 @@
+QA output created by 202
+wrote 65536/65536 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 268435456/268435456 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 268435456/268435456 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 268435456/268435456 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 268435456/268435456 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+dedup_file: de-duplicated
+no_dedup_file: not de-duplicated
+dedup_dir_default_file: de-duplicated
+no_dedup_dir_default_file: not de-duplicated
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 6ee90a4..34f49b8 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -120,3 +120,4 @@
 117 auto quick send clone
 200 auto dedup
 201 auto dedup
+202 auto dedup
-- 
2.7.1


[RFC PATCH v2 0/5] Btrfs in-band de-duplication tests cases

2016-02-17 Thread Qu Wenruo
Btrfs in-band de-duplication test cases for btrfs internal use only.

Sequence number is not modified and uses number from 200 to avoid
possible conflicts.
Will modify sequence number and send to xfstests mail list after kernel
and btrfs-progs patches get merged.

Qu Wenruo (5):
  fstests: Add support to check btrfs sysfs features
  fstests: btrfs: Add basic test for btrfs in-band de-duplication
  fstests: btrfs: Add testcase for btrfs dedup enable disable race test
  fstests: btrfs: Add per inode dedup flag test
  fstests: btrfs: Test inband dedup with balance.

 common/defrag   |   8 
 common/rc   |  12 +-
 tests/btrfs/004 |   2 +-
 tests/btrfs/048 |   2 +-
 tests/btrfs/059 |   2 +-
 tests/btrfs/200 | 112 +
 tests/btrfs/200.out |  17 
 tests/btrfs/201 | 101 +
 tests/btrfs/201.out |   1 +
 tests/btrfs/202 | 117 
 tests/btrfs/202.out |  15 +++
 tests/btrfs/203 |  92 +
 tests/btrfs/203.out |   3 ++
 tests/btrfs/group   |   4 ++
 14 files changed, 484 insertions(+), 4 deletions(-)
 create mode 100755 tests/btrfs/200
 create mode 100644 tests/btrfs/200.out
 create mode 100755 tests/btrfs/201
 create mode 100644 tests/btrfs/201.out
 create mode 100755 tests/btrfs/202
 create mode 100644 tests/btrfs/202.out
 create mode 100755 tests/btrfs/203
 create mode 100644 tests/btrfs/203.out

-- 
2.7.1





[RFC PATCH v2 1/5] fstests: Add support to check btrfs sysfs features

2016-02-17 Thread Qu Wenruo
Btrfs has a sysfs interface showing which features the current kernel/btrfs
module supports.

Add _require_btrfs_kernel_feature() to check such interface.

Also rename _require_btrfs() to _require_btrfs_subcommand() to avoid
confusion.

Signed-off-by: Qu Wenruo 
---
 common/rc   | 12 +++-
 tests/btrfs/004 |  2 +-
 tests/btrfs/048 |  2 +-
 tests/btrfs/059 |  2 +-
 4 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/common/rc b/common/rc
index 52c4a36..971473b 100644
--- a/common/rc
+++ b/common/rc
@@ -2680,7 +2680,7 @@ _require_deletable_scratch_dev_pool()
 }
 
 # We check for btrfs and (optionally) features of the btrfs command
-_require_btrfs()
+_require_btrfs_subcommand()
 {
cmd=$1
_require_command "$BTRFS_UTIL_PROG" btrfs
@@ -2691,6 +2691,16 @@ _require_btrfs()
[ $? -eq 0 ] || _notrun "$BTRFS_UTIL_PROG too old (must support $cmd)"
 }
 
+# We check if the kernel support given btrfs feature from its sysfs interface
+_require_btrfs_kernel_feature()
+{
+   feat=$1
+   # Use /dev/btrfs-control to ensure btrfs is loaded
+   touch /dev/btrfs-control
+   [[ ! -f /sys/fs/btrfs/features/$feat ]] && \
+   _notrun "kernel does not support $feat feature"
+}
+
 # Check that fio is present, and it is able to execute given jobfile
 _require_fio()
 {
diff --git a/tests/btrfs/004 b/tests/btrfs/004
index d588c5b..b5686ec 100755
--- a/tests/btrfs/004
+++ b/tests/btrfs/004
@@ -52,7 +52,7 @@ _supported_fs btrfs
 _supported_os Linux
 _require_scratch
 _require_no_large_scratch_dev
-_require_btrfs inspect-internal
+_require_btrfs_subcommand inspect-internal
 _require_command "/usr/sbin/filefrag" filefrag
 
 rm -f $seqres.full
diff --git a/tests/btrfs/048 b/tests/btrfs/048
index dc7386d..9ddd23f 100755
--- a/tests/btrfs/048
+++ b/tests/btrfs/048
@@ -47,7 +47,7 @@ _supported_fs btrfs
 _supported_os Linux
 _require_test
 _require_scratch
-_require_btrfs "property"
+_require_btrfs_subcommand "property"
 _need_to_be_root
 
 send_files_dir=$TEST_DIR/btrfs-test-$seq
diff --git a/tests/btrfs/059 b/tests/btrfs/059
index 3379ead..21d246c 100755
--- a/tests/btrfs/059
+++ b/tests/btrfs/059
@@ -50,7 +50,7 @@ _supported_fs btrfs
 _supported_os Linux
 _require_test
 _require_scratch
-_require_btrfs "property"
+_require_btrfs_subcommand "property"
 _need_to_be_root
 
 rm -f $seqres.full
-- 
2.7.1
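
For readers more familiar with C than with the fstests shell helpers, the sysfs check done by _require_btrfs_kernel_feature() in the patch above can be sketched roughly as follows. This is an illustration only: the function names are hypothetical, the directory is passed in so the logic can be exercised without btrfs loaded, and the step that touches /dev/btrfs-control to load the module is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define BTRFS_SYSFS_FEATURES "/sys/fs/btrfs/features"

/* Build "<dir>/<feat>" into buf; return 0 on success, -1 if it does not fit. */
static int feature_path(char *buf, size_t size, const char *dir, const char *feat)
{
	int n;

	assert(buf != NULL && dir != NULL && feat != NULL);
	n = snprintf(buf, size, "%s/%s", dir, feat);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Return 1 if <dir>/<feat> is a regular file (like `[[ -f ... ]]`), else 0. */
static int btrfs_feature_present(const char *dir, const char *feat)
{
	char path[4096];
	struct stat st;

	if (feature_path(path, sizeof(path), dir, feat) != 0)
		return 0;
	if (stat(path, &st) != 0)
		return 0;
	return S_ISREG(st.st_mode) ? 1 : 0;
}
```

A caller would use btrfs_feature_present(BTRFS_SYSFS_FEATURES, "dedup") and skip the test when it returns 0.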





[RFC PATCH v2 3/5] fstests: btrfs: Add testcase for btrfs dedup enable disable race test

2016-02-17 Thread Qu Wenruo
Add test case to check btrfs dedup enable/disable race.

Signed-off-by: Qu Wenruo 
---
 tests/btrfs/201 | 101 
 tests/btrfs/201.out |   1 +
 tests/btrfs/group   |   1 +
 3 files changed, 103 insertions(+)
 create mode 100755 tests/btrfs/201
 create mode 100644 tests/btrfs/201.out

diff --git a/tests/btrfs/201 b/tests/btrfs/201
new file mode 100755
index 000..dd573a5
--- /dev/null
+++ b/tests/btrfs/201
@@ -0,0 +1,101 @@
+#! /bin/bash
+# FS QA Test 201
+#
+# Basic btrfs inband dedup enable/disable race test
+#
+#---
+# Copyright (c) 2016 Fujitsu.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+   kill $trigger_work &> /dev/null
+   kill $fsstress_work &> /dev/null
+   wait
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_need_to_be_root
+_require_btrfs_subcommand dedup
+_require_btrfs_kernel_feature dedup
+_require_btrfs_mkfs_feature dedup
+
+# Use 64K dedup size to keep compatibility for 64K page size
+dedup_bs=64K
+
+_scratch_mkfs "-O dedup" >> $seqres.full 2>&1
+_scratch_mount
+
+mkdir -p $SCRATCH_MNT/stressdir
+
+fsstress_work()
+{
+   $FSSTRESS_PROG $(_scale_fsstress_args -p 8 -n 5000) $FSSTRESS_AVOID \
+   -d $SCRATCH_MNT/stressdir > /dev/null 2>&1
+}
+
+trigger_work()
+{
+   while true; do
+   _run_btrfs_util_prog dedup enable -s inmemory \
+   -b $dedup_bs $SCRATCH_MNT
+   sleep 5
+   _run_btrfs_util_prog dedup disable $SCRATCH_MNT
+   sleep 5
+   _run_btrfs_util_prog dedup enable -s ondisk \
+   -b $dedup_bs $SCRATCH_MNT
+   sleep 5
+   _run_btrfs_util_prog dedup disable $SCRATCH_MNT
+   sleep 5
+   done
+}
+
+fsstress_work &
+fsstress_pid=$!
+
+trigger_work &
+trigger_pid=$!
+
+wait $fsstress_pid
+kill $trigger_pid
+wait
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/201.out b/tests/btrfs/201.out
new file mode 100644
index 000..b8969af
--- /dev/null
+++ b/tests/btrfs/201.out
@@ -0,0 +1 @@
+QA output created by 201
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 9b08570..6ee90a4 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -119,3 +119,4 @@
 116 auto quick metadata
 117 auto quick send clone
 200 auto dedup
+201 auto dedup
-- 
2.7.1





[PATCH 6/8] btrfs-progs: Add show-super support for new DEDUP flag

2016-02-17 Thread Qu Wenruo
Now btrfs-show-super can handle DEDUP ro compat flag.

Signed-off-by: Qu Wenruo 
---
 btrfs-show-super.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/btrfs-show-super.c b/btrfs-show-super.c
index 051bd11..0bc0b1f 100644
--- a/btrfs-show-super.c
+++ b/btrfs-show-super.c
@@ -328,6 +328,15 @@ struct readable_flag_entry {
u64 bit;
char *output;
 };
+#define DEF_RO_COMPAT_FLAG_ENTRY(bit_name) \
+   {BTRFS_FEATURE_COMPAT_RO_##bit_name, #bit_name}
+
+struct readable_flag_entry ro_compat_flags_array[] = {
+   DEF_RO_COMPAT_FLAG_ENTRY(DEDUP)
+};
+
+static const int ro_compat_flags_num = sizeof(ro_compat_flags_array) /
+  sizeof(struct readable_flag_entry);
 
 #define DEF_INCOMPAT_FLAG_ENTRY(bit_name)  \
{BTRFS_FEATURE_INCOMPAT_##bit_name, #bit_name}
@@ -400,6 +409,13 @@ static void __print_readable_flag(u64 flag, struct readable_flag_entry *array,
printf(")\n");
 }
 
+static void print_readable_ro_compat_flag(u64 ro_flag)
+{
+   return __print_readable_flag(ro_flag, ro_compat_flags_array,
+ro_compat_flags_num,
+BTRFS_FEATURE_COMPAT_RO_SUPP);
+}
+
 static void print_readable_incompat_flag(u64 flag)
 {
return __print_readable_flag(flag, incompat_flags_array,
@@ -491,6 +507,7 @@ static void dump_superblock(struct btrfs_super_block *sb, int full)
   (unsigned long long)btrfs_super_compat_flags(sb));
printf("compat_ro_flags\t\t0x%llx\n",
   (unsigned long long)btrfs_super_compat_ro_flags(sb));
+   print_readable_ro_compat_flag(btrfs_super_compat_ro_flags(sb));
printf("incompat_flags\t\t0x%llx\n",
   (unsigned long long)btrfs_super_incompat_flags(sb));
print_readable_incompat_flag(btrfs_super_incompat_flags(sb));
-- 
2.7.1
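
The table-driven approach used in the patch above (a macro that pastes the bit name and stringifies the label, plus a loop over the table) can be tried outside btrfs-progs with a sketch like the following. Everything prefixed with example_ or EXAMPLE_ is hypothetical, including the bit value, and the output format is simplified compared with __print_readable_flag().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bit value, chosen only for this example. */
#define EXAMPLE_FEATURE_COMPAT_RO_DEDUP (1ULL << 1)

struct example_flag_entry {
	uint64_t bit;
	const char *output;
};

/* Paste the bit name onto the prefix and stringify it for the label. */
#define DEF_EXAMPLE_FLAG_ENTRY(bit_name) \
	{ EXAMPLE_FEATURE_COMPAT_RO_##bit_name, #bit_name }

static const struct example_flag_entry example_ro_compat_flags[] = {
	DEF_EXAMPLE_FLAG_ENTRY(DEDUP),
};

/* Write the names of all set flags into buf, separated by " | ". */
static void format_ro_compat_flags(uint64_t flags, char *buf, size_t size)
{
	size_t n = sizeof(example_ro_compat_flags) /
		   sizeof(example_ro_compat_flags[0]);
	uint64_t known = 0;
	size_t used = 0;
	size_t i;

	assert(buf != NULL && size > 0);
	buf[0] = '\0';
	for (i = 0; i < n; i++) {
		known |= example_ro_compat_flags[i].bit;
		if (!(flags & example_ro_compat_flags[i].bit))
			continue;
		snprintf(buf + used, size - used, "%s%s",
			 used ? " | " : "", example_ro_compat_flags[i].output);
		used = strlen(buf);
	}
	/* Report any bits that have no entry in the table. */
	if (flags & ~known)
		snprintf(buf + used, size - used, "%sunknown: 0x%llx",
			 used ? " | " : "",
			 (unsigned long long)(flags & ~known));
}
```

Adding another flag then only requires one more DEF_EXAMPLE_FLAG_ENTRY() line in the table.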





[PATCH 3/8] btrfs-progs: dedup: Add disable support for inband deduplication

2016-02-17 Thread Qu Wenruo
Add disable subcommand for dedup command group.

Signed-off-by: Qu Wenruo 
---
 Documentation/btrfs-dedup.asciidoc |  5 +
 cmds-dedup.c   | 42 ++
 2 files changed, 47 insertions(+)

diff --git a/Documentation/btrfs-dedup.asciidoc b/Documentation/btrfs-dedup.asciidoc
index 43d2fb7..458df62 100644
--- a/Documentation/btrfs-dedup.asciidoc
+++ b/Documentation/btrfs-dedup.asciidoc
@@ -21,6 +21,11 @@ use with caution.
 
 SUBCOMMAND
 --
+*disable* ::
+Disable in-band de-duplication for a filesystem.
++
+This will trash all stored dedup hash.
++
 *enable* [options] ::
 Enable in-band de-duplication for a filesystem.
 +
diff --git a/cmds-dedup.c b/cmds-dedup.c
index 1da416f..c85bb5b 100644
--- a/cmds-dedup.c
+++ b/cmds-dedup.c
@@ -173,9 +173,51 @@ out:
return ret;
 }
 
+static const char * const cmd_dedup_disable_usage[] = {
+   "btrfs dedup disable <path>",
+   "Disable in-band(write time) de-duplication of a btrfs.",
+   NULL
+};
+
+static int cmd_dedup_disable(int argc, char **argv)
+{
+   struct btrfs_ioctl_dedup_args dargs;
+   DIR *dirstream;
+   char *path;
+   int fd;
+   int ret;
+
+   if (check_argc_exact(argc, 2))
+   usage(cmd_dedup_disable_usage);
+
+   path = argv[1];
+   fd = open_file_or_dir(path, &dirstream);
+   if (fd < 0) {
+   error("failed to open file or directory: %s", path);
+   return 1;
+   }
+   memset(&dargs, 0, sizeof(dargs));
+   dargs.cmd = BTRFS_DEDUP_CTL_DISABLE;
+
+   ret = ioctl(fd, BTRFS_IOC_DEDUP_CTL, &dargs);
+   if (ret < 0) {
+   error("failed to disable inband deduplication: %s",
+ strerror(errno));
+   ret = 1;
+   goto out;
+   }
+   ret = 0;
+
+out:
+   close_file_or_dir(fd, dirstream);
+   return ret;
+}
+
 const struct cmd_group dedup_cmd_group = {
dedup_cmd_group_usage, dedup_cmd_group_info, {
{ "enable", cmd_dedup_enable, cmd_dedup_enable_usage, NULL, 0},
+   { "disable", cmd_dedup_disable, cmd_dedup_disable_usage,
+ NULL, 0},
NULL_CMD_STRUCT
}
 };
-- 
2.7.1





[PATCH 8/8] btrfs-progs: property: add a dedup property

2016-02-17 Thread Qu Wenruo
From: Wang Xiaoguang 

Normally if we enable online dedup for a fs, it's filesystem wide
de-duplication. With this property, we can explicitly disable data
de-duplication for specified files.

Signed-off-by: Wang Xiaoguang 
---
 Documentation/btrfs-property.asciidoc |  2 +
 props.c   | 73 +++
 2 files changed, 75 insertions(+)

diff --git a/Documentation/btrfs-property.asciidoc b/Documentation/btrfs-property.asciidoc
index 8b9b7f0..2fbecf6 100644
--- a/Documentation/btrfs-property.asciidoc
+++ b/Documentation/btrfs-property.asciidoc
@@ -44,6 +44,8 @@ label
 label of device
 compression
 compression setting for an inode: lzo, zlib, or "" (empty string)
+dedup
+online dedup setting for an inode: disable or "" (empty string)
 
 *list* [-t ] ::
 Lists available properties with their descriptions for the given object.
diff --git a/props.c b/props.c
index 5b74932..f27eb8b 100644
--- a/props.c
+++ b/props.c
@@ -187,6 +187,77 @@ out:
return ret;
 }
 
+static int prop_dedup(enum prop_object_type type, const char *object,
+   const char *name, const char *value)
+{
+   int ret;
+   ssize_t sret;
+   int fd = -1;
+   DIR *dirstream = NULL;
+   char *buf = NULL;
+   char *xattr_name = NULL;
+   int open_flags = value ? O_RDWR : O_RDONLY;
+
+   fd = open_file_or_dir3(object, &dirstream, open_flags);
+   if (fd == -1) {
+   ret = -errno;
+   fprintf(stderr, "ERROR: open %s failed. %s\n",
+   object, strerror(-ret));
+   goto out;
+   }
+
+   xattr_name = malloc(XATTR_BTRFS_PREFIX_LEN + strlen(name) + 1);
+   if (!xattr_name) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   memcpy(xattr_name, XATTR_BTRFS_PREFIX, XATTR_BTRFS_PREFIX_LEN);
+   memcpy(xattr_name + XATTR_BTRFS_PREFIX_LEN, name, strlen(name));
+   xattr_name[XATTR_BTRFS_PREFIX_LEN + strlen(name)] = '\0';
+
+   if (value)
+   sret = fsetxattr(fd, xattr_name, value, strlen(value), 0);
+   else
+   sret = fgetxattr(fd, xattr_name, NULL, 0);
+   if (sret < 0) {
+   ret = -errno;
+   if (ret != -ENOATTR)
+   fprintf(stderr,
+   "ERROR: failed to %s dedup for %s. %s\n",
+   value ? "set" : "get", object, strerror(-ret));
+   else
+   ret = 0;
+   goto out;
+   }
+   if (!value) {
+   size_t len = sret;
+
+   buf = malloc(len);
+   if (!buf) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   sret = fgetxattr(fd, xattr_name, buf, len);
+   if (sret < 0) {
+   ret = -errno;
+   fprintf(stderr,
+   "ERROR: failed to get dedup for %s. %s\n",
+   object, strerror(-ret));
+   goto out;
+   }
+   fprintf(stdout, "dedup=%.*s\n", (int)len, buf);
+   }
+
+   ret = 0;
+out:
+   free(xattr_name);
+   free(buf);
+   if (fd >= 0)
+   close_file_or_dir(fd, dirstream);
+
+   return ret;
+}
+
 const struct prop_handler prop_handlers[] = {
{"ro", "Set/get read-only flag of subvolume.", 0, prop_object_subvol,
 prop_read_only},
@@ -194,5 +265,7 @@ const struct prop_handler prop_handlers[] = {
 prop_object_dev | prop_object_root, prop_label},
{"compression", "Set/get compression for a file or directory", 0,
 prop_object_inode, prop_compression},
+   {"dedup", "Set/get dedup for a file or directory", 0,
+prop_object_inode, prop_dedup},
{NULL, NULL, 0, 0, NULL}
 };
-- 
2.7.1
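
The property handler in the patch above stores the setting as an extended attribute whose name is the btrfs prefix followed by the property name. A minimal sketch of just the name construction is shown below. It assumes the prefix string is "btrfs." (the value of XATTR_BTRFS_PREFIX as far as I know); the EXAMPLE_ macro and the function name are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed to match XATTR_BTRFS_PREFIX; defined locally for the example. */
#define EXAMPLE_XATTR_BTRFS_PREFIX "btrfs."

/*
 * Return a newly allocated "<prefix><prop>" string, or NULL on allocation
 * failure. The caller owns the result and must free() it.
 */
static char *btrfs_prop_xattr_name(const char *prop)
{
	size_t prefix_len = strlen(EXAMPLE_XATTR_BTRFS_PREFIX);
	size_t prop_len;
	char *name;

	assert(prop != NULL);
	prop_len = strlen(prop);
	name = malloc(prefix_len + prop_len + 1);
	if (!name)
		return NULL;
	memcpy(name, EXAMPLE_XATTR_BTRFS_PREFIX, prefix_len);
	/* Copy prop_len + 1 bytes so the terminating NUL comes along. */
	memcpy(name + prefix_len, prop, prop_len + 1);
	return name;
}
```

With prop set to "dedup" this yields "btrfs.dedup", which is the name the handler would then pass to fsetxattr() or fgetxattr().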





[PATCH 5/8] btrfs-progs: Add dedup feature for mkfs and convert

2016-02-17 Thread Qu Wenruo
Add new DEDUP ro compat flag and corresponding mkfs/convert flag
'dedup'.

Since the dedup tree is completely isolated from the fs tree, even an old
kernel can still mount the filesystem read-only.
So add it as an RO compat flag instead of a common incompat flag.
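
The difference between the two flag classes can be sketched as a small decision function. This is a simplified illustration of the general compat_ro/incompat convention, not code from the kernel or btrfs-progs; the enum, the function name, and the bit value are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit value, chosen only for this example. */
#define EXAMPLE_COMPAT_RO_DEDUP (1ULL << 1)

enum example_mount_mode {
	EXAMPLE_MOUNT_REFUSED,	/* unknown incompat bit: cannot mount at all */
	EXAMPLE_MOUNT_RO_ONLY,	/* unknown compat_ro bit: read-only mount only */
	EXAMPLE_MOUNT_RW,	/* everything understood: read-write allowed */
};

/*
 * Decide what a kernel supporting the supp_* bits may do with a
 * superblock carrying the sb_* bits.
 */
static enum example_mount_mode
example_allowed_mount(uint64_t sb_incompat, uint64_t sb_compat_ro,
		      uint64_t supp_incompat, uint64_t supp_compat_ro)
{
	if (sb_incompat & ~supp_incompat)
		return EXAMPLE_MOUNT_REFUSED;
	if (sb_compat_ro & ~supp_compat_ro)
		return EXAMPLE_MOUNT_RO_ONLY;
	return EXAMPLE_MOUNT_RW;
}
```

Under this model, a kernel that does not know the DEDUP bit ends up in the read-only case rather than refusing the mount.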

Signed-off-by: Qu Wenruo 
---
 Documentation/mkfs.btrfs.asciidoc |  9 
 btrfs-convert.c   | 19 +++-
 mkfs.c|  8 +--
 utils.c   | 47 +--
 utils.h   |  7 +++---
 5 files changed, 67 insertions(+), 23 deletions(-)

diff --git a/Documentation/mkfs.btrfs.asciidoc b/Documentation/mkfs.btrfs.asciidoc
index 6a49265..5fc97fd 100644
--- a/Documentation/mkfs.btrfs.asciidoc
+++ b/Documentation/mkfs.btrfs.asciidoc
@@ -208,6 +208,15 @@ reduced-size metadata for extent references, saves a few percent of metadata
 improved representation of file extents where holes are not explicitly
 stored as an extent, saves a few percent of metadata if sparse files are used
 
+*dedup*::
+allow btrfs to use the new on-disk format designed for in-band (write time)
+de-duplication.
++
+the on-disk storage backend and persistent de-duplication status need this feature.
++
+this is an RO compat feature, meaning an old kernel can still mount the
+filesystem read-only.
+
 BLOCK GROUPS, CHUNKS, RAID
 --
 
diff --git a/btrfs-convert.c b/btrfs-convert.c
index 4baa68e..ad25065 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -2453,7 +2453,7 @@ static int convert_open_fs(const char *devname,
 
 static int do_convert(const char *devname, int datacsum, int packing, int noxattr,
u32 nodesize, int copylabel, const char *fslabel, int progress,
-   u64 features)
+   u64 features, u64 ro_features)
 {
int i, ret, blocks_per_node;
int fd = -1;
@@ -2504,8 +2504,9 @@ static int do_convert(const char *devname, int datacsum, int packing, int noxatt
fprintf(stderr, "unable to open %s\n", devname);
goto fail;
}
-   btrfs_parse_features_to_string(features_buf, features);
-   if (features == BTRFS_MKFS_DEFAULT_FEATURES)
+   btrfs_parse_features_to_string(features_buf, features, ro_features);
+   if (features == BTRFS_MKFS_DEFAULT_FEATURES &&
+   ro_features == 0)
strcat(features_buf, " (default)");
 
printf("create btrfs filesystem:\n");
@@ -2521,6 +2522,7 @@ static int do_convert(const char *devname, int datacsum, int packing, int noxatt
mkfs_cfg.sectorsize = blocksize;
mkfs_cfg.stripesize = blocksize;
mkfs_cfg.features = features;
+   mkfs_cfg.ro_features = ro_features;
 
ret = make_btrfs(fd, &mkfs_cfg);
if (ret) {
@@ -3071,6 +3073,7 @@ int main(int argc, char *argv[])
char *file;
char fslabel[BTRFS_LABEL_SIZE];
u64 features = BTRFS_MKFS_DEFAULT_FEATURES;
+   u64 ro_features = 0;
 
while(1) {
enum { GETOPT_VAL_NO_PROGRESS = 256 };
@@ -3129,7 +3132,8 @@ int main(int argc, char *argv[])
char *orig = strdup(optarg);
char *tmp = orig;
 
-   tmp = btrfs_parse_fs_features(tmp, &features);
+   tmp = btrfs_parse_fs_features(tmp, &features,
+ &ro_features);
if (tmp) {
fprintf(stderr,
"Unrecognized filesystem feature '%s'\n",
@@ -3147,7 +3151,9 @@ int main(int argc, char *argv[])
char buf[64];
 
btrfs_parse_features_to_string(buf,
-   features & ~BTRFS_CONVERT_ALLOWED_FEATURES);
+   features &
+   ~BTRFS_CONVERT_ALLOWED_FEATURES,
+   ro_features);
fprintf(stderr,
"ERROR: features not allowed for convert: %s\n",
buf);
@@ -3198,7 +3204,8 @@ int main(int argc, char *argv[])
ret = do_rollback(file);
} else {
ret = do_convert(file, datacsum, packing, noxattr, nodesize,
-   copylabel, fslabel, progress, features);
+   copylabel, fslabel, progress, features,
+   ro_features);
}
if (ret)
return 1;
diff --git a/mkfs.c b/mkfs.c
index ea58404..184b9d2 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -1369,6 +1369,7 @@ int main(int ac, char **av)
int saved_optind;
char fs_uuid[BTRFS_UUID_UNPARSED_SIZE] = { 0 };
u64 

[PATCH 2/8] btrfs-progs: dedup: Add enable command for dedup command group

2016-02-17 Thread Qu Wenruo
Add enable subcommand for dedup command group.

Signed-off-by: Qu Wenruo 
---
 Documentation/btrfs-dedup.asciidoc | 101 ++-
 cmds-dedup.c   | 138 +
 ioctl.h|   2 +
 3 files changed, 240 insertions(+), 1 deletion(-)

diff --git a/Documentation/btrfs-dedup.asciidoc b/Documentation/btrfs-dedup.asciidoc
index 917977b..43d2fb7 100644
--- a/Documentation/btrfs-dedup.asciidoc
+++ b/Documentation/btrfs-dedup.asciidoc
@@ -21,7 +21,106 @@ use with caution.
 
 SUBCOMMAND
 --
-Nothing yet
+*enable* [options] ::
+Enable in-band de-duplication for a filesystem.
++
+`Options`
++
+-s|--storage-backend 
+Specify de-duplication hash storage backend.
+Supported backends are 'ondisk' and 'inmemory'.
+If not specified, default value is 'inmemory'.
++
+Refer to *BACKENDS* section for more information.
+
+-b|--blocksize 
+Specify dedup block size.
+Supported values are power of 2 from '16K' to '8M'.
+Default value is '128K'.
++
+Refer to *DEDUP BLOCK SIZE* section for more information.
+
+-a|--hash-algorithm 
+Specify hash algorithm.
+Only 'sha256' is supported so far.
+
+-l|--limit-hash 
+Specify maximum number of hashes stored in memory.
+Only works for 'inmemory' backend.
+Conflicts with '-m' option.
++
+Only positive values are valid.
+Default value is '32K'.
+
+-m|--limit-memory 
+Specify maximum memory used for hashes.
+Only works for 'inmemory' backend.
+Conflicts with '-l' option.
++
+Only positive values are valid.
+No default value.
+
+WARNING: Too large value for '-l' or '-m' will easily trigger OOM.
+Please use with caution according to system memory or use 'ondisk' backend
+if memory usage is critical.
+
+BACKENDS
+
+Btrfs in-band de-duplication supports two different backends with their own
+features.
+
+In-memory backend::
+This backend provides backward compatibility and more fine-tuning options.
+But the hash pool is non-persistent and may exhaust kernel memory if not set up
+properly.
++
+This backend can be used on old btrfs(without '-O dedup' mkfs option).
+When used on old btrfs, this backend needs to be enabled manually after mount.
++
+Designed for fast hash search speed, in-memory backend will keep all dedup
+hashes in memory. (Although overall performance is still much the same with
+'ondisk' backend)
++
+It keeps only a limited number of hashes in memory to avoid exhausting memory.
+Hashes over the limit are dropped in Least-Recently-Used (LRU) order.
+So this backend has a consistent overhead for a given limit but can\'t ensure
+that all duplicated blocks will be de-duplicated.
++
+After umount and mount, the in-memory backend needs to refill its hash pool.
+
+On-disk backend::
+This backend provides a persistent hash pool, with smarter memory management
+for the hash pool.
+But it\'s not backward-compatible, meaning it must be used with '-O dedup' mkfs
+option and older kernels can\'t mount it read-write.
++
+Designed for de-duplication rate, the hash pool is stored as a B+ tree on disk.
+Although this may cause extra disk IO for hash searches under extremely
+high memory pressure,
+in most cases the overall performance should be on par with the 'inmemory'
+backend.
++
+After umount and mount, on-disk backend still has its hash on disk, no need to
+refill its dedup hash pool.
+
+DEDUP BLOCK SIZE
+
+In-band de-duplication is done at dedup block size.
+Any data smaller than dedup block size won\'t go through in-band
+de-duplication.
+
+The dedup block size heavily affects dedup rate and fragmentation.
+
+A smaller block size causes more fragments, but a higher dedup rate.
+
+A larger block size causes fewer fragments, but a lower dedup rate.
+
+The in-band de-duplication rate is highly related to the workload pattern.
+So it\'s highly recommended to align the dedup block size to the workload
+block size to make full use of de-duplication.
+
+A dedup block size larger than 128K makes compression unavailable, as
+compression only supports a maximum extent size of 128K.
 
 EXIT STATUS
 ---
diff --git a/cmds-dedup.c b/cmds-dedup.c
index 800df34..1da416f 100644
--- a/cmds-dedup.c
+++ b/cmds-dedup.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ctree.h"
 #include "ioctl.h"
@@ -36,8 +37,145 @@ static const char * const dedup_cmd_group_usage[] = {
 static const char dedup_cmd_group_info[] =
 "manage inband(write time) de-duplication";
 
+static const char * const cmd_dedup_enable_usage[] = {
+   "btrfs dedup enable [options] ",
+   "Enable in-band(write time) de-duplication of a btrfs.",
+   "",
+   "-s|--storage-backend ",
+   "   specify dedup hash storage backend",
+   "   supported backend: 'ondisk', 'inmemory'",
+   "   inmemory is the default backend",
+   "-b|--blocksize ",
+   "   specify dedup block size",
+   "   default value is 128K",

[PATCH 7/8] btrfs-progs: debug-tree: Add dedup tree support

2016-02-17 Thread Qu Wenruo
Add dedup tree support for btrfs-debug-tree.

Signed-off-by: Qu Wenruo 
---
v2:
  Add support to print hex objectid/offset for dedup hash.
  Add support to print hex hash.
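
For reference, the hex output added in v2 dumps the 8 bytes of the value in
memory order, so it reads as a hash prefix and not as a number. A minimal
userspace sketch of that formatting follows. It writes into a caller buffer
instead of calling printf, and format_hash_bytes() is a name made up for this
example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the 8 bytes of 'value' in memory order as "0x" followed by
 * 16 hex digits. 'out' must hold at least 19 bytes (2 + 16 + NUL).
 */
static void format_hash_bytes(uint64_t value, char *out, size_t out_len)
{
	unsigned char buf[8];
	size_t i;

	assert(out_len >= 19);
	memcpy(buf, &value, sizeof(buf));
	memcpy(out, "0x", 2);
	/* Each snprintf writes two hex digits plus a terminating NUL. */
	for (i = 0; i < sizeof(buf); i++)
		snprintf(out + 2 + 2 * i, 3, "%02x", buf[i]);
}
```

Because the bytes are taken in memory order, the same stored hash prints the
same way regardless of host endianness, as long as the value was filled by
copying the raw hash bytes.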
---
 cmds-inspect-dump-tree.c |  4 +++
 ctree.h  |  7 
 print-tree.c | 94 
 3 files changed, 105 insertions(+)

diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c
index 0dab506..5e324e0 100644
--- a/cmds-inspect-dump-tree.c
+++ b/cmds-inspect-dump-tree.c
@@ -395,6 +395,10 @@ again:
printf("multiple");
}
break;
+   case BTRFS_DEDUP_TREE_OBJECTID:
+   if (!skip)
+   printf("dedup");
+   break;
default:
if (!skip) {
printf("file");
diff --git a/ctree.h b/ctree.h
index fe0d521..691a4ab 100644
--- a/ctree.h
+++ b/ctree.h
@@ -79,6 +79,9 @@ struct btrfs_free_space_ctl;
 /* tracks free space in block groups. */
 #define BTRFS_FREE_SPACE_TREE_OBJECTID 10ULL
 
+/* on-disk dedup tree (EXPERIMENTAL) */
+#define BTRFS_DEDUP_TREE_OBJECTID 11ULL
+
 /* for storing balance parameters in the root tree */
 #define BTRFS_BALANCE_OBJECTID -4ULL
 
@@ -1216,6 +1219,10 @@ struct btrfs_root {
 #define BTRFS_DEV_ITEM_KEY 216
 #define BTRFS_CHUNK_ITEM_KEY   228
 
+#define BTRFS_DEDUP_STATUS_ITEM_KEY230
+#define BTRFS_DEDUP_HASH_ITEM_KEY  231
+#define BTRFS_DEDUP_BYTENR_ITEM_KEY232
+
 #define BTRFS_BALANCE_ITEM_KEY 248
 
 /*
diff --git a/print-tree.c b/print-tree.c
index 6704ff6..53a6813 100644
--- a/print-tree.c
+++ b/print-tree.c
@@ -25,6 +25,7 @@
 #include "disk-io.h"
 #include "print-tree.h"
 #include "utils.h"
+#include "dedup.h"
 
 
 static void print_dir_item_type(struct extent_buffer *eb,
@@ -667,11 +668,31 @@ static void print_key_type(u64 objectid, u8 type)
case BTRFS_UUID_KEY_RECEIVED_SUBVOL:
printf("UUID_KEY_RECEIVED_SUBVOL");
break;
+   case BTRFS_DEDUP_STATUS_ITEM_KEY:
+   printf("DEDUP_STATUS_ITEM");
+   break;
+   case BTRFS_DEDUP_HASH_ITEM_KEY:
+   printf("DEDUP_HASH_ITEM");
+   break;
+   case BTRFS_DEDUP_BYTENR_ITEM_KEY:
+   printf("DEDUP_BYTENR_ITEM");
+   break;
default:
printf("UNKNOWN.%d", type);
};
 }
 
+static void print_64bit_hash(u64 hash)
+{
+   int i;
+   unsigned char buf[8];
+
+   memcpy(buf, &hash, 8);
+   printf("0x");
+   for (i = 0; i < 8; i++)
+   printf("%02x", buf[i]);
+}
+
 static void print_objectid(u64 objectid, u8 type)
 {
switch (type) {
@@ -686,6 +707,9 @@ static void print_objectid(u64 objectid, u8 type)
case BTRFS_UUID_KEY_RECEIVED_SUBVOL:
printf("0x%016llx", (unsigned long long)objectid);
return;
+   case BTRFS_DEDUP_HASH_ITEM_KEY:
+   print_64bit_hash(objectid);
+   return;
}
 
switch (objectid) {
@@ -752,6 +776,9 @@ static void print_objectid(u64 objectid, u8 type)
case BTRFS_MULTIPLE_OBJECTIDS:
printf("MULTIPLE");
break;
+   case BTRFS_DEDUP_TREE_OBJECTID:
+   printf("DEDUP_TREE");
+   break;
case (u64)-1:
printf("-1");
break;
@@ -787,6 +814,9 @@ void btrfs_print_key(struct btrfs_disk_key *disk_key)
case BTRFS_UUID_KEY_RECEIVED_SUBVOL:
printf(" 0x%016llx)", (unsigned long long)offset);
break;
+   case BTRFS_DEDUP_BYTENR_ITEM_KEY:
+   print_64bit_hash(offset);
+   break;
default:
if (offset == (u64)-1)
printf(" -1)");
@@ -815,6 +845,54 @@ static void print_uuid_item(struct extent_buffer *l, 
unsigned long offset,
}
 }
 
+static void print_dedup_status(struct extent_buffer *node, int slot)
+{
+   struct btrfs_dedup_status_item *status_item;
+   u64 blocksize;
+   u64 limit;
+   u16 hash_type;
+   u16 backend;
+
+   status_item = btrfs_item_ptr(node, slot,
+   struct btrfs_dedup_status_item);
+   blocksize = btrfs_dedup_status_blocksize(node, status_item);
+   limit = btrfs_dedup_status_limit(node, status_item);
+   hash_type = btrfs_dedup_status_hash_type(node, status_item);
+   backend = btrfs_dedup_status_backend(node, status_item);
+
+   printf("\t\tdedup status item ");
+   if (backend == BTRFS_DEDUP_BACKEND_INMEMORY)
+   printf("backend: inmemory\n");
+   else if (backend == BTRFS_DEDUP_BACKEND_ONDISK)
+   printf("backend: ondisk\n");
+   else
+   printf("backend: 

[PATCH 1/8] btrfs-progs: Basic framework for dedup command group

2016-02-17 Thread Qu Wenruo
Add basic ioctl header and command group framework for later use.
Along with basic man page doc.

Signed-off-by: Qu Wenruo 
---
 Documentation/Makefile.in  |  1 +
 Documentation/btrfs-dedup.asciidoc | 39 +++
 Documentation/btrfs.asciidoc   |  4 
 Makefile.in|  2 +-
 btrfs.c|  1 +
 cmds-dedup.c   | 48 ++
 commands.h |  2 ++
 ctree.h| 34 ++-
 dedup.h| 42 +
 ioctl.h| 21 +
 10 files changed, 192 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/btrfs-dedup.asciidoc
 create mode 100644 cmds-dedup.c
 create mode 100644 dedup.h

diff --git a/Documentation/Makefile.in b/Documentation/Makefile.in
index f046abd..9756570 100644
--- a/Documentation/Makefile.in
+++ b/Documentation/Makefile.in
@@ -30,6 +30,7 @@ MAN8_TXT += btrfs-qgroup.asciidoc
 MAN8_TXT += btrfs-replace.asciidoc
 MAN8_TXT += btrfs-restore.asciidoc
 MAN8_TXT += btrfs-property.asciidoc
+MAN8_TXT += btrfs-dedup.asciidoc
 
 # Category 5 manual page
 MAN5_TXT += btrfs-man5.asciidoc
diff --git a/Documentation/btrfs-dedup.asciidoc b/Documentation/btrfs-dedup.asciidoc
new file mode 100644
index 000..917977b
--- /dev/null
+++ b/Documentation/btrfs-dedup.asciidoc
@@ -0,0 +1,39 @@
+btrfs-dedup(8)
+==
+
+NAME
+
+btrfs-dedup - manage in-band (write time) de-duplication of a btrfs filesystem
+
+SYNOPSIS
+
+*btrfs dedup*  
+
+DESCRIPTION
+---
+*btrfs dedup* is used to enable/disable or show current in-band de-duplication
+status of a btrfs filesystem.
+
+Kernel support for in-band de-duplication starts from 4.6.
+
+WARNING: In-band de-duplication is still an experimental feature of btrfs,
+use with caution.
+
+SUBCOMMAND
+--
+Nothing yet
+
+EXIT STATUS
+---
+*btrfs dedup* returns a zero exit status if it succeeds. Non zero is
+returned in case of failure.
+
+AVAILABILITY
+
+*btrfs* is part of btrfs-progs.
+Please refer to the btrfs wiki http://btrfs.wiki.kernel.org for
+further details.
+
+SEE ALSO
+
+`mkfs.btrfs`(8),
diff --git a/Documentation/btrfs.asciidoc b/Documentation/btrfs.asciidoc
index abf1ff8..27c8883 100644
--- a/Documentation/btrfs.asciidoc
+++ b/Documentation/btrfs.asciidoc
@@ -43,6 +43,10 @@ COMMANDS
Do off-line check on a btrfs filesystem. +
See `btrfs-check`(8) for details.
 
+*dedup*::
+   Control btrfs in-band(write time) de-duplication. +
+   See `btrfs-dedup`(8) for details.
+
 *device*::
Manage devices managed by btrfs, including add/delete/scan and so
on. +
diff --git a/Makefile.in b/Makefile.in
index 3bb073a..432eff3 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -75,7 +75,7 @@ cmds_objects = cmds-subvolume.o cmds-filesystem.o cmds-device.o cmds-scrub.o \
   cmds-inspect.o cmds-balance.o cmds-send.o cmds-receive.o \
   cmds-quota.o cmds-qgroup.o cmds-replace.o cmds-check.o \
   cmds-restore.o cmds-rescue.o chunk-recover.o super-recover.o \
-  cmds-property.o cmds-fi-usage.o
+  cmds-property.o cmds-fi-usage.o cmds-dedup.o
 libbtrfs_objects = send-stream.o send-utils.o rbtree.o btrfs-list.o crc32c.o \
   uuid-tree.o utils-lib.o rbtree-utils.o
 libbtrfs_headers = send-stream.h send-utils.h send.h rbtree.h btrfs-list.h \
diff --git a/btrfs.c b/btrfs.c
index cc70515..3775dfa 100644
--- a/btrfs.c
+++ b/btrfs.c
@@ -199,6 +199,7 @@ static const struct cmd_group btrfs_cmd_group = {
{ "receive", cmd_receive, cmd_receive_usage, NULL, 0 },
{ "quota", cmd_quota, NULL, &quota_cmd_group, 0 },
{ "qgroup", cmd_qgroup, NULL, &qgroup_cmd_group, 0 },
+   { "dedup", cmd_dedup, NULL, &dedup_cmd_group, 0 },
{ "replace", cmd_replace, NULL, &replace_cmd_group, 0 },
{ "help", cmd_help, cmd_help_usage, NULL, 0 },
{ "version", cmd_version, cmd_version_usage, NULL, 0 },
diff --git a/cmds-dedup.c b/cmds-dedup.c
new file mode 100644
index 000..800df34
--- /dev/null
+++ b/cmds-dedup.c
@@ -0,0 +1,48 @@
+/*
+ * Copyright (C) 2015 Fujitsu.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, 

[PATCH 4/8] btrfs-progs: dedup: Add status subcommand

2016-02-17 Thread Qu Wenruo
Add status subcommand for dedup command group.

Signed-off-by: Qu Wenruo 
---
 Documentation/btrfs-dedup.asciidoc |  3 ++
 cmds-dedup.c   | 76 ++
 2 files changed, 79 insertions(+)

diff --git a/Documentation/btrfs-dedup.asciidoc b/Documentation/btrfs-dedup.asciidoc
index 458df62..f8df471 100644
--- a/Documentation/btrfs-dedup.asciidoc
+++ b/Documentation/btrfs-dedup.asciidoc
@@ -69,6 +69,9 @@ WARNING: Too large value for '-l' or '-m' will easily trigger OOM.
 Please use with caution according to system memory or use 'ondisk' backend
 if memory usage is critical.
 
+*status* ::
+Show current in-band de-duplication status of a filesystem.
+
 BACKENDS
 
 Btrfs in-band de-duplication support two different backends with their own
diff --git a/cmds-dedup.c b/cmds-dedup.c
index c85bb5b..64f898b 100644
--- a/cmds-dedup.c
+++ b/cmds-dedup.c
@@ -213,11 +213,87 @@ out:
return 0;
 }
 
+static const char * const cmd_dedup_status_usage[] = {
+   "btrfs dedup status ",
+   "Show current in-band(write time) de-duplication status of a btrfs.",
+   NULL
+};
+
+static int cmd_dedup_status(int argc, char **argv)
+{
+   struct btrfs_ioctl_dedup_args dargs;
+   DIR *dirstream;
+   char *path;
+   int fd;
+   int ret;
+   int print_limit = 1;
+
+   if (check_argc_exact(argc, 2))
+   usage(cmd_dedup_status_usage);
+
+   path = argv[1];
+   fd = open_file_or_dir(path, &dirstream);
+   if (fd < 0) {
+   error("failed to open file or directory: %s", path);
+   ret = 1;
+   goto out;
+   }
+   memset(&dargs, 0, sizeof(dargs));
+   dargs.cmd = BTRFS_DEDUP_CTL_STATUS;
+
+   ret = ioctl(fd, BTRFS_IOC_DEDUP_CTL, &dargs);
+   if (ret < 0) {
+   error("failed to get inband deduplication status: %s",
+ strerror(errno));
+   ret = 1;
+   goto out;
+   }
+   ret = 0;
+   if (dargs.status == 0) {
+   printf("Status: \t\t\tDisabled\n");
+   goto out;
+   }
+   printf("Status:\t\t\tEnabled\n");
+
+   if (dargs.hash_type == BTRFS_DEDUP_HASH_SHA256)
+   printf("Hash algorithm:\t\tSHA-256\n");
+   else
+   printf("Hash algorithm:\t\tUnrecognized(%x)\n",
+   dargs.hash_type);
+
+   if (dargs.backend == BTRFS_DEDUP_BACKEND_INMEMORY) {
+   printf("Backend:\t\tIn-memory\n");
+   print_limit = 1;
+   } else if (dargs.backend == BTRFS_DEDUP_BACKEND_ONDISK) {
+   printf("Backend:\t\tOn-disk\n");
+   print_limit = 0;
+   } else  {
+   printf("Backend:\t\tUnrecognized(%x)\n",
+   dargs.backend);
+   }
+
+   printf("Dedup Blocksize:\t%llu\n", dargs.blocksize);
+
+   if (print_limit) {
+   printf("Number of hash: \t[%llu/%llu]\n", dargs.current_nr,
+   dargs.limit_nr);
+   printf("Memory usage: \t\t[%s/%s]\n",
+   pretty_size(dargs.current_nr *
+   (dargs.limit_mem / dargs.limit_nr)),
+   pretty_size(dargs.limit_mem));
+   }
+out:
+   close_file_or_dir(fd, dirstream);
+   return ret;
+}
+
 const struct cmd_group dedup_cmd_group = {
dedup_cmd_group_usage, dedup_cmd_group_info, {
{ "enable", cmd_dedup_enable, cmd_dedup_enable_usage, NULL, 0},
{ "disable", cmd_dedup_disable, cmd_dedup_disable_usage,
  NULL, 0},
+   { "status", cmd_dedup_status, cmd_dedup_status_usage,
+ NULL, 0},
NULL_CMD_STRUCT
}
 };
-- 
2.7.1
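
A note on the "Memory usage" line printed by the status subcommand: it is
derived as current_nr * (limit_mem / limit_nr), i.e. an estimated per-hash
cost times the number of hashes currently held. The sketch below isolates that
arithmetic and adds a guard for limit_nr == 0, which the patch itself does not
have. Whether the kernel can ever report a zero limit here is an assumption of
this example, and estimate_hash_memory() is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate the memory used by the in-memory hash pool.
 * limit_mem / limit_nr approximates the cost of one hash entry.
 * Returns 0 when limit_nr is 0 to avoid dividing by zero.
 */
static uint64_t estimate_hash_memory(uint64_t current_nr, uint64_t limit_nr,
				     uint64_t limit_mem)
{
	if (limit_nr == 0)
		return 0;
	return current_nr * (limit_mem / limit_nr);
}
```

Note that the integer division rounds the per-hash cost down, so the result is
an estimate, not an exact accounting.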





[PATCH v7 02/20] btrfs: dedup: Introduce function to initialize dedup info

2016-02-17 Thread Qu Wenruo
From: Wang Xiaoguang 

Add generic function to initialize dedup info.
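
The sanity check on the block size in this function can be sketched in
userspace as below. The minimum comes from dedup.h in this series; the 8M
maximum is assumed from the btrfs-progs documentation ('16K' to '8M'), and the
SKETCH_* names and dedup_blocksize_valid() are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLOCKSIZE_MIN (16ULL * 1024)        /* as in dedup.h */
#define SKETCH_BLOCKSIZE_MAX (8ULL * 1024 * 1024)  /* assumed: '8M' from the docs */

/* A power of two has exactly one bit set. */
static bool is_power_of_two(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Accept a dedup block size only if it is within the supported range,
 * not smaller than the filesystem sector size, and a power of two.
 */
static bool dedup_blocksize_valid(uint64_t blocksize, uint64_t sectorsize)
{
	if (blocksize < SKETCH_BLOCKSIZE_MIN || blocksize > SKETCH_BLOCKSIZE_MAX)
		return false;
	if (blocksize < sectorsize)
		return false;
	return is_power_of_two(blocksize);
}
```

For example, 128K with a 4K sector size passes, while 48K (not a power of two)
and 16K on a 64K-sector filesystem are rejected.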

Signed-off-by: Qu Wenruo 
Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/Makefile |  2 +-
 fs/btrfs/dedup.c  | 97 +++
 fs/btrfs/dedup.h  | 17 --
 3 files changed, 113 insertions(+), 3 deletions(-)
 create mode 100644 fs/btrfs/dedup.c

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 128ce17..a6207ff 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -9,7 +9,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
-  uuid-tree.o props.o hash.o free-space-tree.o
+  uuid-tree.o props.o hash.o free-space-tree.o dedup.o
 
 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
new file mode 100644
index 000..2abe178
--- /dev/null
+++ b/fs/btrfs/dedup.c
@@ -0,0 +1,97 @@
+/*
+ * Copyright (C) 2015 Fujitsu.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 02111-1307, USA.
+ */
+#include "ctree.h"
+#include "dedup.h"
+#include "btrfs_inode.h"
+#include "transaction.h"
+#include "delayed-ref.h"
+
+int btrfs_dedup_enable(struct btrfs_fs_info *fs_info, u16 type, u16 backend,
+  u64 blocksize, u64 limit_nr)
+{
+   struct btrfs_dedup_info *dedup_info;
+   u64 limit = limit_nr;
+   int ret = 0;
+
+   /* Sanity check */
+   if (blocksize > BTRFS_DEDUP_BLOCKSIZE_MAX ||
+   blocksize < BTRFS_DEDUP_BLOCKSIZE_MIN ||
+   blocksize < fs_info->tree_root->sectorsize ||
+   !is_power_of_2(blocksize))
+   return -EINVAL;
+   if (type >= ARRAY_SIZE(btrfs_dedup_sizes))
+   return -EINVAL;
+   if (backend >= BTRFS_DEDUP_BACKEND_LAST)
+   return -EINVAL;
+
+   if (backend == BTRFS_DEDUP_BACKEND_INMEMORY && limit_nr == 0)
+   limit = 4096; /* default value */
+   if (backend == BTRFS_DEDUP_BACKEND_ONDISK && limit_nr != 0)
+   limit = 0;
+
+   dedup_info = fs_info->dedup_info;
+   if (dedup_info) {
+   /* Check if we are re-enabling with a different dedup config */
+   if (dedup_info->blocksize != blocksize ||
+   dedup_info->hash_type != type ||
+   dedup_info->backend != backend) {
+   btrfs_dedup_disable(fs_info);
+   goto enable;
+   }
+
+   /* On-the-fly limit change is OK */
+   mutex_lock(&dedup_info->lock);
+   fs_info->dedup_info->limit_nr = limit;
+   mutex_unlock(&dedup_info->lock);
+   return 0;
+   }
+
+enable:
+   dedup_info = kzalloc(sizeof(*dedup_info), GFP_NOFS);
+   if (!dedup_info)
+   return -ENOMEM;
+
+   dedup_info->hash_type = type;
+   dedup_info->backend = backend;
+   dedup_info->blocksize = blocksize;
+   dedup_info->limit_nr = limit;
+
+   /* Only SHA256 is supported so far */
+   dedup_info->dedup_driver = crypto_alloc_shash("sha256", 0, 0);
+   if (IS_ERR(dedup_info->dedup_driver)) {
+   btrfs_err(fs_info, "failed to init sha256 driver");
+   ret = PTR_ERR(dedup_info->dedup_driver);
+   goto out;
+   }
+
+   dedup_info->hash_root = RB_ROOT;
+   dedup_info->bytenr_root = RB_ROOT;
+   dedup_info->current_nr = 0;
+   INIT_LIST_HEAD(&dedup_info->lru_list);
+   mutex_init(&dedup_info->lock);
+
+   fs_info->dedup_info = dedup_info;
+   /* We must ensure dedup_enabled is modified after dedup_info */
+   smp_wmb();
+   fs_info->dedup_enabled = 1;
+
+out:
+   if (ret < 0)
+   kfree(dedup_info);
+   return ret;
+}
diff --git a/fs/btrfs/dedup.h b/fs/btrfs/dedup.h
index 8e1ff03..83a7512 100644
--- a/fs/btrfs/dedup.h
+++ b/fs/btrfs/dedup.h
@@ -37,6 +37,9 @@
 #define BTRFS_DEDUP_BLOCKSIZE_MIN  (16 * 1024)
 #define BTRFS_DEDUP_BLOCKSIZE_DEFAULT  (128 * 1024)
 
+/* Default dedup limit on number of hash */
+#define 

[PATCH v7 03/20] btrfs: dedup: Introduce function to add hash into in-memory tree

2016-02-17 Thread Qu Wenruo
From: Wang Xiaoguang 

Introduce static function inmem_add() to add hash into in-memory tree.
And now we can implement the btrfs_dedup_add() interface.
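
The eviction policy of inmem_add() (insert at the head of the LRU list, then
drop entries from the tail while current_nr exceeds limit_nr) can be sketched
in userspace as follows. This is only an illustration: it leaves out the two
rb-trees, the locking and the duplicate detection, and all lru_* names are
made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct lru_entry {
	struct lru_entry *prev, *next;
	uint64_t bytenr;
};

struct lru_pool {
	struct lru_entry head;	/* sentinel: head.next is newest, head.prev oldest */
	uint64_t current_nr;
	uint64_t limit_nr;
};

static void lru_pool_init(struct lru_pool *pool, uint64_t limit_nr)
{
	pool->head.prev = pool->head.next = &pool->head;
	pool->current_nr = 0;
	pool->limit_nr = limit_nr;
}

/* Unlink one entry from the list and free it. */
static void lru_unlink_and_free(struct lru_pool *pool, struct lru_entry *e)
{
	assert(pool->current_nr > 0);
	e->prev->next = e->next;
	e->next->prev = e->prev;
	pool->current_nr--;
	free(e);
}

/* Add a new entry at the head; returns 0 on success, -1 if malloc fails. */
static int lru_pool_add(struct lru_pool *pool, uint64_t bytenr)
{
	struct lru_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->bytenr = bytenr;
	e->next = pool->head.next;
	e->prev = &pool->head;
	pool->head.next->prev = e;
	pool->head.next = e;
	pool->current_nr++;

	/* Evict from the tail until we are back within the limit. */
	while (pool->current_nr > pool->limit_nr)
		lru_unlink_and_free(pool, pool->head.prev);
	return 0;
}

/* Free every remaining entry. */
static void lru_pool_destroy(struct lru_pool *pool)
{
	while (pool->current_nr > 0)
		lru_unlink_and_free(pool, pool->head.prev);
}
```

With a limit of 2, adding bytenr 100, 200 and 300 in that order leaves 300 and
200 in the pool and evicts 100.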

Signed-off-by: Qu Wenruo 
Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/dedup.c | 162 +++
 1 file changed, 162 insertions(+)

diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
index 2abe178..5fa15ae 100644
--- a/fs/btrfs/dedup.c
+++ b/fs/btrfs/dedup.c
@@ -21,6 +21,25 @@
 #include "transaction.h"
 #include "delayed-ref.h"
 
+struct inmem_hash {
+   struct rb_node hash_node;
+   struct rb_node bytenr_node;
+   struct list_head lru_list;
+
+   u64 bytenr;
+   u32 num_bytes;
+
+   u8 hash[];
+};
+
+static inline struct inmem_hash *inmem_alloc_hash(u16 type)
+{
+   if (WARN_ON(type >= ARRAY_SIZE(btrfs_dedup_sizes)))
+   return NULL;
+   return kzalloc(sizeof(struct inmem_hash) + btrfs_dedup_sizes[type],
+   GFP_NOFS);
+}
+
 int btrfs_dedup_enable(struct btrfs_fs_info *fs_info, u16 type, u16 backend,
   u64 blocksize, u64 limit_nr)
 {
@@ -95,3 +114,146 @@ out:
kfree(dedup_info);
return ret;
 }
+
+static int inmem_insert_hash(struct rb_root *root,
+struct inmem_hash *hash, int hash_len)
+{
+   struct rb_node **p = &root->rb_node;
+   struct rb_node *parent = NULL;
+   struct inmem_hash *entry = NULL;
+
+   while (*p) {
+   parent = *p;
+   entry = rb_entry(parent, struct inmem_hash, hash_node);
+   if (memcmp(hash->hash, entry->hash, hash_len) < 0)
+   p = &(*p)->rb_left;
+   else if (memcmp(hash->hash, entry->hash, hash_len) > 0)
+   p = &(*p)->rb_right;
+   else
+   return 1;
+   }
+   rb_link_node(&hash->hash_node, parent, p);
+   rb_insert_color(&hash->hash_node, root);
+   return 0;
+}
+
+static int inmem_insert_bytenr(struct rb_root *root,
+  struct inmem_hash *hash)
+{
+   struct rb_node **p = &root->rb_node;
+   struct rb_node *parent = NULL;
+   struct inmem_hash *entry = NULL;
+
+   while (*p) {
+   parent = *p;
+   entry = rb_entry(parent, struct inmem_hash, bytenr_node);
+   if (hash->bytenr < entry->bytenr)
+   p = &(*p)->rb_left;
+   else if (hash->bytenr > entry->bytenr)
+   p = &(*p)->rb_right;
+   else
+   return 1;
+   }
+   rb_link_node(&hash->bytenr_node, parent, p);
+   rb_insert_color(&hash->bytenr_node, root);
+   return 0;
+}
+
+static void __inmem_del(struct btrfs_dedup_info *dedup_info,
+   struct inmem_hash *hash)
+{
+   list_del(&hash->lru_list);
+   rb_erase(&hash->hash_node, &dedup_info->hash_root);
+   rb_erase(&hash->bytenr_node, &dedup_info->bytenr_root);
+
+   if (!WARN_ON(dedup_info->current_nr == 0))
+   dedup_info->current_nr--;
+
+   kfree(hash);
+}
+
+/*
+ * Insert a hash into in-memory dedup tree
+ * Evicts the least recently used hashes once the limit is exceeded.
+ *
+ * If the hash matches an existing one, we won't insert it, to
+ * save memory
+ */
+static int inmem_add(struct btrfs_dedup_info *dedup_info,
+struct btrfs_dedup_hash *hash)
+{
+   int ret = 0;
+   u16 type = dedup_info->hash_type;
+   struct inmem_hash *ihash;
+
+   ihash = inmem_alloc_hash(type);
+
+   if (!ihash)
+   return -ENOMEM;
+
+   /* Copy the data out */
+   ihash->bytenr = hash->bytenr;
+   ihash->num_bytes = hash->num_bytes;
+   memcpy(ihash->hash, hash->hash, btrfs_dedup_sizes[type]);
+
+   mutex_lock(&dedup_info->lock);
+
+   ret = inmem_insert_bytenr(&dedup_info->bytenr_root, ihash);
+   if (ret > 0) {
+   kfree(ihash);
+   ret = 0;
+   goto out;
+   }
+
+   ret = inmem_insert_hash(&dedup_info->hash_root, ihash,
+   btrfs_dedup_sizes[type]);
+   if (ret > 0) {
+   /*
+* We only keep one hash in tree to save memory, so if
+* hash conflicts, free the one to insert.
+*/
+   rb_erase(&ihash->bytenr_node, &dedup_info->bytenr_root);
+   kfree(ihash);
+   ret = 0;
+   goto out;
+   }
+
+   list_add(&ihash->lru_list, &dedup_info->lru_list);
+   dedup_info->current_nr++;
+
+   /* Remove the last dedup hash if we exceed limit */
+   while (dedup_info->current_nr > dedup_info->limit_nr) {
+   struct inmem_hash *last;
+
+   last = list_entry(dedup_info->lru_list.prev,
+ struct inmem_hash, lru_list);
+   __inmem_del(dedup_info, last);
+   }
+out:
+   mutex_unlock(&dedup_info->lock);
+   

[PATCH v7 04/20] btrfs: dedup: Introduce function to remove hash from in-memory tree

2016-02-17 Thread Qu Wenruo
From: Wang Xiaoguang 

Introduce static function inmem_del() to remove hash from in-memory
dedup tree.
And implement btrfs_dedup_del() and btrfs_dedup_destroy() interfaces.

Signed-off-by: Qu Wenruo 
Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/dedup.c | 105 +++
 1 file changed, 105 insertions(+)

diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
index 5fa15ae..259b32d 100644
--- a/fs/btrfs/dedup.c
+++ b/fs/btrfs/dedup.c
@@ -257,3 +257,108 @@ int btrfs_dedup_add(struct btrfs_trans_handle *trans,
return inmem_add(dedup_info, hash);
return -EINVAL;
 }
+
+static struct inmem_hash *
+inmem_search_bytenr(struct btrfs_dedup_info *dedup_info, u64 bytenr)
+{
+   struct rb_node **p = &dedup_info->bytenr_root.rb_node;
+   struct rb_node *parent = NULL;
+   struct inmem_hash *entry = NULL;
+
+   while (*p) {
+   parent = *p;
+   entry = rb_entry(parent, struct inmem_hash, bytenr_node);
+
+   if (bytenr < entry->bytenr)
+   p = &(*p)->rb_left;
+   else if (bytenr > entry->bytenr)
+   p = &(*p)->rb_right;
+   else
+   return entry;
+   }
+
+   return NULL;
+}
+
+/* Delete a hash from in-memory dedup tree */
+static int inmem_del(struct btrfs_dedup_info *dedup_info, u64 bytenr)
+{
+   struct inmem_hash *hash;
+
+   mutex_lock(&dedup_info->lock);
+   hash = inmem_search_bytenr(dedup_info, bytenr);
+   if (!hash) {
+   mutex_unlock(&dedup_info->lock);
+   return 0;
+   }
+
+   __inmem_del(dedup_info, hash);
+   mutex_unlock(&dedup_info->lock);
+   return 0;
+}
+
+/* Remove a dedup hash from dedup tree */
+int btrfs_dedup_del(struct btrfs_trans_handle *trans,
+   struct btrfs_fs_info *fs_info, u64 bytenr)
+{
+   struct btrfs_dedup_info *dedup_info = fs_info->dedup_info;
+
+   if (!fs_info->dedup_enabled)
+   return 0;
+
+   if (WARN_ON(dedup_info == NULL))
+   return -EINVAL;
+
+   if (dedup_info->backend == BTRFS_DEDUP_BACKEND_INMEMORY)
+   return inmem_del(dedup_info, bytenr);
+   return -EINVAL;
+}
+
+static void inmem_destroy(struct btrfs_dedup_info *dedup_info)
+{
+   struct inmem_hash *entry, *tmp;
+
+   mutex_lock(&dedup_info->lock);
+   list_for_each_entry_safe(entry, tmp, &dedup_info->lru_list, lru_list)
+   __inmem_del(dedup_info, entry);
+   mutex_unlock(&dedup_info->lock);
+}
+
+int btrfs_dedup_disable(struct btrfs_fs_info *fs_info)
+{
+   struct btrfs_dedup_info *dedup_info;
+   int ret;
+
+   /* Here we don't want to increase refs of dedup_info */
+   fs_info->dedup_enabled = 0;
+
+   dedup_info = fs_info->dedup_info;
+
+   if (!dedup_info)
+   return 0;
+
+   /* Don't allow disable status change in RO mount */
+   if (fs_info->sb->s_flags & MS_RDONLY)
+   return -EROFS;
+
+   /*
+* Wait for all unfinished write to complete dedup routine
+* As disable operation is not a frequent operation, we are
+* OK to use heavy but safe sync_filesystem().
+*/
+   down_read(&fs_info->sb->s_umount);
+   ret = sync_filesystem(fs_info->sb);
+   up_read(&fs_info->sb->s_umount);
+   if (ret < 0)
+   return ret;
+
+   fs_info->dedup_info = NULL;
+
+   /* now we are OK to clean up everything */
+   if (dedup_info->backend == BTRFS_DEDUP_BACKEND_INMEMORY)
+   inmem_destroy(dedup_info);
+
+   crypto_free_shash(dedup_info->dedup_driver);
+   kfree(dedup_info);
+   return 0;
+}
-- 
2.7.1





[PATCH v7 19/20] btrfs: try more times to alloc metadata reserve space

2016-02-17 Thread Qu Wenruo
From: Wang Xiaoguang 

In btrfs_delalloc_reserve_metadata(), the number of metadata bytes we try
to reserve is calculated by the difference between outstanding_extents and
reserved_extents.

When reserve_metadata_bytes() fails to reserve the desired metadata space,
it has already done some reclaim work, such as writing out ordered extents.

In that case, outstanding_extents and reserved_extents may have already
changed, so a further attempt may be able to reserve enough metadata space.

So this patch retries the reservation up to 3 times before concluding
that we have really run out of space.

Such false ENOSPC is mainly caused by small file extents and
time-consuming delalloc functions, and it mainly affects in-band
de-duplication. (Compression should also be affected, but LZO/zlib is
faster than SHA256, so it is harder to trigger there than with dedup.)
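
As a rough illustration of the bounded retry described above, here is a
minimal user-space sketch in C. The names reserve_fn, reserve_with_retry()
and fail_n_times() are hypothetical and not part of the patch; the callback
only stands in for the real reservation path. Like the "loops++ < 3" check
in the hunk below, the loop allows up to three retries after the first
attempt, and only while the error is -ENOSPC.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for the reservation step: any callback that
 * returns 0 on success or a negative errno value on failure.
 */
typedef int (*reserve_fn)(void *ctx);

/*
 * Call reserve() once, then retry up to max_retries more times, but only
 * while it keeps failing with -ENOSPC. Any other result is returned as is.
 */
static int reserve_with_retry(reserve_fn reserve, void *ctx, int max_retries)
{
	int loops = 0;
	int ret;

	do {
		ret = reserve(ctx);
	} while (ret == -ENOSPC && loops++ < max_retries);

	return ret;
}

/* Demo callback: fails with -ENOSPC until the counter in ctx reaches zero. */
static int fail_n_times(void *ctx)
{
	int *remaining_failures = ctx;

	if (*remaining_failures > 0) {
		(*remaining_failures)--;
		return -ENOSPC;
	}
	return 0;
}
```

With max_retries set to 3, a callback that fails three times still succeeds
on the fourth call, while one that keeps failing returns -ENOSPC after four
calls in total.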

Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/extent-tree.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 2a17c88..c60e24a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5669,6 +5669,7 @@ int btrfs_delalloc_reserve_metadata(struct inode *inode, 
u64 num_bytes)
bool delalloc_lock = true;
u64 to_free = 0;
unsigned dropped;
+   int loops = 0;
 
/* If we are a free space inode we need to not flush since we will be in
 * the middle of a transaction commit.  We also don't need the delalloc
@@ -5684,11 +5685,12 @@ int btrfs_delalloc_reserve_metadata(struct inode 
*inode, u64 num_bytes)
btrfs_transaction_in_commit(root->fs_info))
schedule_timeout(1);
 
+   num_bytes = ALIGN(num_bytes, root->sectorsize);
+
+again:
if (delalloc_lock)
mutex_lock(&BTRFS_I(inode)->delalloc_mutex);
 
-   num_bytes = ALIGN(num_bytes, root->sectorsize);
-
spin_lock(&BTRFS_I(inode)->lock);
nr_extents = (unsigned)div64_u64(num_bytes +
 BTRFS_MAX_EXTENT_SIZE - 1,
@@ -5809,6 +5811,23 @@ out_fail:
}
if (delalloc_lock)
mutex_unlock(&BTRFS_I(inode)->delalloc_mutex);
+   /*
+* The number of metadata bytes is calculated by the difference
+* between outstanding_extents and reserved_extents. Even when
+* reserve_metadata_bytes() fails to reserve the wanted metadata bytes,
+* it has already done some work to reclaim metadata space, hence
+* both outstanding_extents and reserved_extents may have changed and
+* the bytes we try to reserve may have changed too (may be smaller).
+* So here we try to reserve again. This is very useful for online
+* dedup, which will easily eat almost all meta space.
+*
+* XXX: Here 3 is arbitrarily chosen, it's a good workaround for
+* online dedup, later we should find a better method to avoid dedup
+* enospc issue.
+*/
+   if (unlikely(ret == -ENOSPC && loops++ < 3))
+   goto again;
+
return ret;
 }
 
-- 
2.7.1





[PATCH v7 18/20] btrfs: dedup: add per-file online dedup control

2016-02-17 Thread Qu Wenruo
From: Wang Xiaoguang 

Introduce inode_need_dedup() to implement per-file online dedup control.

Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/inode.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 92b3bdd..07c8f89 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -676,6 +676,18 @@ static void free_async_extent_pages(struct async_extent 
*async_extent)
async_extent->pages = NULL;
 }
 
+static inline int inode_need_dedup(struct btrfs_fs_info *fs_info,
+  struct inode *inode)
+{
+   if (!fs_info->dedup_enabled)
+   return 0;
+
+   if (BTRFS_I(inode)->flags & BTRFS_INODE_NODEDUP)
+   return 0;
+
+   return 1;
+}
+
 /*
  * phase two of compressed writeback.  This is the ordered portion
  * of the code, which only gets called in the order the work was
@@ -1637,7 +1649,8 @@ static int run_delalloc_range(struct inode *inode, struct 
page *locked_page,
} else if (BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC && !force_cow) {
ret = run_delalloc_nocow(inode, locked_page, start, end,
 page_started, 0, nr_written);
-   } else if (!inode_need_compress(inode) && !fs_info->dedup_enabled) {
+   } else if (!inode_need_compress(inode) &&
+  !inode_need_dedup(fs_info, inode)) {
ret = cow_file_range(inode, locked_page, start, end,
  page_started, nr_written, 1, NULL);
} else {
-- 
2.7.1





[PATCH v7 07/20] btrfs: dedup: Implement btrfs_dedup_calc_hash interface

2016-02-17 Thread Qu Wenruo
From: Wang Xiaoguang 

Unlike the dedup backend, which can be in-memory or on-disk, only one
hash method, SHA256, is supported so far, so implement the
btrfs_dedup_calc_hash() interface using SHA256.
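
The function in the diff below follows the usual init/update/final digest
pattern, feeding one dedup block to the hash in sector-sized pieces. A
minimal user-space sketch of that pattern follows. Everything named
example_* is hypothetical, and the digest is 64-bit FNV-1a, used here only
as a small stand-in so the sketch stays self-contained; it is NOT SHA256
and not suitable for dedup.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in digest state: 64-bit FNV-1a, NOT SHA256. */
struct example_hash_ctx {
	uint64_t state;
};

static void example_hash_init(struct example_hash_ctx *ctx)
{
	ctx->state = 0xcbf29ce484222325ULL; /* FNV-1a offset basis */
}

static void example_hash_update(struct example_hash_ctx *ctx,
				const uint8_t *data, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		ctx->state ^= data[i];
		ctx->state *= 0x100000001b3ULL; /* FNV-1a prime */
	}
}

static uint64_t example_hash_final(const struct example_hash_ctx *ctx)
{
	return ctx->state;
}

/*
 * Hash one dedup block of dedup_bs bytes by feeding it to the digest in
 * pieces of at most sectorsize bytes. sectorsize must be non-zero.
 */
static uint64_t example_hash_block(const uint8_t *block, size_t dedup_bs,
				   size_t sectorsize)
{
	struct example_hash_ctx ctx;

	example_hash_init(&ctx);
	for (size_t off = 0; off < dedup_bs; off += sectorsize) {
		size_t left = dedup_bs - off;
		size_t n = left < sectorsize ? left : sectorsize;

		example_hash_update(&ctx, block + off, n);
	}
	return example_hash_final(&ctx);
}
```

The point of the pattern is that the result does not depend on how the
block is split: hashing in 16-byte pieces gives the same value as hashing
the whole block in one update.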

Signed-off-by: Qu Wenruo 
Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/dedup.c | 49 +
 1 file changed, 49 insertions(+)

diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
index dbcfcc9..9777355 100644
--- a/fs/btrfs/dedup.c
+++ b/fs/btrfs/dedup.c
@@ -543,3 +543,52 @@ int btrfs_dedup_search(struct btrfs_fs_info *fs_info,
}
return ret;
 }
+
+int btrfs_dedup_calc_hash(struct btrfs_fs_info *fs_info,
+ struct inode *inode, u64 start,
+ struct btrfs_dedup_hash *hash)
+{
+   int i;
+   int ret;
+   struct page *p;
+   struct btrfs_dedup_info *dedup_info = fs_info->dedup_info;
+   struct crypto_shash *tfm = dedup_info->dedup_driver;
+   struct {
+   struct shash_desc desc;
+   char ctx[crypto_shash_descsize(tfm)];
+   } sdesc;
+   u64 dedup_bs;
+   u64 sectorsize = BTRFS_I(inode)->root->sectorsize;
+
+   if (!fs_info->dedup_enabled || !hash)
+   return 0;
+
+   if (WARN_ON(dedup_info == NULL))
+   return -EINVAL;
+
+   WARN_ON(!IS_ALIGNED(start, sectorsize));
+
+   dedup_bs = dedup_info->blocksize;
+
+   sdesc.desc.tfm = tfm;
+   sdesc.desc.flags = 0;
+   ret = crypto_shash_init(&sdesc.desc);
+   if (ret)
+   return ret;
+   for (i = 0; sectorsize * i < dedup_bs; i++) {
+   char *d;
+
+   p = find_get_page(inode->i_mapping,
+ (start >> PAGE_CACHE_SHIFT) + i);
+   if (WARN_ON(!p))
+   return -ENOENT;
+   d = kmap(p);
+   ret = crypto_shash_update(&sdesc.desc, d, sectorsize);
+   kunmap(p);
+   page_cache_release(p);
+   if (ret)
+   return ret;
+   }
+   ret = crypto_shash_final(&sdesc.desc, hash->hash);
+   return ret;
+}
-- 
2.7.1





[PATCH v7 10/20] btrfs: dedup: Add basic tree structure for on-disk dedup method

2016-02-17 Thread Qu Wenruo
Introduce a new tree, the dedup tree, to record on-disk dedup hashes.
It serves as persistent hash storage, in contrast to the in-memory only
implementation.

Unlike Liu Bo's implementation, this version does not use a hack for
bytenr -> hash search. Instead it adds a new item type, DEDUP_BYTENR_ITEM,
for that search case, just like the in-memory backend.
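
To make the two item types concrete, here is a minimal user-space sketch
of how the two keys could be built from one (hash, bytenr) pair, following
the key layout comments in the ctree.h hunk below. The example_* names and
the simplified key struct are hypothetical; only the two key type values
(231 and 232) are taken from this patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Key type values as defined in the ctree.h hunk of this patch. */
#define EXAMPLE_DEDUP_HASH_ITEM_KEY	231
#define EXAMPLE_DEDUP_BYTENR_ITEM_KEY	232

/* Simplified stand-in for a btrfs key (objectid, type, offset). */
struct example_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Copy the last 64 bits of the hash into a u64. hash_len must be >= 8. */
static uint64_t example_hash_tail(const uint8_t *hash, size_t hash_len)
{
	uint64_t tail;

	memcpy(&tail, hash + hash_len - 8, sizeof(tail));
	return tail;
}

/* Key of the hash -> bytenr item: (hash tail, HASH_ITEM, bytenr). */
static struct example_key example_hash_key(const uint8_t *hash,
					   size_t hash_len, uint64_t bytenr)
{
	struct example_key key = {
		.objectid = example_hash_tail(hash, hash_len),
		.type = EXAMPLE_DEDUP_HASH_ITEM_KEY,
		.offset = bytenr,
	};

	return key;
}

/* Key of the bytenr -> hash item: (bytenr, BYTENR_ITEM, hash tail). */
static struct example_key example_bytenr_key(const uint8_t *hash,
					     size_t hash_len, uint64_t bytenr)
{
	struct example_key key = {
		.objectid = bytenr,
		.type = EXAMPLE_DEDUP_BYTENR_ITEM_KEY,
		.offset = example_hash_tail(hash, hash_len),
	};

	return key;
}
```

The two keys mirror each other, so an extent can be looked up either by
hash (for dedup hits) or by bytenr (for free_extent) without scanning.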

Signed-off-by: Liu Bo 
Signed-off-by: Wang Xiaoguang 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/ctree.h | 67 +++-
 fs/btrfs/dedup.h |  5 
 fs/btrfs/disk-io.c   |  1 +
 include/trace/events/btrfs.h |  3 +-
 4 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 094db5c..174af5c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -100,6 +100,9 @@ struct btrfs_ordered_sum;
 /* tracks free space in block groups. */
 #define BTRFS_FREE_SPACE_TREE_OBJECTID 10ULL
 
+/* on-disk dedup tree (EXPERIMENTAL) */
+#define BTRFS_DEDUP_TREE_OBJECTID 11ULL
+
 /* device stats in the device tree */
 #define BTRFS_DEV_STATS_OBJECTID 0ULL
 
@@ -508,6 +511,7 @@ struct btrfs_super_block {
  * ones specified below then we will fail to mount
  */
 #define BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE(1ULL << 0)
+#define BTRFS_FEATURE_COMPAT_RO_DEDUP  (1ULL << 1)
 
 #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF   (1ULL << 0)
 #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL  (1ULL << 1)
@@ -537,7 +541,8 @@ struct btrfs_super_block {
 #define BTRFS_FEATURE_COMPAT_SAFE_CLEAR0ULL
 
 #define BTRFS_FEATURE_COMPAT_RO_SUPP   \
-   (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE)
+   (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |  \
+BTRFS_FEATURE_COMPAT_RO_DEDUP)
 
 #define BTRFS_FEATURE_COMPAT_RO_SAFE_SET   0ULL
 #define BTRFS_FEATURE_COMPAT_RO_SAFE_CLEAR 0ULL
@@ -967,6 +972,46 @@ struct btrfs_csum_item {
u8 csum;
 } __attribute__ ((__packed__));
 
+/*
+ * Objectid: 0
+ * Type: BTRFS_DEDUP_STATUS_ITEM_KEY
+ * Offset: 0
+ */
+struct btrfs_dedup_status_item {
+   __le64 blocksize;
+   __le64 limit_nr;
+   __le16 hash_type;
+   __le16 backend;
+} __attribute__ ((__packed__));
+
+/*
+ * Objectid: Last 64 bit of the hash
+ * Type: BTRFS_DEDUP_HASH_ITEM_KEY
+ * Offset: Bytenr of the hash
+ *
+ * Used for hash <-> bytenr search
+ * XXX: On-disk format not stable yet, see the unused one
+ */
+struct btrfs_dedup_hash_item {
+   /* on disk length of dedup range */
+   __le64 len;
+
+   /* Spare space */
+   u8 __unused[16];
+
+   /* Hash follows */
+} __attribute__ ((__packed__));
+
+/*
+ * Objectid: bytenr
+ * Type: BTRFS_DEDUP_BYTENR_ITEM_KEY
+ * offset: Last 64 bit of the hash
+ *
+ * Used for bytenr <-> hash search (for free_extent)
+ * all its content is hash.
+ * So no special item struct is needed.
+ */
+
 struct btrfs_dev_stats_item {
/*
 * grow this item struct at the end for future enhancements and keep
@@ -2173,6 +2218,13 @@ struct btrfs_ioctl_defrag_range_args {
 #define BTRFS_CHUNK_ITEM_KEY   228
 
 /*
+ * Dedup item and status
+ */
+#define BTRFS_DEDUP_STATUS_ITEM_KEY230
+#define BTRFS_DEDUP_HASH_ITEM_KEY  231
+#define BTRFS_DEDUP_BYTENR_ITEM_KEY232
+
+/*
  * Records the overall state of the qgroups.
  * There's only one instance of this key present,
  * (0, BTRFS_QGROUP_STATUS_KEY, 0)
@@ -3269,6 +3321,19 @@ static inline unsigned long btrfs_leaf_data(struct 
extent_buffer *l)
return offsetof(struct btrfs_leaf, items);
 }
 
+/* btrfs_dedup_status */
+BTRFS_SETGET_FUNCS(dedup_status_blocksize, struct btrfs_dedup_status_item,
+  blocksize, 64);
+BTRFS_SETGET_FUNCS(dedup_status_limit, struct btrfs_dedup_status_item,
+  limit_nr, 64);
+BTRFS_SETGET_FUNCS(dedup_status_hash_type, struct btrfs_dedup_status_item,
+  hash_type, 16);
+BTRFS_SETGET_FUNCS(dedup_status_backend, struct btrfs_dedup_status_item,
+  backend, 16);
+
+/* btrfs_dedup_hash_item */
+BTRFS_SETGET_FUNCS(dedup_hash_len, struct btrfs_dedup_hash_item, len, 64);
+
 /* struct btrfs_file_extent_item */
 BTRFS_SETGET_FUNCS(file_extent_type, struct btrfs_file_extent_item, type, 8);
 BTRFS_SETGET_STACK_FUNCS(stack_file_extent_disk_bytenr,
diff --git a/fs/btrfs/dedup.h b/fs/btrfs/dedup.h
index 83a7512..4f681bb 100644
--- a/fs/btrfs/dedup.h
+++ b/fs/btrfs/dedup.h
@@ -58,6 +58,8 @@ struct btrfs_dedup_hash {
u8 hash[];
 };
 
+struct btrfs_root;
+
 struct btrfs_dedup_info {
/* dedup blocksize */
u64 blocksize;
@@ -73,6 +75,9 @@ struct btrfs_dedup_info {
struct list_head lru_list;
u64 limit_nr;
u64 current_nr;
+
+   /* for persist data like dedup-hash and dedup status */
+   struct btrfs_root *dedup_root;
 };
 
 struct btrfs_trans_handle;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index bbc17f2..baefe33 100644
--- 

[PATCH v7 15/20] btrfs: dedup: Add ioctl for inband deduplication

2016-02-17 Thread Qu Wenruo
From: Wang Xiaoguang 

Add ioctl interface for inband deduplication, which includes:
1) enable
2) disable
3) status

We will later add an ioctl to disable inband dedup for a given file/dir.
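
For the in-memory backend, the btrfs_dedup_enable() hunk below accepts
either a limit in number of hashes or a limit in bytes of memory, but not
both. A minimal user-space sketch of that resolution logic follows. The
example_* names are hypothetical, and the default of 4096 is an
assumption: the value of BTRFS_DEDUP_LIMIT_NR_DEFAULT is not shown in
this excerpt, so the sketch reuses the default from the removed lines.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed default; see the note above about where 4096 comes from. */
#define EXAMPLE_LIMIT_NR_DEFAULT 4096ULL

/*
 * Resolve the effective limit (in number of hashes) for an in-memory
 * backend. bytes_per_hash stands for the per-entry memory cost, i.e. the
 * bookkeeping struct plus the hash itself.
 *
 * Returns 0 and stores the result in *limit_out, or -EINVAL on bad input.
 */
static int example_resolve_limit(uint64_t limit_nr, uint64_t limit_mem,
				 size_t bytes_per_hash, uint64_t *limit_out)
{
	/* Only one of the two limits may be given. */
	if (limit_nr && limit_mem)
		return -EINVAL;
	/* Guard for the division below; this check is not in the patch. */
	if (bytes_per_hash == 0)
		return -EINVAL;

	if (!limit_nr && !limit_mem)
		*limit_out = EXAMPLE_LIMIT_NR_DEFAULT;
	else if (limit_nr)
		*limit_out = limit_nr;
	else
		*limit_out = limit_mem / bytes_per_hash;
	return 0;
}
```

Storing only the hash count and converting to bytes on demand, as the
status function in the diff does, avoids keeping two values in sync.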

Signed-off-by: Qu Wenruo 
Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/dedup.c   | 48 ++
 fs/btrfs/dedup.h   | 10 +++-
 fs/btrfs/ioctl.c   | 64 ++
 fs/btrfs/sysfs.c   |  2 ++
 include/uapi/linux/btrfs.h | 24 +
 5 files changed, 142 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
index 8972cff..b7ecb8f 100644
--- a/fs/btrfs/dedup.c
+++ b/fs/btrfs/dedup.c
@@ -135,12 +135,12 @@ out:
 }
 
 int btrfs_dedup_enable(struct btrfs_fs_info *fs_info, u16 type, u16 backend,
-  u64 blocksize, u64 limit_nr)
+  u64 blocksize, u64 limit_nr, u64 limit_mem)
 {
struct btrfs_dedup_info *dedup_info;
int create_tree;
u64 compat_ro_flag = btrfs_super_compat_ro_flags(fs_info->super_copy);
-   u64 limit = limit_nr;
+   u64 limit;
int ret = 0;
 
/* Sanity check */
@@ -153,11 +153,22 @@ int btrfs_dedup_enable(struct btrfs_fs_info *fs_info, u16 
type, u16 backend,
return -EINVAL;
if (backend >= BTRFS_DEDUP_BACKEND_LAST)
return -EINVAL;
+   /* Only one limit is accepted */
+   if (limit_nr && limit_mem)
+   return -EINVAL;
 
-   if (backend == BTRFS_DEDUP_BACKEND_INMEMORY && limit_nr == 0)
-   limit = 4096; /* default value */
-   if (backend == BTRFS_DEDUP_BACKEND_ONDISK && limit_nr != 0)
+   if (backend == BTRFS_DEDUP_BACKEND_INMEMORY) {
+   if (!limit_nr && !limit_mem)
+   limit = BTRFS_DEDUP_LIMIT_NR_DEFAULT;
+   else if (limit_nr)
+   limit = limit_nr;
+   else
+   limit = limit_mem / (sizeof(struct inmem_hash) +
+   btrfs_dedup_sizes[type]);
+   }
+   if (backend == BTRFS_DEDUP_BACKEND_ONDISK)
limit = 0;
+
/* Ondisk backend needs DEDUP RO compat feature */
if (!(compat_ro_flag & BTRFS_FEATURE_COMPAT_RO_DEDUP) &&
backend == BTRFS_DEDUP_BACKEND_ONDISK)
@@ -208,6 +219,33 @@ out:
return ret;
 }
 
+void btrfs_dedup_status(struct btrfs_fs_info *fs_info,
+   struct btrfs_ioctl_dedup_args *dargs)
+{
+   struct btrfs_dedup_info *dedup_info = fs_info->dedup_info;
+
+   if (!fs_info->dedup_enabled || !dedup_info) {
+   dargs->status = 0;
+   dargs->blocksize = 0;
+   dargs->backend = 0;
+   dargs->hash_type = 0;
+   dargs->limit_nr = 0;
+   dargs->current_nr = 0;
+   return;
+   }
+   mutex_lock(&dedup_info->lock);
+   dargs->status = 1;
+   dargs->blocksize = dedup_info->blocksize;
+   dargs->backend = dedup_info->backend;
+   dargs->hash_type = dedup_info->hash_type;
+   dargs->limit_nr = dedup_info->limit_nr;
+   dargs->limit_mem = dedup_info->limit_nr *
+   (sizeof(struct inmem_hash) +
+btrfs_dedup_sizes[dedup_info->hash_type]);
+   dargs->current_nr = dedup_info->current_nr;
+   mutex_unlock(&dedup_info->lock);
+}
+
 int btrfs_dedup_resume(struct btrfs_fs_info *fs_info,
   struct btrfs_root *dedup_root)
 {
diff --git a/fs/btrfs/dedup.h b/fs/btrfs/dedup.h
index a580072..8a99a7f 100644
--- a/fs/btrfs/dedup.h
+++ b/fs/btrfs/dedup.h
@@ -100,7 +100,15 @@ static inline struct btrfs_dedup_hash 
*btrfs_dedup_alloc_hash(u16 type)
  * Called at dedup enable time.
  */
 int btrfs_dedup_enable(struct btrfs_fs_info *fs_info, u16 type, u16 backend,
-  u64 blocksize, u64 limit_nr);
+  u64 blocksize, u64 limit_nr, u64 limit_mem);
+
+/*
+ * Get inband dedup info
+ * Since it needs to access different backends' hash size, which
+ * is not exported, we need such simple function.
+ */
+void btrfs_dedup_status(struct btrfs_fs_info *fs_info,
+   struct btrfs_ioctl_dedup_args *dargs);
 
 /*
  * Disable dedup and invalidate all its dedup data.
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 77c61b4..eb37a3d 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -59,6 +59,7 @@
 #include "props.h"
 #include "sysfs.h"
 #include "qgroup.h"
+#include "dedup.h"
 
 #ifdef CONFIG_64BIT
 /* If we have a 32-bit userspace and 64-bit kernel, then the UAPI
@@ -3257,6 +3258,67 @@ ssize_t btrfs_dedupe_file_range(struct file *src_file, 
u64 loff, u64 olen,
return olen;
 }
 
+static long btrfs_ioctl_dedup_ctl(struct btrfs_root *root, void __user *args)
+{
+   struct btrfs_ioctl_dedup_args *dargs;
+   struct btrfs_fs_info *fs_info = 

[PATCH v7 06/20] btrfs: dedup: Introduce function to search for an existing hash

2016-02-17 Thread Qu Wenruo
From: Wang Xiaoguang 

Introduce static function inmem_search() to handle the job for in-memory
hash tree.

The trick is, we must ensure the delayed ref head is not being run at
the time we search for the hash.

With inmem_search(), we can implement the btrfs_dedup_search()
interface.
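
The hash lookup itself is a tree search ordered by memcmp() on the hash
bytes, with a hit moved to the head of the LRU list. A minimal user-space
sketch of that idea follows. All example_* names are hypothetical, and a
plain unbalanced binary search tree with hand-rolled list links stands in
for the kernel rb-tree and list_head used by the patch. The sketch also
calls memcmp() once per node, where the patch calls it twice.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_HASH_LEN 32

struct example_entry {
	uint8_t hash[EXAMPLE_HASH_LEN];
	struct example_entry *left, *right;		/* search tree links */
	struct example_entry *lru_prev, *lru_next;	/* LRU list links */
};

struct example_index {
	struct example_entry *root;
	struct example_entry *lru_head;	/* most recently used */
	struct example_entry *lru_tail;	/* least recently used */
};

static void example_lru_unlink(struct example_index *idx,
			       struct example_entry *e)
{
	if (e->lru_prev)
		e->lru_prev->lru_next = e->lru_next;
	else
		idx->lru_head = e->lru_next;
	if (e->lru_next)
		e->lru_next->lru_prev = e->lru_prev;
	else
		idx->lru_tail = e->lru_prev;
	e->lru_prev = e->lru_next = NULL;
}

static void example_lru_push_front(struct example_index *idx,
				   struct example_entry *e)
{
	e->lru_prev = NULL;
	e->lru_next = idx->lru_head;
	if (idx->lru_head)
		idx->lru_head->lru_prev = e;
	else
		idx->lru_tail = e;
	idx->lru_head = e;
}

/* Insert e unless an entry with the same hash already exists. */
static void example_insert(struct example_index *idx, struct example_entry *e)
{
	struct example_entry **p = &idx->root;

	while (*p) {
		int cmp = memcmp(e->hash, (*p)->hash, EXAMPLE_HASH_LEN);

		if (cmp == 0)
			return;
		p = cmp < 0 ? &(*p)->left : &(*p)->right;
	}
	e->left = e->right = NULL;
	*p = e;
	example_lru_push_front(idx, e);
}

/* Look up a hash; on a hit, move the entry to the head of the LRU list. */
static struct example_entry *example_search(struct example_index *idx,
					    const uint8_t *hash)
{
	struct example_entry *cur = idx->root;

	while (cur) {
		int cmp = memcmp(hash, cur->hash, EXAMPLE_HASH_LEN);

		if (cmp < 0) {
			cur = cur->left;
		} else if (cmp > 0) {
			cur = cur->right;
		} else {
			example_lru_unlink(idx, cur);
			example_lru_push_front(idx, cur);
			return cur;
		}
	}
	return NULL;
}
```

Refreshing the LRU position on every hit keeps frequently matched hashes
away from the tail, which is where eviction would start once the limit is
reached.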

Signed-off-by: Qu Wenruo 
Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/dedup.c | 180 +++
 1 file changed, 180 insertions(+)

diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
index ed18fc9..dbcfcc9 100644
--- a/fs/btrfs/dedup.c
+++ b/fs/btrfs/dedup.c
@@ -363,3 +363,183 @@ int btrfs_dedup_disable(struct btrfs_fs_info *fs_info)
kfree(dedup_info);
return 0;
 }
+
+/*
+ * Caller must ensure the corresponding ref head is not being run.
+ */
+static struct inmem_hash *
+inmem_search_hash(struct btrfs_dedup_info *dedup_info, u8 *hash)
+{
+   struct rb_node **p = &dedup_info->hash_root.rb_node;
+   struct rb_node *parent = NULL;
+   struct inmem_hash *entry = NULL;
+   u16 hash_type = dedup_info->hash_type;
+   int hash_len = btrfs_dedup_sizes[hash_type];
+
+   while (*p) {
+   parent = *p;
+   entry = rb_entry(parent, struct inmem_hash, hash_node);
+
+   if (memcmp(hash, entry->hash, hash_len) < 0) {
+   p = &(*p)->rb_left;
+   } else if (memcmp(hash, entry->hash, hash_len) > 0) {
+   p = &(*p)->rb_right;
+   } else {
+   /* Found, need to re-add it to LRU list head */
+   list_del(&entry->lru_list);
+   list_add(&entry->lru_list, &dedup_info->lru_list);
+   return entry;
+   }
+   }
+   return NULL;
+}
+
+static int inmem_search(struct btrfs_dedup_info *dedup_info,
+   struct inode *inode, u64 file_pos,
+   struct btrfs_dedup_hash *hash)
+{
+   int ret;
+   struct btrfs_root *root = BTRFS_I(inode)->root;
+   struct btrfs_trans_handle *trans;
+   struct btrfs_delayed_ref_root *delayed_refs;
+   struct btrfs_delayed_ref_head *head;
+   struct btrfs_delayed_ref_head *insert_head;
+   struct btrfs_delayed_data_ref *insert_dref;
+   struct btrfs_qgroup_extent_record *insert_qrecord = NULL;
+   struct inmem_hash *found_hash;
+   int free_insert = 1;
+   u64 bytenr;
+   u32 num_bytes;
+
+   insert_head = kmem_cache_alloc(btrfs_delayed_ref_head_cachep, GFP_NOFS);
+   if (!insert_head)
+   return -ENOMEM;
+   insert_head->extent_op = NULL;
+   insert_dref = kmem_cache_alloc(btrfs_delayed_data_ref_cachep, GFP_NOFS);
+   if (!insert_dref) {
+   kmem_cache_free(btrfs_delayed_ref_head_cachep, insert_head);
+   return -ENOMEM;
+   }
+   if (root->fs_info->quota_enabled &&
+   is_fstree(root->root_key.objectid)) {
+   insert_qrecord = kmalloc(sizeof(*insert_qrecord), GFP_NOFS);
+   if (!insert_qrecord) {
+   kmem_cache_free(btrfs_delayed_ref_head_cachep,
+   insert_head);
+   kmem_cache_free(btrfs_delayed_data_ref_cachep,
+   insert_dref);
+   return -ENOMEM;
+   }
+   }
+
+   trans = btrfs_join_transaction(root);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   goto free_mem;
+   }
+
+again:
+   mutex_lock(&dedup_info->lock);
+   found_hash = inmem_search_hash(dedup_info, hash->hash);
+   /* If we don't find a duplicated extent, just return. */
+   if (!found_hash) {
+   ret = 0;
+   goto out;
+   }
+   bytenr = found_hash->bytenr;
+   num_bytes = found_hash->num_bytes;
+
+   delayed_refs = &trans->transaction->delayed_refs;
+
+   spin_lock(&delayed_refs->lock);
+   head = btrfs_find_delayed_ref_head(trans, bytenr);
+   if (!head) {
+   /*
+* We can safely insert a new delayed_ref as long as we
+* hold delayed_refs->lock.
+* Only need to use atomic inc_extent_ref()
+*/
+   btrfs_add_delayed_data_ref_locked(root->fs_info, trans,
+   insert_dref, insert_head, insert_qrecord,
+   bytenr, num_bytes, 0, root->root_key.objectid,
+   btrfs_ino(inode), file_pos, 0,
+   BTRFS_ADD_DELAYED_REF);
+   spin_unlock(&delayed_refs->lock);
+
+   /* add_delayed_data_ref_locked will free unused memory */
+   free_insert = 0;
+   hash->bytenr = bytenr;
+   hash->num_bytes = num_bytes;
+   ret = 1;
+   goto out;
+   }
+
+   /*
+

[PATCH v7 13/20] btrfs: dedup: Add support to delete hash for on-disk backend

2016-02-17 Thread Qu Wenruo
The on-disk backend can now delete hashes.
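
The bytenr lookup in the diff below does not know the key offset (the hash
tail) in advance, so it searches for the largest possible offset and then
steps back one item. A minimal user-space sketch of that idea over a
sorted array follows. The example_* names are hypothetical, and the array
with a binary search is only a stand-in for the btree search and
btrfs_previous_item() used by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a btrfs key (objectid, type, offset). */
struct example_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

/* Compare keys by objectid, then type, then offset. */
static int example_key_cmp(const struct example_key *a,
			   const struct example_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Find an item with the given objectid and type, whatever its offset, in
 * an array sorted by example_key_cmp(). Returns its index, or -1 if there
 * is none.
 */
static long example_find_bytenr_item(const struct example_key *items,
				     size_t nr, uint64_t bytenr, uint8_t type)
{
	struct example_key probe = { bytenr, type, UINT64_MAX };
	size_t lo = 0;
	size_t hi = nr;

	/* Lower bound: first index whose key is >= probe. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (example_key_cmp(&items[mid], &probe) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* Unlikely exact hit: the stored offset really is all ones. */
	if (lo < nr && example_key_cmp(&items[lo], &probe) == 0)
		return (long)lo;

	/* Otherwise the candidate, if any, is the previous item. */
	if (lo == 0)
		return -1;
	if (items[lo - 1].objectid == bytenr && items[lo - 1].type == type)
		return (long)(lo - 1);
	return -1;
}
```

Because keys sort by offset last, every item for the wanted bytenr and
type sits directly before the all-ones probe position, so one step back is
enough.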

Signed-off-by: Wang Xiaoguang 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/dedup.c | 100 +++
 1 file changed, 100 insertions(+)

diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
index 61a5d7a..59b32a5 100644
--- a/fs/btrfs/dedup.c
+++ b/fs/btrfs/dedup.c
@@ -467,6 +467,104 @@ static int inmem_del(struct btrfs_dedup_info *dedup_info, 
u64 bytenr)
return 0;
 }
 
+/*
+ * If prepare_del is given, this will setup search_slot() for delete.
+ * Caller needs to do proper locking.
+ *
+ * Return > 0 for found.
+ * Return 0 for not found.
+ * Return < 0 for error.
+ */
+static int ondisk_search_bytenr(struct btrfs_trans_handle *trans,
+   struct btrfs_dedup_info *dedup_info,
+   struct btrfs_path *path, u64 bytenr,
+   int prepare_del)
+{
+   struct btrfs_key key;
+   struct btrfs_root *dedup_root = dedup_info->dedup_root;
+   int ret;
+   int ins_len = 0;
+   int cow = 0;
+
+   if (prepare_del) {
+   if (WARN_ON(trans == NULL))
+   return -EINVAL;
+   cow = 1;
+   ins_len = -1;
+   }
+
+   key.objectid = bytenr;
+   key.type = BTRFS_DEDUP_BYTENR_ITEM_KEY;
+   key.offset = (u64)-1;
+
+   ret = btrfs_search_slot(trans, dedup_root, &key, path,
+   ins_len, cow);
+
+   if (ret < 0)
+   return ret;
+   /*
+* Although it's almost impossible, it's still possible that
+* the last 64bits are all 1.
+*/
+   if (ret == 0)
+   return 1;
+
+   ret = btrfs_previous_item(dedup_root, path, bytenr,
+ BTRFS_DEDUP_BYTENR_ITEM_KEY);
+   if (ret < 0)
+   return ret;
+   if (ret > 0)
+   return 0;
+   return 1;
+}
+
+static int ondisk_del(struct btrfs_trans_handle *trans,
+ struct btrfs_dedup_info *dedup_info, u64 bytenr)
+{
+   struct btrfs_root *dedup_root = dedup_info->dedup_root;
+   struct btrfs_path *path;
+   struct btrfs_key key;
+   int ret;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   key.objectid = bytenr;
+   key.type = BTRFS_DEDUP_BYTENR_ITEM_KEY;
+   key.offset = 0;
+
+   mutex_lock(&dedup_info->lock);
+
+   ret = ondisk_search_bytenr(trans, dedup_info, path, bytenr, 1);
+   if (ret <= 0)
+   goto out;
+
+   btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
+   ret = btrfs_del_item(trans, dedup_root, path);
+   btrfs_release_path(path);
+   if (ret < 0)
+   goto out;
+   /* Search for hash item and delete it */
+   key.objectid = key.offset;
+   key.type = BTRFS_DEDUP_HASH_ITEM_KEY;
+   key.offset = bytenr;
+
+   ret = btrfs_search_slot(trans, dedup_root, &key, path, -1, 1);
+   if (WARN_ON(ret > 0)) {
+   ret = -ENOENT;
+   goto out;
+   }
+   if (ret < 0)
+   goto out;
+   ret = btrfs_del_item(trans, dedup_root, path);
+
+out:
+   btrfs_free_path(path);
+   mutex_unlock(&dedup_info->lock);
+   return ret;
+}
+
 /* Remove a dedup hash from dedup tree */
 int btrfs_dedup_del(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info, u64 bytenr)
@@ -481,6 +579,8 @@ int btrfs_dedup_del(struct btrfs_trans_handle *trans,
 
if (dedup_info->backend == BTRFS_DEDUP_BACKEND_INMEMORY)
return inmem_del(dedup_info, bytenr);
+   if (dedup_info->backend == BTRFS_DEDUP_BACKEND_ONDISK)
+   return ondisk_del(trans, dedup_info, bytenr);
return -EINVAL;
 }
 
-- 
2.7.1





[PATCH v7 20/20] btrfs: dedup: Fix a bug when running inband dedup with balance

2016-02-17 Thread Qu Wenruo
From: Wang Xiaoguang 

When running inband dedup with balance, it's possible that inband dedup
still increases the ref count of extents which are in an RO chunk.

This may either make find_data_references() give a warning, or make
run_delayed_refs() return -EIO and cause a transaction abort.

The cause is that the normal dedup_del() is only called at
run_delayed_ref() time, which is too late for the balance case.

This patch fixes the bug by calling dedup_del() at extent searching
time of balance.

Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/relocation.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 2bd0011..913a7b1 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -31,6 +31,7 @@
 #include "async-thread.h"
 #include "free-space-cache.h"
 #include "inode-map.h"
+#include "dedup.h"
 
 /*
  * backref_node, mapping_node and tree_block start with this
@@ -3909,6 +3910,7 @@ static noinline_for_stack int relocate_block_group(struct 
reloc_control *rc)
struct btrfs_trans_handle *trans = NULL;
struct btrfs_path *path;
struct btrfs_extent_item *ei;
+   struct btrfs_fs_info *fs_info = rc->extent_root->fs_info;
u64 flags;
u32 item_size;
int ret;
@@ -4039,6 +4041,17 @@ restart:
if (rc->stage == MOVE_DATA_EXTENTS &&
(flags & BTRFS_EXTENT_FLAG_DATA)) {
rc->found_file_extent = 1;
+   /*
+* This data extent will be replaced, but normal
+* dedup_del() will only happen at run_delayed_ref()
+* time, which is too late, so delete dedup hash early
+* to prevent its ref get increased.
+*/
+   ret = btrfs_dedup_del(trans, fs_info, key.objectid);
+   if (ret < 0) {
+   err = ret;
+   break;
+   }
ret = relocate_data_extent(rc->data_inode,
   &key, &rc->cluster);
if (ret < 0) {
-- 
2.7.1





[PATCH v7 14/20] btrfs: dedup: Add support for adding hash for on-disk backend

2016-02-17 Thread Qu Wenruo
The on-disk backend can now add hashes.

Signed-off-by: Wang Xiaoguang 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/dedup.c | 83 
 1 file changed, 83 insertions(+)

diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
index 59b32a5..8972cff 100644
--- a/fs/btrfs/dedup.c
+++ b/fs/btrfs/dedup.c
@@ -404,6 +404,87 @@ out:
return 0;
 }
 
+static int ondisk_search_bytenr(struct btrfs_trans_handle *trans,
+   struct btrfs_dedup_info *dedup_info,
+   struct btrfs_path *path, u64 bytenr,
+   int prepare_del);
+static int ondisk_search_hash(struct btrfs_dedup_info *dedup_info, u8 *hash,
+ u64 *bytenr_ret, u32 *num_bytes_ret);
+static int ondisk_add(struct btrfs_trans_handle *trans,
+ struct btrfs_dedup_info *dedup_info,
+ struct btrfs_dedup_hash *hash)
+{
+   struct btrfs_path *path;
+   struct btrfs_root *dedup_root = dedup_info->dedup_root;
+   struct btrfs_key key;
+   struct btrfs_dedup_hash_item *hash_item;
+   u64 bytenr;
+   u32 num_bytes;
+   int hash_len = btrfs_dedup_sizes[dedup_info->hash_type];
+   int ret;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   mutex_lock(&dedup_info->lock);
+
+   ret = ondisk_search_bytenr(NULL, dedup_info, path, hash->bytenr, 0);
+   if (ret < 0)
+   goto out;
+   if (ret > 0) {
+   ret = 0;
+   goto out;
+   }
+   btrfs_release_path(path);
+
+   ret = ondisk_search_hash(dedup_info, hash->hash, &bytenr, &num_bytes);
+   if (ret < 0)
+   goto out;
+   /* Same hash found, don't re-add to save dedup tree space */
+   if (ret > 0) {
+   ret = 0;
+   goto out;
+   }
+
+   /* Insert hash->bytenr item */
+   memcpy(&key.objectid, hash->hash + hash_len - 8, 8);
+   key.type = BTRFS_DEDUP_HASH_ITEM_KEY;
+   key.offset = hash->bytenr;
+
+   ret = btrfs_insert_empty_item(trans, dedup_root, path, &key,
+   sizeof(*hash_item) + hash_len);
+   WARN_ON(ret == -EEXIST);
+   if (ret < 0)
+   goto out;
+   hash_item = btrfs_item_ptr(path->nodes[0], path->slots[0],
+  struct btrfs_dedup_hash_item);
+   btrfs_set_dedup_hash_len(path->nodes[0], hash_item, hash->num_bytes);
+   write_extent_buffer(path->nodes[0], hash->hash,
+   (unsigned long)(hash_item + 1), hash_len);
+   btrfs_mark_buffer_dirty(path->nodes[0]);
+   btrfs_release_path(path);
+
+   /* Then bytenr->hash item */
+   key.objectid = hash->bytenr;
+   key.type = BTRFS_DEDUP_BYTENR_ITEM_KEY;
+   memcpy(&key.offset, hash->hash + hash_len - 8, 8);
+
+   ret = btrfs_insert_empty_item(trans, dedup_root, path, &key, hash_len);
+   WARN_ON(ret == -EEXIST);
+   if (ret < 0)
+   goto out;
+   write_extent_buffer(path->nodes[0], hash->hash,
+   btrfs_item_ptr_offset(path->nodes[0], path->slots[0]),
+   hash_len);
+   btrfs_mark_buffer_dirty(path->nodes[0]);
+
+out:
+   mutex_unlock(&dedup_info->lock);
+   btrfs_free_path(path);
+   return ret;
+}
+
 int btrfs_dedup_add(struct btrfs_trans_handle *trans,
struct btrfs_fs_info *fs_info,
struct btrfs_dedup_hash *hash)
@@ -425,6 +506,8 @@ int btrfs_dedup_add(struct btrfs_trans_handle *trans,
 
if (dedup_info->backend == BTRFS_DEDUP_BACKEND_INMEMORY)
return inmem_add(dedup_info, hash);
+   if (dedup_info->backend == BTRFS_DEDUP_BACKEND_ONDISK)
+   return ondisk_add(trans, dedup_info, hash);
return -EINVAL;
 }
 
-- 
2.7.1



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v7 08/20] btrfs: ordered-extent: Add support for dedup

2016-02-17 Thread Qu Wenruo
From: Wang Xiaoguang 

Add ordered-extent support for dedup.

Note that the current ordered-extent support only handles non-compressed
source extents.
Support for compressed source extents will be added later.
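
In the diff below, the ordered extent keeps its own copy of the dedup
hash, which is a struct ending in a flexible array member whose length
depends on the hash type. A minimal user-space sketch of such a deep copy
follows. The example_* names are hypothetical, malloc() stands in for the
kernel allocator, and the hash length is passed in directly instead of
being looked up from the hash type.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for struct btrfs_dedup_hash. */
struct example_dedup_hash {
	uint64_t bytenr;
	uint32_t num_bytes;
	uint8_t hash[];	/* flexible array member, hash_len bytes long */
};

/*
 * Allocate a new hash struct and copy both the fixed fields and the
 * trailing hash bytes. Returns NULL if src is NULL or allocation fails.
 * The caller owns the result and releases it with free().
 */
static struct example_dedup_hash *
example_hash_dup(const struct example_dedup_hash *src, size_t hash_len)
{
	struct example_dedup_hash *dst;

	if (!src)
		return NULL;

	/* sizeof(*dst) does not include the flexible array, so add it. */
	dst = malloc(sizeof(*dst) + hash_len);
	if (!dst)
		return NULL;

	dst->bytenr = src->bytenr;
	dst->num_bytes = src->num_bytes;
	memcpy(dst->hash, src->hash, hash_len);
	return dst;
}
```

Copying rather than sharing the pointer means the ordered extent can free
its hash when it is released, independent of the lifetime of the
caller's hash.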

Signed-off-by: Qu Wenruo 
Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/ordered-data.c | 43 +++
 fs/btrfs/ordered-data.h | 13 +
 2 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 8c27292..973014b 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -25,6 +25,7 @@
 #include "btrfs_inode.h"
 #include "extent_io.h"
 #include "disk-io.h"
+#include "dedup.h"
 
 static struct kmem_cache *btrfs_ordered_extent_cache;
 
@@ -183,7 +184,8 @@ static inline struct rb_node *tree_search(struct 
btrfs_ordered_inode_tree *tree,
  */
 static int __btrfs_add_ordered_extent(struct inode *inode, u64 file_offset,
  u64 start, u64 len, u64 disk_len,
- int type, int dio, int compress_type)
+ int type, int dio, int compress_type,
+ struct btrfs_dedup_hash *hash)
 {
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_ordered_inode_tree *tree;
@@ -203,6 +205,30 @@ static int __btrfs_add_ordered_extent(struct inode *inode, 
u64 file_offset,
entry->inode = igrab(inode);
entry->compress_type = compress_type;
entry->truncated_len = (u64)-1;
+   entry->hash = NULL;
+   /*
+* Hash hit must go through dedup routine or we will screw
+* delayed refs
+*/
+   if (hash && (hash->bytenr || root->fs_info->dedup_enabled)) {
+   struct btrfs_dedup_info *dedup_info = root->fs_info->dedup_info;
+
+   if (WARN_ON(dedup_info == NULL)) {
+   kmem_cache_free(btrfs_ordered_extent_cache,
+   entry);
+   return -EINVAL;
+   }
+   entry->hash = btrfs_dedup_alloc_hash(dedup_info->hash_type);
+   if (!entry->hash) {
+   kmem_cache_free(btrfs_ordered_extent_cache, entry);
+   return -ENOMEM;
+   }
+   entry->hash->bytenr = hash->bytenr;
+   entry->hash->num_bytes = hash->num_bytes;
+   memcpy(entry->hash->hash, hash->hash,
+  btrfs_dedup_sizes[dedup_info->hash_type]);
+   }
+
if (type != BTRFS_ORDERED_IO_DONE && type != BTRFS_ORDERED_COMPLETE)
set_bit(type, &entry->flags);
 
@@ -249,15 +275,23 @@ int btrfs_add_ordered_extent(struct inode *inode, u64 
file_offset,
 {
return __btrfs_add_ordered_extent(inode, file_offset, start, len,
  disk_len, type, 0,
- BTRFS_COMPRESS_NONE);
+ BTRFS_COMPRESS_NONE, NULL);
 }
 
+int btrfs_add_ordered_extent_dedup(struct inode *inode, u64 file_offset,
+  u64 start, u64 len, u64 disk_len, int type,
+  struct btrfs_dedup_hash *hash)
+{
+   return __btrfs_add_ordered_extent(inode, file_offset, start, len,
+ disk_len, type, 0,
+ BTRFS_COMPRESS_NONE, hash);
+}
 int btrfs_add_ordered_extent_dio(struct inode *inode, u64 file_offset,
 u64 start, u64 len, u64 disk_len, int type)
 {
return __btrfs_add_ordered_extent(inode, file_offset, start, len,
  disk_len, type, 1,
- BTRFS_COMPRESS_NONE);
+ BTRFS_COMPRESS_NONE, NULL);
 }
 
 int btrfs_add_ordered_extent_compress(struct inode *inode, u64 file_offset,
@@ -266,7 +300,7 @@ int btrfs_add_ordered_extent_compress(struct inode *inode, 
u64 file_offset,
 {
return __btrfs_add_ordered_extent(inode, file_offset, start, len,
  disk_len, type, 0,
- compress_type);
+ compress_type, NULL);
 }
 
 /*
@@ -576,6 +610,7 @@ void btrfs_put_ordered_extent(struct btrfs_ordered_extent 
*entry)
list_del(&sum->list);
kfree(sum);
}
+   kfree(entry->hash);
kmem_cache_free(btrfs_ordered_extent_cache, entry);
}
 }
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index 23c9605..58519ce 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -139,6 +139,16 @@ struct btrfs_ordered_extent {
struct completion completion;
struct 

[PATCH v7 11/20] btrfs: dedup: Introduce interfaces to resume and cleanup dedup info

2016-02-17 Thread Qu Wenruo
Since we will introduce a new on-disk based dedup method, introduce new
interfaces to resume a previous dedup setup.

And since we introduce a new tree for the dedup status, also add a
disable handler for it.

Signed-off-by: Wang Xiaoguang 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/dedup.c   | 270 +
 fs/btrfs/dedup.h   |  13 +++
 fs/btrfs/disk-io.c |  21 -
 fs/btrfs/disk-io.h |   1 +
 4 files changed, 283 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
index 9777355..ad7b7e1 100644
--- a/fs/btrfs/dedup.c
+++ b/fs/btrfs/dedup.c
@@ -21,6 +21,8 @@
 #include "transaction.h"
 #include "delayed-ref.h"
 #include "qgroup.h"
+#include "disk-io.h"
+#include "locking.h"
 
 struct inmem_hash {
struct rb_node hash_node;
@@ -41,10 +43,103 @@ static inline struct inmem_hash *inmem_alloc_hash(u16 type)
GFP_NOFS);
 }
 
+static int init_dedup_info(struct btrfs_dedup_info **ret_info, u16 type,
+  u16 backend, u64 blocksize, u64 limit)
+{
+   struct btrfs_dedup_info *dedup_info;
+
+   dedup_info = kzalloc(sizeof(*dedup_info), GFP_NOFS);
+   if (!dedup_info)
+   return -ENOMEM;
+
+   dedup_info->hash_type = type;
+   dedup_info->backend = backend;
+   dedup_info->blocksize = blocksize;
+   dedup_info->limit_nr = limit;
+
+   /* only support SHA256 yet */
+   dedup_info->dedup_driver = crypto_alloc_shash("sha256", 0, 0);
+   if (IS_ERR(dedup_info->dedup_driver)) {
+   int ret;
+
+   ret = PTR_ERR(dedup_info->dedup_driver);
+   kfree(dedup_info);
+   return ret;
+   }
+
+   dedup_info->hash_root = RB_ROOT;
+   dedup_info->bytenr_root = RB_ROOT;
+   dedup_info->current_nr = 0;
+   INIT_LIST_HEAD(&dedup_info->lru_list);
+   mutex_init(&dedup_info->lock);
+
+   *ret_info = dedup_info;
+   return 0;
+}
+
+static int init_dedup_tree(struct btrfs_fs_info *fs_info,
+  struct btrfs_dedup_info *dedup_info)
+{
+   struct btrfs_root *dedup_root;
+   struct btrfs_key key;
+   struct btrfs_path *path;
+   struct btrfs_dedup_status_item *status;
+   struct btrfs_trans_handle *trans;
+   int ret;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   trans = btrfs_start_transaction(fs_info->tree_root, 2);
+   if (IS_ERR(trans)) {
+   ret = PTR_ERR(trans);
+   goto out;
+   }
+   dedup_root = btrfs_create_tree(trans, fs_info,
+  BTRFS_DEDUP_TREE_OBJECTID);
+   if (IS_ERR(dedup_root)) {
+   ret = PTR_ERR(dedup_root);
+   btrfs_abort_transaction(trans, fs_info->tree_root, ret);
+   goto out;
+   }
+   dedup_info->dedup_root = dedup_root;
+
+   key.objectid = 0;
+   key.type = BTRFS_DEDUP_STATUS_ITEM_KEY;
+   key.offset = 0;
+
+   ret = btrfs_insert_empty_item(trans, dedup_root, path, &key,
+ sizeof(*status));
+   if (ret < 0) {
+   btrfs_abort_transaction(trans, fs_info->tree_root, ret);
+   goto out;
+   }
+
+   status = btrfs_item_ptr(path->nodes[0], path->slots[0],
+   struct btrfs_dedup_status_item);
+   btrfs_set_dedup_status_blocksize(path->nodes[0], status,
+dedup_info->blocksize);
+   btrfs_set_dedup_status_limit(path->nodes[0], status,
+   dedup_info->limit_nr);
+   btrfs_set_dedup_status_hash_type(path->nodes[0], status,
+   dedup_info->hash_type);
+   btrfs_set_dedup_status_backend(path->nodes[0], status,
+   dedup_info->backend);
+   btrfs_mark_buffer_dirty(path->nodes[0]);
+out:
+   btrfs_free_path(path);
+   if (ret == 0)
+   btrfs_commit_transaction(trans, fs_info->tree_root);
+   return ret;
+}
+
 int btrfs_dedup_enable(struct btrfs_fs_info *fs_info, u16 type, u16 backend,
   u64 blocksize, u64 limit_nr)
 {
struct btrfs_dedup_info *dedup_info;
+   int create_tree;
+   u64 compat_ro_flag = btrfs_super_compat_ro_flags(fs_info->super_copy);
u64 limit = limit_nr;
int ret = 0;
 
@@ -63,6 +158,14 @@ int btrfs_dedup_enable(struct btrfs_fs_info *fs_info, u16 
type, u16 backend,
limit = 4096; /* default value */
if (backend == BTRFS_DEDUP_BACKEND_ONDISK && limit_nr != 0)
limit = 0;
+   /* Ondisk backend needs DEDUP RO compat feature */
+   if (!(compat_ro_flag & BTRFS_FEATURE_COMPAT_RO_DEDUP) &&
+   backend == BTRFS_DEDUP_BACKEND_ONDISK)
+   return -EOPNOTSUPP;
+
+   /* Meaningless and unable to enable dedup for RO fs */
+   if (fs_info->sb->s_flags & 

[PATCH v7 17/20] btrfs: dedup: add a property handler for online dedup

2016-02-17 Thread Qu Wenruo
From: Wang Xiaoguang 

We use btrfs extended attribute "btrfs.dedup" to record per-file online
dedup status, so add a dedup property handler.
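
A minimal user-space sketch of the value check such a property needs,
assuming "disable" is the only accepted non-empty value, as in the patch
below. The helper name dedup_value_is_disable() is hypothetical and not
part of the patch. Note that strncmp("disable", value, len) compares only
the first len bytes, so a prefix such as "dis" also compares equal; the
sketch checks the length first to avoid that.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: returns 1 only when the
 * len-byte buffer is exactly "disable". The value does not have to be
 * NUL-terminated, as is the case for xattr values.
 */
static int dedup_value_is_disable(const char *value, size_t len)
{
	static const char want[] = "disable";

	assert(value != NULL);
	/* Compare the length first so a short prefix does not match. */
	return len == sizeof(want) - 1 && memcmp(value, want, len) == 0;
}
```

An empty value (len == 0) is rejected here; the apply handler in the patch
treats that case separately as "clear the flag".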

Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/props.c | 40 
 1 file changed, 40 insertions(+)

diff --git a/fs/btrfs/props.c b/fs/btrfs/props.c
index f9e6023..fb82080 100644
--- a/fs/btrfs/props.c
+++ b/fs/btrfs/props.c
@@ -41,6 +41,10 @@ static int prop_compression_apply(struct inode *inode,
  size_t len);
 static const char *prop_compression_extract(struct inode *inode);
 
+static int prop_dedup_validate(const char *value, size_t len);
+static int prop_dedup_apply(struct inode *inode, const char *value, size_t 
len);
+static const char *prop_dedup_extract(struct inode *inode);
+
 static struct prop_handler prop_handlers[] = {
{
.xattr_name = XATTR_BTRFS_PREFIX "compression",
@@ -49,6 +53,13 @@ static struct prop_handler prop_handlers[] = {
.extract = prop_compression_extract,
.inheritable = 1
},
+   {
+   .xattr_name = XATTR_BTRFS_PREFIX "dedup",
+   .validate = prop_dedup_validate,
+   .apply = prop_dedup_apply,
+   .extract = prop_dedup_extract,
+   .inheritable = 1
+   },
 };
 
 void __init btrfs_props_init(void)
@@ -425,4 +436,33 @@ static const char *prop_compression_extract(struct inode 
*inode)
return NULL;
 }
 
+static int prop_dedup_validate(const char *value, size_t len)
+{
+   if (!strncmp("disable", value, len))
+   return 0;
+
+   return -EINVAL;
+}
+
+static int prop_dedup_apply(struct inode *inode, const char *value, size_t len)
+{
+   if (len == 0) {
+   BTRFS_I(inode)->flags &= ~BTRFS_INODE_NODEDUP;
+   return 0;
+   }
+
+   if (!strncmp("disable", value, len)) {
+   BTRFS_I(inode)->flags |= BTRFS_INODE_NODEDUP;
+   return 0;
+   }
+
+   return -EINVAL;
+}
+
+static const char *prop_dedup_extract(struct inode *inode)
+{
+   if (BTRFS_I(inode)->flags & BTRFS_INODE_NODEDUP)
+   return "disable";
 
+   return NULL;
+}
-- 
2.7.1





[PATCH v7 09/20] btrfs: dedup: Inband in-memory only de-duplication implement

2016-02-17 Thread Qu Wenruo
Core implementation of inband de-duplication.
It reuses the async_cow_start() facility to calculate the dedup hash,
and uses that hash to do inband de-duplication at extent level.

The workflow is as follows:
1) Run delalloc range for an inode
2) Calculate the hash for the delalloc range in units of dedup_bs
3) For the hash match (duplicated) case, just increase the source extent
   ref and insert the file extent.
   For the hash mismatch case, go through the normal cow_file_range()
   fallback, and add the hash into dedup_tree.
   Compression for the hash miss case is not supported yet.

The current implementation stores all dedup hashes in an in-memory
rb-tree, with LRU behavior to control the limit.
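
The workflow above can be sketched in user space roughly as follows. This
is only an illustration under stated assumptions: every toy_* name is
hypothetical, FNV-1a stands in for SHA256, a small array stands in for the
rb-tree, and there is no LRU eviction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ENTRIES 16

struct toy_entry {
	uint64_t hash;    /* stand-in for the SHA256 digest */
	uint64_t bytenr;  /* where the block already lives */
	unsigned refs;    /* how many file extents point at it */
};

struct toy_dedup {
	struct toy_entry entries[TOY_MAX_ENTRIES];
	size_t nr;
};

/* FNV-1a, used here only as a cheap stand-in for SHA256. */
static uint64_t toy_hash(const unsigned char *data, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= data[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

/*
 * "Write" one dedup block. On a hash hit, bump the reference count of
 * the existing extent and return its bytenr; on a miss, record the hash
 * for new_bytenr and return new_bytenr.
 */
static uint64_t toy_write_block(struct toy_dedup *d,
				const unsigned char *data, size_t len,
				uint64_t new_bytenr)
{
	uint64_t h = toy_hash(data, len);
	size_t i;

	for (i = 0; i < d->nr; i++) {
		if (d->entries[i].hash == h) {
			d->entries[i].refs++;
			return d->entries[i].bytenr;
		}
	}
	/* A real table would evict the least recently used hash here. */
	assert(d->nr < TOY_MAX_ENTRIES);
	d->entries[d->nr].hash = h;
	d->entries[d->nr].bytenr = new_bytenr;
	d->entries[d->nr].refs = 1;
	d->nr++;
	return new_bytenr;
}
```

Writing the same block twice returns the bytenr of the first copy and
leaves that entry with two references, which mirrors step 3) above.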

Signed-off-by: Wang Xiaoguang 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/extent-tree.c |  18 ++
 fs/btrfs/inode.c   | 170 ++---
 2 files changed, 164 insertions(+), 24 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e2287c7..2a17c88 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -37,6 +37,7 @@
 #include "math.h"
 #include "sysfs.h"
 #include "qgroup.h"
+#include "dedup.h"
 
 #undef SCRAMBLE_DELAYED_REFS
 
@@ -2399,6 +2400,8 @@ static int run_one_delayed_ref(struct btrfs_trans_handle 
*trans,
 
if (btrfs_delayed_ref_is_head(node)) {
struct btrfs_delayed_ref_head *head;
+   struct btrfs_fs_info *fs_info = root->fs_info;
+
/*
 * we've hit the end of the chain and we were supposed
 * to insert this extent into the tree.  But, it got
@@ -2413,6 +2416,15 @@ static int run_one_delayed_ref(struct btrfs_trans_handle 
*trans,
btrfs_pin_extent(root, node->bytenr,
 node->num_bytes, 1);
if (head->is_data) {
+   /*
+* If insert_reserved is given, it means
+* a new extent is reserved, then deleted
+* in one tran, and inc/dec get merged to 0.
+*
+* In this case, we need to remove its dedup
+* hash.
+*/
+   btrfs_dedup_del(trans, fs_info, node->bytenr);
ret = btrfs_del_csums(trans, root,
  node->bytenr,
  node->num_bytes);
@@ -6707,6 +6719,12 @@ static int __btrfs_free_extent(struct btrfs_trans_handle 
*trans,
btrfs_release_path(path);
 
if (is_data) {
+   ret = btrfs_dedup_del(trans, info, bytenr);
+   if (ret < 0) {
+   btrfs_abort_transaction(trans, extent_root,
+   ret);
+   goto out;
+   }
ret = btrfs_del_csums(trans, root, bytenr, num_bytes);
if (ret) {
btrfs_abort_transaction(trans, extent_root, 
ret);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a4d3c54a..92b3bdd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -60,6 +60,7 @@
 #include "hash.h"
 #include "props.h"
 #include "qgroup.h"
+#include "dedup.h"
 
 struct btrfs_iget_args {
struct btrfs_key *location;
@@ -106,7 +107,8 @@ static int btrfs_finish_ordered_io(struct 
btrfs_ordered_extent *ordered_extent);
 static noinline int cow_file_range(struct inode *inode,
   struct page *locked_page,
   u64 start, u64 end, int *page_started,
-  unsigned long *nr_written, int unlock);
+  unsigned long *nr_written, int unlock,
+  struct btrfs_dedup_hash *hash);
 static struct extent_map *create_pinned_em(struct inode *inode, u64 start,
   u64 len, u64 orig_start,
   u64 block_start, u64 block_len,
@@ -335,6 +337,7 @@ struct async_extent {
struct page **pages;
unsigned long nr_pages;
int compress_type;
+   struct btrfs_dedup_hash *hash;
struct list_head list;
 };
 
@@ -353,7 +356,8 @@ static noinline int add_async_extent(struct async_cow *cow,
 u64 compressed_size,
 struct page **pages,
 unsigned long nr_pages,
-int compress_type)
+int compress_type,
+struct btrfs_dedup_hash *hash)
 {
struct 

[PATCH v7 05/20] btrfs: delayed-ref: Add support for increasing data ref under spinlock

2016-02-17 Thread Qu Wenruo
For in-band dedup, btrfs needs to increase the data ref with delayed_refs
locked, so add a new function btrfs_add_delayed_data_ref_locked() to
increase the extent ref with delayed_refs->lock already held.
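
The split into a plain function and a "_locked" variant can be sketched in
user space as below. This is an illustration only: the toy_* names are
hypothetical, and a C11 atomic flag stands in for the kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_refs {
	atomic_bool locked;   /* stand-in for delayed_refs->lock */
	unsigned long count;  /* stand-in for the queued references */
};

static void toy_lock(struct toy_refs *r)
{
	/* Spin until we are the one who flipped the flag to true. */
	while (atomic_exchange(&r->locked, true))
		;
}

static void toy_unlock(struct toy_refs *r)
{
	atomic_store(&r->locked, false);
}

/* Caller must already hold the lock. */
static void toy_add_ref_locked(struct toy_refs *r)
{
	assert(atomic_load(&r->locked));
	r->count++;
}

/* Convenience wrapper for callers that do not hold the lock yet. */
static void toy_add_ref(struct toy_refs *r)
{
	toy_lock(r);
	toy_add_ref_locked(r);
	toy_unlock(r);
}
```

A caller that has to do several updates atomically takes the lock once
and calls the _locked variant repeatedly, which is the situation the
dedup code is in.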

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/dedup.c   |  1 +
 fs/btrfs/delayed-ref.c | 30 +++---
 fs/btrfs/delayed-ref.h |  8 
 3 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
index 259b32d..ed18fc9 100644
--- a/fs/btrfs/dedup.c
+++ b/fs/btrfs/dedup.c
@@ -20,6 +20,7 @@
 #include "btrfs_inode.h"
 #include "transaction.h"
 #include "delayed-ref.h"
+#include "qgroup.h"
 
 struct inmem_hash {
struct rb_node hash_node;
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 914ac13..1091810 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -805,6 +805,26 @@ free_ref:
 }
 
 /*
+ * Do real delayed data ref insert.
+ * Caller must hold delayed_refs->lock and allocate memory
+ * for dref, head_ref and record.
+ */
+void btrfs_add_delayed_data_ref_locked(struct btrfs_fs_info *fs_info,
+   struct btrfs_trans_handle *trans,
+   struct btrfs_delayed_data_ref *dref,
+   struct btrfs_delayed_ref_head *head_ref,
+   struct btrfs_qgroup_extent_record *qrecord,
+   u64 bytenr, u64 num_bytes, u64 parent, u64 ref_root,
+   u64 owner, u64 offset, u64 reserved, int action)
+{
+   head_ref = add_delayed_ref_head(fs_info, trans, &head_ref->node,
+   qrecord, bytenr, num_bytes, ref_root, reserved,
+   action, 1);
+   add_delayed_data_ref(fs_info, trans, head_ref, &dref->node, bytenr,
+   num_bytes, parent, ref_root, owner, offset, action);
+}
+
+/*
  * add a delayed data ref. it's similar to btrfs_add_delayed_tree_ref.
  */
 int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
@@ -849,13 +869,9 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info 
*fs_info,
 * insert both the head node and the new ref without dropping
 * the spin lock
 */
-   head_ref = add_delayed_ref_head(fs_info, trans, &head_ref->node, record,
-   bytenr, num_bytes, ref_root, reserved,
-   action, 1);
-
-   add_delayed_data_ref(fs_info, trans, head_ref, &ref->node, bytenr,
-  num_bytes, parent, ref_root, owner, offset,
-  action);
+   btrfs_add_delayed_data_ref_locked(fs_info, trans, ref, head_ref, record,
+   bytenr, num_bytes, parent, ref_root, owner, offset,
+   reserved, action);
spin_unlock(&delayed_refs->lock);
 
return 0;
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index c24b653..2765858 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -239,11 +239,19 @@ static inline void btrfs_put_delayed_ref(struct 
btrfs_delayed_ref_node *ref)
}
 }
 
+struct btrfs_qgroup_extent_record;
 int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
   struct btrfs_trans_handle *trans,
   u64 bytenr, u64 num_bytes, u64 parent,
   u64 ref_root, int level, int action,
   struct btrfs_delayed_extent_op *extent_op);
+void btrfs_add_delayed_data_ref_locked(struct btrfs_fs_info *fs_info,
+   struct btrfs_trans_handle *trans,
+   struct btrfs_delayed_data_ref *dref,
+   struct btrfs_delayed_ref_head *head_ref,
+   struct btrfs_qgroup_extent_record *qrecord,
+   u64 bytenr, u64 num_bytes, u64 parent, u64 ref_root,
+   u64 owner, u64 offset, u64 reserved, int action);
 int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
   struct btrfs_trans_handle *trans,
   u64 bytenr, u64 num_bytes,
-- 
2.7.1





[PATCH v7 16/20] btrfs: dedup: add an inode nodedup flag

2016-02-17 Thread Qu Wenruo
From: Wang Xiaoguang 

Introduce BTRFS_INODE_NODEDUP flag, then we can explicitly disable
online data deduplication for specified files.

Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/ctree.h | 1 +
 fs/btrfs/ioctl.c | 6 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 174af5c..27ea1dd 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2438,6 +2438,7 @@ do {  
 \
 #define BTRFS_INODE_NOATIME (1 << 9)
 #define BTRFS_INODE_DIRSYNC (1 << 10)
 #define BTRFS_INODE_COMPRESS   (1 << 11)
+#define BTRFS_INODE_NODEDUP (1 << 12)
 
 #define BTRFS_INODE_ROOT_ITEM_INIT (1 << 31)
 
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index eb37a3d..5250b8f 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -159,7 +159,8 @@ void btrfs_update_iflags(struct inode *inode)
 /*
  * Inherit flags from the parent inode.
  *
- * Currently only the compression flags and the cow flags are inherited.
+ * Currently only the compression flags, the dedup flags and the cow
+ * flags are inherited.
  */
 void btrfs_inherit_iflags(struct inode *inode, struct inode *dir)
 {
@@ -184,6 +185,9 @@ void btrfs_inherit_iflags(struct inode *inode, struct inode 
*dir)
BTRFS_I(inode)->flags |= BTRFS_INODE_NODATASUM;
}
 
+   if (flags & BTRFS_INODE_NODEDUP)
+   BTRFS_I(inode)->flags |= BTRFS_INODE_NODEDUP;
+
btrfs_update_iflags(inode);
 }
 
-- 
2.7.1





[GIT PULL][PATCH v7 00/19][For 4.6] Btrfs: Add inband (write time) de-duplication framework

2016-02-17 Thread Qu Wenruo
Hi Chris,

This is the pull request for btrfs in-band de-duplication patchset.

The patchset can also be fetched from github:
https://github.com/adam900710/linux.git wang_dedup_20160218

This patchset went through several tests and seems quite good based on
integration-4.5.
We will continue testing after the rebase to integration-4.6 (commit
dc1e02a4ae); so far it hasn't shown any problem, and I think it would be
OK for the 4.6 merge window.

Most of the patchset didn't change after v5.
Only 2 bug fixes and a minor return value change were made since then.

There is still a little further work to do, but it should already be OK
for most users to try the new feature.

This updated version of inband de-duplication has the following features:
1) ONE unified dedup framework.
   Most of its code is hidden quietly in dedup.c, which exports only
   minimal interfaces to its callers.
   Reviewers and future developers should benefit from the unified
   framework.

2) TWO different back-ends with different trade-offs
   One is the improved version of the previous Fujitsu in-memory only
   dedup.
   The other is an enhanced dedup implementation from Liu Bo.
   Its tree structure was changed to handle bytenr -> hash search for
   deleting hashes, without the hideous data backref hack.

3) Ioctl interface with persistent dedup status
   As advised by David, we now use an ioctl to enable/disable dedup.

   And we now have a dedup status, recorded in the first item of the
   dedup tree.
   Just like quota, once enabled, no extra ioctl is needed for the next
   mount.

4) Ability to disable dedup for given dirs/files
   It works just like the compression prop method, by adding a new
   xattr.

TODO:
1) Support compression for the hash miss case
   This may need a change to the on-disk format for the on-disk backend.

2) Add extent-by-extent comparison for faster but more collision-prone
   algorithms
   The current SHA256 hash is quite slow, and on some old (5 years or
   more) CPUs, the CPU rather than IO may even be the bottleneck.
   A faster hash will definitely cause collisions, so we need extent
   comparison before we introduce a new dedup algorithm.

3) Misc end-user related helpers
   Such as a handy and easy to implement dedup rate report.
   And a method to query the in-memory hash size, for those "non-existent"
   users who want to use the 'dedup enable -l' option but never knew how
   much RAM they have.
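
For TODO item 2), the idea of confirming a hash hit with a byte-by-byte
comparison can be sketched as follows. This is a user-space illustration
under stated assumptions: weak_hash() is a deliberately poor 8-bit
checksum chosen so that collisions are easy to show, and both function
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Deliberately weak 8-bit checksum: collisions are easy to produce. */
static uint8_t weak_hash(const unsigned char *data, size_t len)
{
	uint8_t h = 0;
	size_t i;

	for (i = 0; i < len; i++)
		h = (uint8_t)(h + data[i]);
	return h;
}

/*
 * Return 1 only if the candidate really duplicates the existing block:
 * the cheap hash filters out most mismatches, memcmp() settles the rest.
 */
static int is_duplicate(const unsigned char *existing,
			const unsigned char *candidate, size_t len)
{
	assert(existing != NULL && candidate != NULL);
	if (weak_hash(existing, len) != weak_hash(candidate, len))
		return 0;
	return memcmp(existing, candidate, len) == 0;
}
```

The blocks "ab" and "ba" have the same checksum here, yet is_duplicate()
rejects them because the contents differ.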

Changelog:
v2:
  Totally reworked to handle multiple backends
v3:
  Fix a stupid but deadly on-disk backend bug
  Add handle for multiple hash on same bytenr corner case to fix abort
  trans error
  Increase dedup rate by enhancing delayed ref handler for both backend.
  Move dedup_add() to run_delayed_ref() time, to fix abort trans error.
  Increase dedup block size up limit to 8M.
v4:
  Add dedup prop for disabling dedup for given files/dirs.
  Merge inmem_search() and ondisk_search() into generic_search() to save
  some code
  Fix another delayed_ref related bug.
  Use the same mutex for both inmem and ondisk backend.
  Move dedup_add() back to btrfs_finish_ordered_io() to increase dedup
  rate.
v5:
  Reuse compress routine for much simpler dedup function.
  Slightly improved performance due to above modification.
  Fix race between dedup enable/disable
  Fix for false ENOSPC report
v6:
  Further enable/disable race window fix.
  Minor format change according to checkpatch.
v7:
  Fix one concurrency bug with balance.
  Slightly modify return value from -EINVAL to -EOPNOTSUPP for
  btrfs_dedup_ioctl() to allow progs to distinguish unsupported commands
  and error parameter.
  Rebased to integration-4.6.

Qu Wenruo (7):
  btrfs: delayed-ref: Add support for increasing data ref under spinlock
  btrfs: dedup: Inband in-memory only de-duplication implement
  btrfs: dedup: Add basic tree structure for on-disk dedup method
  btrfs: dedup: Introduce interfaces to resume and cleanup dedup info
  btrfs: dedup: Add support for on-disk hash search
  btrfs: dedup: Add support to delete hash for on-disk backend
  btrfs: dedup: Add support for adding hash for on-disk backend

Wang Xiaoguang (13):
  btrfs: dedup: Introduce dedup framework and its header
  btrfs: dedup: Introduce function to initialize dedup info
  btrfs: dedup: Introduce function to add hash into in-memory tree
  btrfs: dedup: Introduce function to remove hash from in-memory tree
  btrfs: dedup: Introduce function to search for an existing hash
  btrfs: dedup: Implement btrfs_dedup_calc_hash interface
  btrfs: ordered-extent: Add support for dedup
  btrfs: dedup: Add ioctl for inband deduplication
  btrfs: dedup: add an inode nodedup flag
  btrfs: dedup: add a property handler for online dedup
  btrfs: dedup: add per-file online dedup control
  btrfs: try more times to alloc metadata reserve space
  btrfs: dedup: Fix a bug when running inband dedup with balance

 fs/btrfs/Makefile|2 +-
 fs/btrfs/ctree.h |   73 ++-
 fs/btrfs/dedup.c | 1146 ++
 fs/btrfs/dedup.h  

[PATCH v7 12/20] btrfs: dedup: Add support for on-disk hash search

2016-02-17 Thread Qu Wenruo
The on-disk backend should now be able to search for a hash.
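
Judging from the code in this patch, the on-disk lookup keys each item by
the last 8 bytes of the hash and then compares the full hash of every
candidate. A minimal user-space sketch of that two-stage match, with
hypothetical toy_* names and a plain byte array standing in for the tree
item:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_HASH_LEN 32 /* SHA256 digest size, as in btrfs_dedup_sizes[] */

/* Use the last 8 bytes of the digest as a 64-bit index key. */
static uint64_t toy_hash_key(const uint8_t *hash, size_t hash_len)
{
	uint64_t key;

	assert(hash_len >= sizeof(key));
	memcpy(&key, hash + hash_len - sizeof(key), sizeof(key));
	return key;
}

/* An equal key only narrows the search; the full digest decides. */
static int toy_hash_match(const uint8_t *a, const uint8_t *b,
			  size_t hash_len)
{
	if (toy_hash_key(a, hash_len) != toy_hash_key(b, hash_len))
		return 0;
	return memcmp(a, b, hash_len) == 0;
}
```

Two digests that differ only in their first bytes share a key but still
fail the full comparison, which is why the loop in the patch keeps walking
items with the same key.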

Signed-off-by: Wang Xiaoguang 
Signed-off-by: Qu Wenruo 
---
 fs/btrfs/dedup.c | 133 ---
 fs/btrfs/dedup.h |   1 +
 2 files changed, 118 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/dedup.c b/fs/btrfs/dedup.c
index ad7b7e1..61a5d7a 100644
--- a/fs/btrfs/dedup.c
+++ b/fs/btrfs/dedup.c
@@ -594,6 +594,79 @@ int btrfs_dedup_disable(struct btrfs_fs_info *fs_info)
return ret;
 }
 
+ /*
+ * Return 0 for not found
+ * Return >0 for found and set bytenr_ret
+ * Return <0 for error
+ */
+static int ondisk_search_hash(struct btrfs_dedup_info *dedup_info, u8 *hash,
+ u64 *bytenr_ret, u32 *num_bytes_ret)
+{
+   struct btrfs_path *path;
+   struct btrfs_key key;
+   struct btrfs_root *dedup_root = dedup_info->dedup_root;
+   u8 *buf = NULL;
+   u64 hash_key;
+   int hash_len = btrfs_dedup_sizes[dedup_info->hash_type];
+   int ret;
+
+   path = btrfs_alloc_path();
+   if (!path)
+   return -ENOMEM;
+
+   buf = kmalloc(hash_len, GFP_NOFS);
+   if (!buf) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   memcpy(&hash_key, hash + hash_len - 8, 8);
+   key.objectid = hash_key;
+   key.type = BTRFS_DEDUP_HASH_ITEM_KEY;
+   key.offset = (u64)-1;
+
+   ret = btrfs_search_slot(NULL, dedup_root, &key, path, 0, 0);
+   if (ret < 0)
+   goto out;
+   WARN_ON(ret == 0);
+   while (1) {
+   struct extent_buffer *node;
+   struct btrfs_dedup_hash_item *hash_item;
+   int slot;
+
+   ret = btrfs_previous_item(dedup_root, path, hash_key,
+ BTRFS_DEDUP_HASH_ITEM_KEY);
+   if (ret < 0)
+   goto out;
+   if (ret > 0) {
+   ret = 0;
+   goto out;
+   }
+
+   node = path->nodes[0];
+   slot = path->slots[0];
+   btrfs_item_key_to_cpu(node, &key, slot);
+
+   if (key.type != BTRFS_DEDUP_HASH_ITEM_KEY ||
+   memcmp(&key.objectid, hash + hash_len - 8, 8))
+   break;
+   hash_item = btrfs_item_ptr(node, slot,
+   struct btrfs_dedup_hash_item);
+   read_extent_buffer(node, buf, (unsigned long)(hash_item + 1),
+  hash_len);
+   if (!memcmp(buf, hash, hash_len)) {
+   ret = 1;
+   *bytenr_ret = key.offset;
+   *num_bytes_ret = btrfs_dedup_hash_len(node, hash_item);
+   break;
+   }
+   }
+out:
+   kfree(buf);
+   btrfs_free_path(path);
+   return ret;
+}
+
 /*
  * Caller must ensure the corresponding ref head is not being run.
  */
@@ -624,9 +697,36 @@ inmem_search_hash(struct btrfs_dedup_info *dedup_info, u8 
*hash)
return NULL;
 }
 
-static int inmem_search(struct btrfs_dedup_info *dedup_info,
-   struct inode *inode, u64 file_pos,
-   struct btrfs_dedup_hash *hash)
+/* Wrapper for different backends, caller needs to hold dedup_info->lock */
+static inline int generic_search_hash(struct btrfs_dedup_info *dedup_info,
+ u8 *hash, u64 *bytenr_ret,
+ u32 *num_bytes_ret)
+{
+   if (dedup_info->backend == BTRFS_DEDUP_BACKEND_INMEMORY) {
+   struct inmem_hash *found_hash;
+   int ret;
+
+   found_hash = inmem_search_hash(dedup_info, hash);
+   if (found_hash) {
+   ret = 1;
+   *bytenr_ret = found_hash->bytenr;
+   *num_bytes_ret = found_hash->num_bytes;
+   } else {
+   ret = 0;
+   *bytenr_ret = 0;
+   *num_bytes_ret = 0;
+   }
+   return ret;
+   } else if (dedup_info->backend == BTRFS_DEDUP_BACKEND_ONDISK) {
+   return ondisk_search_hash(dedup_info, hash, bytenr_ret,
+ num_bytes_ret);
+   }
+   return -EINVAL;
+}
+
+static int generic_search(struct btrfs_dedup_info *dedup_info,
+ struct inode *inode, u64 file_pos,
+ struct btrfs_dedup_hash *hash)
 {
int ret;
struct btrfs_root *root = BTRFS_I(inode)->root;
@@ -636,9 +736,9 @@ static int inmem_search(struct btrfs_dedup_info *dedup_info,
struct btrfs_delayed_ref_head *insert_head;
struct btrfs_delayed_data_ref *insert_dref;
struct btrfs_qgroup_extent_record *insert_qrecord = NULL;
-   struct inmem_hash *found_hash;
int free_insert = 1;
u64 bytenr;
+   

[PATCH v7 01/20] btrfs: dedup: Introduce dedup framework and its header

2016-02-17 Thread Qu Wenruo
From: Wang Xiaoguang 

Introduce the header for the btrfs online (write time) de-duplication
framework and the header changes it needs.

The new de-duplication framework is going to support 2 different dedup
methods and 1 dedup hash.
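
The header in this patch ends struct btrfs_dedup_hash with a
variable-length hash array and keeps the digest size per hash type in a
small table. A user-space sketch of allocating such a record, assuming a
single hash type with a 32-byte digest; the toy_* names are hypothetical
and calloc() stands in for the kernel allocator:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_HASH_SHA256 0

/* Digest size per hash type; only one type exists in this sketch. */
static const size_t toy_hash_sizes[] = { 32 };

struct toy_dedup_hash {
	uint64_t bytenr;
	uint32_t num_bytes;
	uint8_t hash[]; /* flexible array member, sized at allocation */
};

/* Allocate a zeroed hash record for the given type; NULL on failure. */
static struct toy_dedup_hash *toy_alloc_hash(unsigned type)
{
	assert(type < sizeof(toy_hash_sizes) / sizeof(toy_hash_sizes[0]));
	return calloc(1, sizeof(struct toy_dedup_hash) + toy_hash_sizes[type]);
}
```

Because the digest lives in the same allocation as the header fields, one
free() releases the whole record.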

Signed-off-by: Qu Wenruo 
Signed-off-by: Wang Xiaoguang 
---
 fs/btrfs/ctree.h   |   5 +++
 fs/btrfs/dedup.h   | 127 +
 fs/btrfs/disk-io.c |   1 +
 3 files changed, 133 insertions(+)
 create mode 100644 fs/btrfs/dedup.h

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index bc6a87e..094db5c 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1866,6 +1866,11 @@ struct btrfs_fs_info {
struct list_head pinned_chunks;
 
int creating_free_space_tree;
+
+   /* Inband de-duplication related structures*/
+   unsigned int dedup_enabled:1;
+   struct btrfs_dedup_info *dedup_info;
+   struct mutex dedup_ioctl_lock;
 };
 
 struct btrfs_subvolume_writers {
diff --git a/fs/btrfs/dedup.h b/fs/btrfs/dedup.h
new file mode 100644
index 000..8e1ff03
--- /dev/null
+++ b/fs/btrfs/dedup.h
@@ -0,0 +1,127 @@
+/*
+ * Copyright (C) 2015 Fujitsu.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#ifndef __BTRFS_DEDUP__
+#define __BTRFS_DEDUP__
+
+#include 
+#include 
+#include 
+
+/*
+ * Dedup storage backend
+ * On disk is persistent storage but overhead is large
+ * In memory is fast but will lose all its hashes on umount
+ */
+#define BTRFS_DEDUP_BACKEND_INMEMORY   0
+#define BTRFS_DEDUP_BACKEND_ONDISK 1
+#define BTRFS_DEDUP_BACKEND_LAST   2
+
+/* Dedup block size limit and default value */
+#define BTRFS_DEDUP_BLOCKSIZE_MAX  (8 * 1024 * 1024)
+#define BTRFS_DEDUP_BLOCKSIZE_MIN  (16 * 1024)
+#define BTRFS_DEDUP_BLOCKSIZE_DEFAULT  (128 * 1024)
+
+/* Hash algorithm, only support SHA256 yet */
+#define BTRFS_DEDUP_HASH_SHA256 0
+
+static int btrfs_dedup_sizes[] = { 32 };
+
+/*
+ * For caller outside of dedup.c
+ *
+ * Different dedup backends should have their own hash structure
+ */
+struct btrfs_dedup_hash {
+   u64 bytenr;
+   u32 num_bytes;
+
+   /* last field is a variable length array of dedup hash */
+   u8 hash[];
+};
+
+struct btrfs_dedup_info {
+   /* dedup blocksize */
+   u64 blocksize;
+   u16 backend;
+   u16 hash_type;
+
+   struct crypto_shash *dedup_driver;
+   struct mutex lock;
+
+   /* following members are only used in in-memory dedup mode */
+   struct rb_root hash_root;
+   struct rb_root bytenr_root;
+   struct list_head lru_list;
+   u64 limit_nr;
+   u64 current_nr;
+};
+
+struct btrfs_trans_handle;
+
+int btrfs_dedup_hash_size(u16 type);
+struct btrfs_dedup_hash *btrfs_dedup_alloc_hash(u16 type);
+
+/*
+ * Initial inband dedup info
+ * Called at dedup enable time.
+ */
+int btrfs_dedup_enable(struct btrfs_fs_info *fs_info, u16 type, u16 backend,
+  u64 blocksize, u64 limit_nr);
+
+/*
+ * Disable dedup and invalidate all its dedup data.
+ * Called at dedup disable time.
+ */
+int btrfs_dedup_disable(struct btrfs_fs_info *fs_info);
+
+/*
+ * Calculate hash for dedup.
+ * Caller must ensure [start, start + dedup_bs) has valid data.
+ */
+int btrfs_dedup_calc_hash(struct btrfs_fs_info *fs_info,
+ struct inode *inode, u64 start,
+ struct btrfs_dedup_hash *hash);
+
+/*
+ * Search for duplicated extents by calculated hash
+ * Caller must call btrfs_dedup_calc_hash() first to get the hash.
+ *
+ * @inode: the inode we are writing to
+ * @file_pos: offset inside the inode
+ * As we will increase extent ref immediately after a hash match,
+ * we need @file_pos and @inode in this case.
+ *
+ * Return > 0 for a hash match, and the extent ref will be
+ * *INCREASED*, and hash->bytenr/num_bytes will record the existing
+ * extent data.
+ * Return 0 for a hash miss. Nothing is done
+ */
+int btrfs_dedup_search(struct btrfs_fs_info *fs_info,
+  struct inode *inode, u64 file_pos,
+  struct btrfs_dedup_hash *hash);
+
+/* Add a dedup hash into dedup info */
+int btrfs_dedup_add(struct btrfs_trans_handle *trans,
+   struct btrfs_fs_info *fs_info,
+   struct 

[patch] btrfs: array overflow in btrfs_ioctl_rm_dev_v2()

2016-02-17 Thread Dan Carpenter
We were putting the NUL terminator at BTRFS_PATH_NAME_MAX (4087) bytes
instead of BTRFS_SUBVOL_NAME_MAX (4039) so it corrupted memory.
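
A user-space sketch of the bug class, assuming the name buffer is declared
with one extra byte for the terminator. The struct and helper names are
hypothetical; only the constant 4039 comes from the description above.
Indexing with the array's own size keeps the terminator inside the buffer
even if the wrong *_MAX constant would otherwise be picked:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NAME_MAX 4039 /* value quoted for BTRFS_SUBVOL_NAME_MAX */

struct toy_vol_args {
	char name[TOY_NAME_MAX + 1];
};

/*
 * Force NUL termination using the array's own size, so the index can
 * never drift away from the declaration.
 */
static void toy_terminate_name(struct toy_vol_args *args)
{
	assert(args != NULL);
	args->name[sizeof(args->name) - 1] = '\0';
}
```

Writing at index 4087 into this 4040-byte array would land 47 bytes past
its end, which is the memory corruption this patch fixes.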

Fixes: 22af1a869288 ('btrfs: introduce device delete by devid')
Signed-off-by: Dan Carpenter 

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 5224fc8..77c61b4 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2700,7 +2700,7 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void 
__user *arg)
if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID) {
ret = btrfs_rm_device(root, NULL, vol_args->devid);
} else {
-   vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
+   vol_args->name[BTRFS_SUBVOL_NAME_MAX] = '\0';
ret = btrfs_rm_device(root, vol_args->name, 0);
}
mutex_unlock(&root->fs_info->volume_mutex);


Re: RAID 6 full, but there is still space left on some devices

2016-02-17 Thread Qu Wenruo



Dan Blazejewski wrote on 2016/02/17 18:04 -0500:

Hello,

I upgraded my kernel to 4.4.2, and btrfs-progs to 4.4. I also added
another 4TB disk and kicked off a full balance (currently 7x4TB
RAID6). I'm interested to see what an additional drive will do to
this. I'll also have to wait and see if a full system balance on a
newer version of BTRFS tools does the trick or not.

I also noticed that "btrfs device usage" shows multiple entries for
Data, RAID 6 on some drives. Is this normal? Please note that /dev/sdh
is the new disk, and I only just started the balance.

# btrfs dev usage /mnt/data
/dev/sda, ID: 5
   Device size:       3.64TiB
   Data,RAID6:        1.43TiB
   Data,RAID6:        1.48TiB
   Data,RAID6:      320.00KiB
   Metadata,RAID6:    2.55GiB
   Metadata,RAID6:    1.50GiB
   System,RAID6:     16.00MiB
   Unallocated:     733.67GiB

/dev/sdb, ID: 6
   Device size:       3.64TiB
   Data,RAID6:        1.48TiB
   Data,RAID6:      320.00KiB
   Metadata,RAID6:    1.50GiB
   System,RAID6:     16.00MiB
   Unallocated:       2.15TiB

/dev/sdc, ID: 7
   Device size:       3.64TiB
   Data,RAID6:        1.43TiB
   Data,RAID6:      732.69GiB
   Data,RAID6:        1.48TiB
   Data,RAID6:      320.00KiB
   Metadata,RAID6:    2.55GiB
   Metadata,RAID6:  982.00MiB
   Metadata,RAID6:    1.50GiB
   System,RAID6:     16.00MiB
   Unallocated:      25.21MiB

/dev/sdd, ID: 1
   Device size:       3.64TiB
   Data,RAID6:        1.43TiB
   Data,RAID6:      732.69GiB
   Data,RAID6:        1.48TiB
   Data,RAID6:      320.00KiB
   Metadata,RAID6:    2.55GiB
   Metadata,RAID6:  982.00MiB
   Metadata,RAID6:    1.50GiB
   System,RAID6:     16.00MiB
   Unallocated:      25.21MiB

/dev/sdf, ID: 3
   Device size:       3.64TiB
   Data,RAID6:        1.43TiB
   Data,RAID6:      732.69GiB
   Data,RAID6:        1.48TiB
   Data,RAID6:      320.00KiB
   Metadata,RAID6:    2.55GiB
   Metadata,RAID6:  982.00MiB
   Metadata,RAID6:    1.50GiB
   System,RAID6:     16.00MiB
   Unallocated:      25.21MiB

/dev/sdg, ID: 2
   Device size:       3.64TiB
   Data,RAID6:        1.43TiB
   Data,RAID6:      732.69GiB
   Data,RAID6:        1.48TiB
   Data,RAID6:      320.00KiB
   Metadata,RAID6:    2.55GiB
   Metadata,RAID6:  982.00MiB
   Metadata,RAID6:    1.50GiB
   System,RAID6:     16.00MiB
   Unallocated:      25.21MiB

/dev/sdh, ID: 8
   Device size:       3.64TiB
   Data,RAID6:      320.00KiB
   Unallocated:       3.64TiB



I'm not sure why the same chunk type shows up multiple times.
Maybe each of the RAID6 entries shown has a different number of stripes?




Qu, in regards to your question, I ran RAID 1 on multiple disks of
different sizes. I believe I had a mix of 2x4TB, 1x2TB, and 1x3TB
drive. I replaced the 2TB drive first with a 4TB, and balanced it.
Later on, I replaced the 3TB drive with another 4TB, and balanced,
yielding an array of 4x4TB RAID1. A little while later, I wound up
sticking a fifth 4TB drive in, and converting to RAID6. The sixth 4TB
drive was added some time after that. The seventh was added just a few
minutes ago.


Personally speaking, I just came up with one method to balance all these
disks, and in fact you don't need to add a disk.


1) Balance all data chunks to the single profile
2) Balance all metadata chunks to the single or RAID1 profile
3) Balance all data chunks back to the RAID6 profile
4) Balance all metadata chunks back to the RAID6 profile
The system chunk is so small that normally you don't need to bother with it.
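
The four steps above map onto the convert filters of "btrfs balance start".
Below is a minimal sketch that only builds the command lines and does not
run them. The mount point /mnt/data is taken from the "btrfs dev usage"
output earlier in the thread, the function name is hypothetical, and RAID1
is picked for the intermediate metadata profile as one of the two options
mentioned in step 2.

```python
def balance_plan(mountpoint, final_profile="raid6"):
    """Return the four balance commands, in order, as argv lists.

    Nothing is executed here. Each command should be allowed to finish
    before the next one is started.
    """
    filters = [
        "-dconvert=single",            # 1) data     -> single
        "-mconvert=raid1",             # 2) metadata -> RAID1
        f"-dconvert={final_profile}",  # 3) data     -> RAID6
        f"-mconvert={final_profile}",  # 4) metadata -> RAID6
    ]
    return [["btrfs", "balance", "start", flt, mountpoint] for flt in filters]


if __name__ == "__main__":
    for argv in balance_plan("/mnt/data"):
        print(" ".join(argv))
```

Note that data stored as single has no redundancy while steps 1 to 3 are
in progress, so a current backup is advisable before trying this.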

The trick is that single is the most flexible chunk type: it needs only one
disk with unallocated space.
And the btrfs chunk allocator will allocate a chunk on the device with the
most unallocated space.


So after 1) and 2) you should find that chunk allocation is almost
perfectly balanced across all devices, as long as they are the same size.


Now you have a balanced base layout for RAID6 allocation. That should make
things go quite smoothly and result in a balanced RAID6 chunk layout.
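
The "most unallocated space first" rule can be illustrated with a toy
model. This is only a sketch of the rule as described above, not the
actual btrfs allocator: the function name, the device names and the
1-unit chunk size are assumptions made for the illustration.

```python
def allocate_single_chunks(unallocated, n_chunks, chunk_size=1):
    """Toy model: place each single chunk on the device with the most
    unallocated space.

    `unallocated` maps device name -> free space (any unit). Returns a
    new dict with the free space left after `n_chunks` allocations.
    """
    free = dict(unallocated)
    for _ in range(n_chunks):
        dev = max(free, key=free.get)  # device with the most free space
        if free[dev] < chunk_size:
            raise RuntimeError("no device has room for another chunk")
        free[dev] -= chunk_size
    return free


if __name__ == "__main__":
    # Seven equally sized, empty devices (sizes in GiB, hypothetical).
    empty = {f"dev{i}": 3726 for i in range(7)}
    print(allocate_single_chunks(empty, n_chunks=10000))
```

With equally sized devices the leftover free space never differs by more
than one chunk between devices, which is the balanced base layout the
RAID6 conversion then starts from.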


Thanks,
Qu




Thanks!

On Wed, Feb 17, 2016 at 12:58 AM, Qu Wenruo  wrote:



Dan Blazejewski wrote on 2016/02/16 15:20 -0500:


Hello,

I've searched high and low about my issue, but have been unable to
turn up anything like what I'm seeing right now.

A little background: I started using BTRFS over a year ago, in RAID 1
with mixed size drives. A few months ago, I started replacing the
disks with 4 TB drives, and eventually switched over to RAID 6. I am
currently running a 6x4TB RAID6 drive configuration, which should give
me ~14.5 TB
usable, but I'm only getting around 11.
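
For reference, the "~14.5 TB" expectation follows from simple arithmetic:
RAID6 spends two devices' worth of space on parity, and a 4 TB drive is
about 3.64 TiB as reported by the btrfs tools. A small sketch of that
calculation (function name hypothetical; metadata and system chunk
overhead are ignored):

```python
TB = 10**12   # decimal terabyte, as drive vendors count
TIB = 2**40   # binary tebibyte, as the btrfs tools report


def raid6_usable_tib(n_disks, disk_tb=4):
    """Ideal RAID6 capacity in TiB for n equally sized disks."""
    if n_disks <= 2:
        raise ValueError("need more than two devices to hold any data")
    return (n_disks - 2) * disk_tb * TB / TIB


if __name__ == "__main__":
    for n in (6, 7):
        print(f"{n} x 4TB RAID6: {raid6_usable_tib(n):.2f} TiB")
```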

The weird thing is that It seems to completely fill 4/6 of the disks,
while leaving lots of space 

Re: [PATCH] btrfs: Avoid BUG_ON()s because of ENOMEM caused by kmalloc() failure

2016-02-17 Thread Satoru Takeuchi

On 2016/02/18 0:11, David Sterba wrote:
> On Wed, Feb 17, 2016 at 02:54:23PM +0900, Satoru Takeuchi wrote:
>> On 2016/02/16 2:53, David Sterba wrote:
>>> On Mon, Feb 15, 2016 at 02:38:09PM +0900, Satoru Takeuchi wrote:
>>>> There are some BUG_ON()'s after kmalloc() as follows.
>>>>
>>>> =
>>>> foo = kmalloc();
>>>> BUG_ON(!foo);   /* -ENOMEM case */
>>>> =
>>>>
>>>> A Docker + memory cgroup user hit these BUG_ON()s.
>>>>
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=112101
>>>>
>>>> Since it's very hard to handle these ENOMEMs properly,
>>>> preventing these kmalloc() failures to avoid these
>>>> BUG_ON()s for now, are a bit better than the current
>>>> implementation anyway.
>>>
>>> Beware that the NOFAIL semantics can cause deadlocks if it's on the
>>> critical writeback path or can be reentered from itself through the
>>> reclaim. Unless you're sure that this is not the case, please do not add
>>> them just because it would seemingly fix the allocation failures.
>>
>> About the all cases I changed, kmalloc()s can block
>> since gfp_flags_allow_blocking() are true. Then no locks
>> are acquired here and deadlocks don't happen.
>>
>> Am I missing something?
>
> Waiting as in GFP_WAIT is not the same as GFP_NOFAIL that can wait
> indefinitely. The locking of NOFAIL is implied. The kmalloc callsite
> will block until there's memory available. If another thread of btrfs
> waits for this code to progress, it will block as well.
>
>>> In the docker example, the memory is limited by cgroups so the NOFAIL
>>> mode can exhaust all reserves and just loop endlessly waiting for the
>>> OOM killer to get some memory or just waiting without any chance to
>>> progress.
>>
>> I consider triggering OOM killer and killing processes
>> in a cgroup are better than killing whole system.
>
> The action of OOM killer is not a problem. The cgroup memory limit can
> be low or all the memory is unreclaimable. At this point btrfs code will
> block.
>
>> About the possibility of endless loop, there are many
>> such problems in the whole kernel. Of course it can be
>> said to Btrfs.
>
> Many? Examples? In this context we're talking about endless loops caused
> by non-failing allocations.
>
>> ==
>> $ grep -rnH __GFP_NOFAIL fs/btrfs/
>> fs/btrfs/extent-tree.c:5970: GFP_NOFS | __GFP_NOFAIL);
>> fs/btrfs/extent-tree.c:6043: bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
>> fs/btrfs/extent_io.c:4643: eb = kmem_cache_zalloc(extent_buffer_cache,
>> GFP_NOFS|__GFP_NOFAIL);
>> fs/btrfs/extent_io.c:4909: p = find_or_create_page(mapping, index,
>> GFP_NOFS|__GFP_NOFAIL);
>> ==
>
> I'm aware of the existing NOFAIL allocations. There are two at
> extent_buffer allocation time, added recently and provoked by the
> intended changes to memory allocator that would fail the GFP_NOFS
> allocations.
>
> The other two are from year 2010, set_extent_bit, IMHO added so the
> error handling does not get complicated and expressing "we don't want to
> fail here". But there are many other calls to set_extent_bit that could
> fail. This is inconsistent and should be unified. In a way that we're
> sure that we're not introducing potential hangs.
>
>> I understand fixing these problems cooperate with
>> memory cgroup guys is the best in the long run.
>
> It's not about cgroups, btrfs needs to ideally handle all memory
> allocation failures in a way that uses some sort of fallbacks but still
> can lead to a transaction abort if there's simply no memory left.
>
>> However, I consider bypassing this problem for now
>> is better than the current implementation.
>
> It's more like replacing one problem with another. With every new
> NOFAIL, one has to think about the runtime interactions with the others.
> I'd rather see this fixed with a particular call path in mind or class,
> eg. the extent_map bit settings, than throwing NOFAIL at places that
> somebody accidentally hit.
>
> As a temporary fix we can add __GFP_HIGH to the interesting sites so we
> can get access to the emergency reserves, and this is on my list of
> things to do after the NOFS -> KERNEL updates are done.

OK, got it. I'll reconsider how to fix these problems.

Thanks,
Satoru




RE: btrfs: reada: simplify dev->reada_in_flight processing

2016-02-17 Thread Zhao Lei
Hi, Dan Carpenter

> From: Dan Carpenter [mailto:dan.carpen...@oracle.com]
> Sent: Thursday, February 18, 2016 3:02 AM
> To: zhao...@cn.fujitsu.com
> Cc: linux-btrfs@vger.kernel.org
> Subject: re: btrfs: reada: simplify dev->reada_in_flight processing
> 
> Hello Zhao Lei,
> 
> The patch 7aff519c04d2: "btrfs: reada: simplify dev->reada_in_flight
> processing" from Jan 12, 2016, leads to the following static checker
> warning:
> 
>   fs/btrfs/reada.c:697 reada_start_machine_dev()
>   warn: inconsistent indenting
> 
> fs/btrfs/reada.c
>   688  spin_unlock(&fs_info->reada_lock);
>   689  return 0;
>   690  }
>   691  dev->reada_next = re->logical +
> fs_info->tree_root->nodesize;
>   692  re->refcnt++;
>   693
>   694  spin_unlock(&fs_info->reada_lock);
>   695
>   696  spin_lock(&re->lock);
>   697  if (re->scheduled || list_empty(&re->extctl)) {
> 
> This is indented too much.
> 
Thanks for report.

This problem is introduced in patch titled:
btrfs: reada: Move is_need_to_readahead contition earlier

I'll fix it.

CC: David Sterba 
I'll fix this indent problem in following branch:
https://github.com/zhaoleidd/btrfs.git integration-4.5

Could you pick them again?

Thanks
Zhaolei

>   698  spin_unlock(&re->lock);
>   699  reada_extent_put(fs_info, re);
>   700  return 0;
>   701  }
>   702  re->scheduled = 1;
>   703  spin_unlock(&re->lock);
>   704
> 
> regards,
> dan carpenter






Re: [PATCH 1/2] fstests: generic test for directory fsync after rename operation

2016-02-17 Thread Dave Chinner
On Mon, Feb 15, 2016 at 10:54:23AM +, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> Test that if we move one file between directories, fsync the parent
> directory of the old directory, power fail and remount the filesystem,
> the file is not lost and it's located at the destination directory.
> 
> This is motivated by a bug found in btrfs, which is fixed by the patch
> (for the linux kernel) titled:
> 
>   "Btrfs: fix file loss on log replay after renaming a file and fsync"
> 
> Tested against ext3, ext4, xfs, f2fs and reiserfs.
> 
> Signed-off-by: Filipe Manana 

> +# We expect our file foo to exist, have an entry in the new parent
> +# directory (c/) and not have anymore an entry in the old parent directory
> +# (a/b/).
> +[ -e $SCRATCH_MNT/a/b/foo ] && echo "File foo is still at directory a/b/"
> +[ -e $SCRATCH_MNT/c/foo ] || echo "File foo is not at directory c/"
> +
> +# The new file named bar should also exist.
> +[ -e $SCRATCH_MNT/a/bar ] || echo "File bar is missing"

This can all be replaced simply by:

ls -R $SCRATCH_MNT | _filter_scratch

Because the golden image match will tell us if files are missing or
in the wrong place.
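
To illustrate the golden-output idea outside of fstests, here is a small
stand-alone sketch. It is not the fstests implementation: the listing
format is a simplified stand-in for "ls -R", the replacement of the mount
point by the literal SCRATCH_MNT only approximates what _filter_scratch
does, and all function names are made up for the example.

```python
import os
import tempfile


def filtered_listing(mount_point):
    """List every entry under mount_point, one path per line, sorted,
    with the mount point replaced by the literal SCRATCH_MNT."""
    lines = []
    for root, dirs, files in os.walk(mount_point):
        for name in dirs + files:
            path = os.path.join(root, name)
            lines.append(path.replace(mount_point, "SCRATCH_MNT", 1))
    return "\n".join(sorted(lines))


# Expected state after log replay: foo moved to c/, bar present in a/.
GOLDEN = """SCRATCH_MNT/a
SCRATCH_MNT/a/b
SCRATCH_MNT/a/bar
SCRATCH_MNT/c
SCRATCH_MNT/c/foo"""


def build_expected_tree(mount_point):
    """Create the directory layout the test expects to find."""
    os.makedirs(os.path.join(mount_point, "a", "b"))
    os.makedirs(os.path.join(mount_point, "c"))
    for rel in ("a/bar", "c/foo"):
        open(os.path.join(mount_point, rel), "w").close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as mnt:
        build_expected_tree(mnt)
        print("match" if filtered_listing(mnt) == GOLDEN else "MISMATCH")
```

A single comparison against the golden text catches a missing file, a
file left behind in a/b/, and any unexpected extra entry, which is why
the three explicit checks become unnecessary.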

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk

2016-02-17 Thread Ángel González
Qu Wenruo wrote:
> If you're really interested in whether your fs has skinny metadata
> enabled, you can check btrfs-show-super output.


> For example, the following output indicates skinny metadata:
> --
> incompat_flags          0x161
>                         ( MIXED_BACKREF |
>                           BIG_METADATA |
>                           EXTENDED_IREF |
>                           SKINNY_METADATA ) << --
>
> Even if it has skinny metadata, it's still possible that some metadata
> is still in the old format if you used btrfstune to convert an old fs to
> skinny metadata.

It was a freshly created filesystem. However, btrfs-show-super shows it
does *not* have skinny metadata:
> incompat_flags          0x61
>                         ( MIXED_BACKREF |
>                           BIG_METADATA |
>                           EXTENDED_IREF )

Maybe gparted explicitly requested it to be created without skinny
metadata. I won't lose any sleep over that, though.
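
The two values differ only in bit 0x100. A short sketch of the decoding,
listing just the four flags that appear in this thread; the bit values are
the ones from the kernel's btrfs headers as far as I know, and they agree
with both outputs quoted above. Function names are hypothetical.

```python
# Only the flags shown in this thread; the real list is longer.
INCOMPAT_FLAGS = {
    0x001: "MIXED_BACKREF",
    0x020: "BIG_METADATA",
    0x040: "EXTENDED_IREF",
    0x100: "SKINNY_METADATA",
}
KNOWN_MASK = sum(INCOMPAT_FLAGS)  # OR of all bits listed above


def decode_incompat(value):
    """Return the names of the flags set in an incompat_flags value."""
    names = [name for bit, name in sorted(INCOMPAT_FLAGS.items()) if value & bit]
    unknown = value & ~KNOWN_MASK
    if unknown:
        names.append(f"unknown:{unknown:#x}")
    return names


def has_skinny_metadata(value):
    return bool(value & 0x100)


if __name__ == "__main__":
    for flags in (0x161, 0x61):
        print(hex(flags), decode_incompat(flags))
```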


> But anyway, it's always good to see the problem solved.

Indeed :-)

Thanks again


Re: [PATCH][btrfs-progs] Include file verification with convert-tests

2016-02-17 Thread Lakshmipathi.G
I think it's a good idea to populate the fs and verify the files after
conversion. I'm not sure about existing tools; we can simply add a local
script to this testsuite. Some thoughts on a possible data set:

--
a) regular files
   - empty files (touch)
   - smaller files dd if=/dev/zero 
   - smaller files dd if=/dev/urandom 
   
b) directory
   - empty dirs.
   - dir with files 
   - dir depth up to N
   - dir with no file but lot of unlinked entries.
   
c) fast/slow symlink
   - to dir
   - to file

d) hardlink :
   - between files
   - between files in different sub-dir

e) fifo files and sparsefile, broken symlink

f) file and dir with special inode attribute: 
- immutable flag
- sticky bit etc

g) file/directory acls:
   - large number of acls

--
fs images with debugfs tool:

a) a valid image with bad blocks.

b) corrupted images:
Ex: Create duplicate blocks and convert it & expect btrfs-check to catch issue?

Any thoughts on above list, suggestions/comments? thanks!


Cheers,
Lakshmipathi.G 


re: btrfs: reada: simplify dev->reada_in_flight processing

2016-02-17 Thread Dan Carpenter
Hello Zhao Lei,

The patch 7aff519c04d2: "btrfs: reada: simplify dev->reada_in_flight
processing" from Jan 12, 2016, leads to the following static checker
warning:

fs/btrfs/reada.c:697 reada_start_machine_dev()
warn: inconsistent indenting

fs/btrfs/reada.c
   688  spin_unlock(&fs_info->reada_lock);
   689  return 0;
   690  }
   691  dev->reada_next = re->logical + fs_info->tree_root->nodesize;
   692  re->refcnt++;
   693  
   694  spin_unlock(&fs_info->reada_lock);
   695  
   696  spin_lock(&re->lock);
   697  if (re->scheduled || list_empty(&re->extctl)) {

This is indented too much.

   698  spin_unlock(&re->lock);
   699  reada_extent_put(fs_info, re);
   700  return 0;
   701  }
   702  re->scheduled = 1;
   703  spin_unlock(&re->lock);
   704  

regards,
dan carpenter


[josef-btrfs:fdinfo 1/1] fs/notify/inotify/inotify_user.c:321:18: warning: initialization from incompatible pointer type

2016-02-17 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git fdinfo
head:   9ecc561fd5b586bf648bb23c929750294e81c6db
commit: 9ecc561fd5b586bf648bb23c929750294e81c6db [1/1] fdinfo: handle large fdinfo buffers
config: x86_64-randconfig-x012-201607 (attached as .config)
reproduce:
        git checkout 9ecc561fd5b586bf648bb23c929750294e81c6db
        # save the attached .config to linux build tree
        make ARCH=x86_64

All warnings (new ones prefixed by >>):

>> fs/notify/inotify/inotify_user.c:321:18: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
     .start_fdinfo = fsnotify_start_fdinfo,
                     ^
   fs/notify/inotify/inotify_user.c:321:18: note: (near initialization for 'inotify_fops.start_fdinfo')

vim +321 fs/notify/inotify/inotify_user.c

   305  list) {
   306  send_len += sizeof(struct inotify_event);
   307  send_len += round_event_name_len(fsn_event);
   308  }
   309  mutex_unlock(&group->notification_mutex);
   310  ret = put_user(send_len, (int __user *) p);
   311  break;
   312  }
   313  
   314  return ret;
   315  }
   316  
   317  static const struct file_operations inotify_fops = {
   318          .show_fdinfo    = inotify_show_fdinfo,
   319          .next_fdinfo    = fsnotify_next_fdinfo,
   320          .stop_fdinfo    = fsnotify_stop_fdinfo,
 > 321          .start_fdinfo   = fsnotify_start_fdinfo,
   322          .poll           = inotify_poll,
   323          .read           = inotify_read,
   324          .fasync         = fsnotify_fasync,
   325          .release        = inotify_release,
   326          .unlocked_ioctl = inotify_ioctl,
   327          .compat_ioctl   = inotify_ioctl,
   328          .llseek         = noop_llseek,
   329  };

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




Re: [PATCH] btrfs: Avoid BUG_ON()s because of ENOMEM caused by kmalloc() failure

2016-02-17 Thread David Sterba
On Wed, Feb 17, 2016 at 02:54:23PM +0900, Satoru Takeuchi wrote:
> On 2016/02/16 2:53, David Sterba wrote:
> > On Mon, Feb 15, 2016 at 02:38:09PM +0900, Satoru Takeuchi wrote:
> >> There are some BUG_ON()'s after kmalloc() as follows.
> >>
> >> =
> >> foo = kmalloc();
> >> BUG_ON(!foo);  /* -ENOMEM case */
> >> =
> >>
> >> A Docker + memory cgroup user hit these BUG_ON()s.
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=112101
> >>
> >> Since it's very hard to handle these ENOMEMs properly,
> >> preventing these kmalloc() failures to avoid these
> >> BUG_ON()s for now, are a bit better than the current
> >> implementation anyway.
> >
> > Beware that the NOFAIL semantics can cause deadlocks if it's on the
> > critical writeback path or can be reentered from itself through the
> > reclaim. Unless you're sure that this is not the case, please do not add
> > them just because it would seemingly fix the allocation failures.
> 
> About the all cases I changed, kmalloc()s can block
> since gfp_flags_allow_blocking() are true. Then no locks
> are acquired here and deadlocks don't happen.
> 
> Am I missing something?

Waiting as in GFP_WAIT is not the same as GFP_NOFAIL that can wait
indefinitely. The locking of NOFAIL is implied. The kmalloc callsite
will block until there's memory available. If another thread of btrfs
waits for this code to progress, it will block as well.
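
The difference between an allocation that may fail and one that retries
until it succeeds can be shown with a userspace toy model. This is an
analogy only, not kernel code: MemoryPool stands in for a memory cgroup
with a hard limit, and every name here is made up for the illustration.

```python
class MemoryPool:
    """Toy stand-in for a memory cgroup with a hard limit (in pages)."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def try_alloc(self, pages):
        """Failable allocation: returns False instead of waiting."""
        if self.used + pages > self.limit:
            return False
        self.used += pages
        return True


def alloc_nofail(pool, pages, reclaim, max_spins=1000):
    """Retry until the allocation succeeds, calling reclaim() in between.

    A real no-fail allocation has no max_spins escape hatch; the cap
    exists only so that this demonstration terminates.
    """
    spins = 0
    while not pool.try_alloc(pages):
        if spins >= max_spins:
            raise TimeoutError("still waiting; a real caller would hang here")
        reclaim(pool)
        spins += 1
    return spins
```

If reclaim can free something, the loop ends after a few rounds. If the
pool is full of unreclaimable memory, the caller never returns, and
anything waiting on the caller is stuck as well.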

> > In the docker example, the memory is limited by cgroups so the NOFAIL
> > mode can exhaust all reserves and just loop endlessly waiting for the
> > OOM killer to get some memory or just waiting without any chance to
> > progress.
> 
> I consider triggering OOM killer and killing processes
> in a cgroup are better than killing whole system.

The action of OOM killer is not a problem. The cgroup memory limit can
be low or all the memory is unreclaimable. At this point btrfs code will
block.


> About the possibility of endless loop, there are many
> such problems in the whole kernel. Of course it can be
> said to Btrfs.

Many? Examples? In this context we're talking about endless loops caused
by non-failing allocations.

> ==
> $ grep -rnH __GFP_NOFAIL fs/btrfs/
> fs/btrfs/extent-tree.c:5970: GFP_NOFS | __GFP_NOFAIL);
> fs/btrfs/extent-tree.c:6043: bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
> fs/btrfs/extent_io.c:4643: eb = kmem_cache_zalloc(extent_buffer_cache, 
> GFP_NOFS|__GFP_NOFAIL);
> fs/btrfs/extent_io.c:4909: p = find_or_create_page(mapping, index, 
> GFP_NOFS|__GFP_NOFAIL);
> ==

I'm aware of the existing NOFAIL allocations. There are two at
extent_buffer allocation time, added recently and provoked by the
intended changes to memory allocator that would fail the GFP_NOFS
allocations.

The other two are from year 2010, set_extent_bit, IMHO added so the
error handling does not get complicated and expressing "we don't want to
fail here". But there are many other calls to set_extent_bit that could
fail. This is inconsistent and should be unified. In a way that we're
sure that we're not introducing potential hangs.

> I understand fixing these problems cooperate with
> memory cgroup guys is the best in the long run.

It's not about cgroups, btrfs needs to ideally handle all memory
allocation failures in a way that uses some sort of fallbacks but still
can lead to a transaction abort if there's simply no memory left.
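
As a rough userspace sketch of that pattern (try the normal allocation,
fall back to a reserve, and abort cleanly with -ENOMEM if both fail), with
all names hypothetical and no relation to the actual btrfs transaction
code:

```python
import errno


def alloc_with_fallback(size, primary, reserve):
    """Try the primary allocator first, then the reserve; None if both fail."""
    buf = primary(size)
    if buf is None:
        buf = reserve(size)
    return buf


def run_transaction(sizes, primary, reserve):
    """Allocate one buffer per entry in sizes.

    Returns 0 on success, or -ENOMEM after releasing everything acquired
    so far (the "abort" path) instead of crashing.
    """
    acquired = []
    for size in sizes:
        buf = alloc_with_fallback(size, primary, reserve)
        if buf is None:
            acquired.clear()  # release what this transaction holds
            return -errno.ENOMEM
        acquired.append(buf)
    return 0
```

The caller sees an error code it can act on, which is the opposite of
both BUG_ON() (crash) and NOFAIL (wait forever).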

> However, I consider bypassing this problem for now
> is better than the current implementation.

It's more like replacing one problem with another. With every new
NOFAIL, one has to think about the runtime interactions with the others.
I'd rather see this fixed with a particular call path in mind or class,
eg. the extent_map bit settings, than throwing NOFAIL at places that
somebody accidentally hit.

As a temporary fix we can add __GFP_HIGH to the interesting sites so we
can get access to the emergency reserves, and this is on my list of
things to do after the NOFS -> KERNEL updates are done.


btrfs_orphan_cleanup

2016-02-17 Thread Pavol Cupka
Hi all,

after boot I am seeing this in dmesg


[   23.874698] BTRFS info (device sda1): enabling auto defrag
[   23.874702] BTRFS info (device sda1): disk space caching is enabled
[   26.074456] Adding 4193276k swap on /dev/sda4.  Priority:-1
extents:1 across:4193276k
[   26.242940] ------------[ cut here ]------------
[   26.242953] WARNING: CPU: 1 PID: 1863 at fs/btrfs/inode.c:3443
btrfs_orphan_cleanup+0x2d1/0x420()
[   26.242955] Modules linked in: vboxnetadp(O) vboxnetflt(O)
vboxdrv(O) tun nvidia(PO) cdc_ether usbnet mii cdc_acm cdc_wdm iwldvm
iwlwifi intel_agp intel_gtt efivarfs
[   26.242977] CPU: 1 PID: 1863 Comm: mount Tainted: PW  O
4.1.7-hardened-r1 #3
[   26.242979] Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS
A16 12/05/2013
[   26.242982]  81c113c4 c90001633918 819d1e43
81e3a0f8
[   26.242986]   c90001633958 8107cf25
88021f57f800
[   26.242989]  88021f57f800 880222c39e10 8802217d74e0

[   26.242993] Call Trace:
[   26.243004]  [] dump_stack+0x45/0x5d
[   26.243012]  [] warn_slowpath_common+0x85/0xd0
[   26.243016]  [] warn_slowpath_null+0x15/0x20
[   26.243019]  [] btrfs_orphan_cleanup+0x2d1/0x420
[   26.243023]  [] btrfs_lookup_dentry+0x380/0x530
[   26.243027]  [] btrfs_lookup+0x11/0x50
[   26.243033]  [] lookup_real+0x2c/0x80
[   26.243036]  [] __lookup_hash+0x2e/0x40
[   26.243041]  [] path_lookupat+0x88e/0xd10
[   26.243048]  [] ? idr_get_empty_slot+0x1be/0x4e0
[   26.243056]  [] ? kmem_cache_alloc+0x2c/0x150
[   26.243058]  [] ? getname_kernel+0x2f/0x130
[   26.243060]  [] filename_lookup+0x1f/0x100
[   26.243062]  [] vfs_path_lookup+0x62/0xc0
[   26.243065]  [] ? proc_alloc_inum+0x2e/0xb0
[   26.243069]  [] ? __list_add+0x1b/0x40
[   26.243073]  [] mount_subtree+0x3b/0x90
[   26.243075]  [] ? kfree+0x38/0x180
[   26.243079]  [] btrfs_mount+0x273/0x970
[   26.243081]  [] mount_fs+0x3f/0x1a0
[   26.243085]  [] ? __alloc_percpu+0x10/0x20
[   26.243087]  [] vfs_kern_mount+0x63/0x120
[   26.243089]  [] do_mount+0x557/0xdc0
[   26.243091]  [] ? alloc_pages_current+0x95/0x120
[   26.243094]  [] ? __get_free_pages+0x9/0xb0
[   26.243096]  [] ? copy_mount_options+0x35/0x1c0
[   26.243098]  [] SyS_mount+0x8a/0xf0
[   26.243101]  [] system_call_fastpath+0x12/0x79
[   26.243102] ---[ end trace 666fc5daa976e384 ]---
[   26.243108] BTRFS error (device sda1): Error removing orphan entry,
stopping orphan cleanup
[   26.243110] BTRFS error (device sda1): could not do orphan cleanup -22

Linux  4.1.7-hardened-r1 #3 SMP Thu Dec 24 14:19:04 CET 2015 x86_64
Intel(R) Core(TM) i7 CPU M 640 @ 2.80GHz GenuineIntel GNU/Linux
btrfs-progs v4.3.1

Is there something I can do about it?

Thank you in advance
Pavol


Re: btrfs send error

2016-02-17 Thread Pavol Cupka
Will do.
Thank you.


On Wed, Feb 17, 2016 at 3:01 PM, Filipe Manana  wrote:
> On Wed, Feb 17, 2016 at 1:02 PM, Pavol Cupka  wrote:
>> Hi all,
>>
>> when trying to btrfs send a snapshot I get this in dmesg:
>> [8614395.539466] BTRFS error (device sda1): did not find backref in
>> send_root. inode=673755, offset=131072, disk_byte=25730310144 found
>> extent=25730310144
>>
>> scrub reveals no errors
>> scrub status for bb8094e8-a7b1-4c2d-854e-77a9921e6f7e
>> scrub started at Wed Feb 17 08:14:32 2016 and finished after 00:15:58
>> total bytes scrubbed: 67.49GiB with 0 errors
>>
>> Is there something else that I might try to get this to work?
>
> Just upgrade to a 4.3 or 4.4 kernel, or build a kernel with the patch below.
> The fix for this landed in 4.3:
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d6589101b67a55107652050dfbf414403a93e351
>
>
>>
>> uname -a
>> Linux 4.1.7-hardened-r1 #4 SMP Mon Nov 9 20:02:04 CET 2015 x86_64
>> Intel(R) Atom(TM) CPU N2800 @ 1.86GHz GenuineIntel GNU/Linux
>>
>> btrfs --version
>> btrfs-progs v4.3.1
>>
>> Thanks Pavol
>
>
>
> --
> Filipe David Manana,
>
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."


Re: btrfs send error

2016-02-17 Thread Filipe Manana
On Wed, Feb 17, 2016 at 1:02 PM, Pavol Cupka  wrote:
> Hi all,
>
> when trying to btrfs send a snapshot I get this in dmesg:
> [8614395.539466] BTRFS error (device sda1): did not find backref in
> send_root. inode=673755, offset=131072, disk_byte=25730310144 found
> extent=25730310144
>
> scrub reveals no errors
> scrub status for bb8094e8-a7b1-4c2d-854e-77a9921e6f7e
> scrub started at Wed Feb 17 08:14:32 2016 and finished after 00:15:58
> total bytes scrubbed: 67.49GiB with 0 errors
>
> Is there something else that I might try to get this to work?

Just upgrade to a 4.3 or 4.4 kernel, or build a kernel with the patch below.
The fix for this landed in 4.3:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d6589101b67a55107652050dfbf414403a93e351
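
A quick way to check a "uname -r" string against that version is sketched
below. The helper names are hypothetical, and the version number is only a
hint: a distribution kernel older than 4.3 may carry the fix as a backport.

```python
import re


def parse_release(release):
    """Extract (major, minor, patch) from a `uname -r` style string."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(g or 0) for g in m.groups())


def has_send_backref_fix(release, fixed_in=(4, 3, 0)):
    """True if the mainline version is at least the one that has the fix."""
    return parse_release(release) >= fixed_in


if __name__ == "__main__":
    # The release string reported in this thread.
    print(has_send_backref_fix("4.1.7-hardened-r1"))
```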


>
> uname -a
> Linux 4.1.7-hardened-r1 #4 SMP Mon Nov 9 20:02:04 CET 2015 x86_64
> Intel(R) Atom(TM) CPU N2800 @ 1.86GHz GenuineIntel GNU/Linux
>
> btrfs --version
> btrfs-progs v4.3.1
>
> Thanks Pavol



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


btrfs send error

2016-02-17 Thread Pavol Cupka
Hi all,

when trying to btrfs send a snapshot I get this in dmesg:
[8614395.539466] BTRFS error (device sda1): did not find backref in
send_root. inode=673755, offset=131072, disk_byte=25730310144 found
extent=25730310144

scrub reveals no errors
scrub status for bb8094e8-a7b1-4c2d-854e-77a9921e6f7e
scrub started at Wed Feb 17 08:14:32 2016 and finished after 00:15:58
total bytes scrubbed: 67.49GiB with 0 errors

Is there something else that I might try to get this to work?

uname -a
Linux 4.1.7-hardened-r1 #4 SMP Mon Nov 9 20:02:04 CET 2015 x86_64
Intel(R) Atom(TM) CPU N2800 @ 1.86GHz GenuineIntel GNU/Linux

btrfs --version
btrfs-progs v4.3.1

Thanks Pavol


Re: [Docs]? Only one Subvolume with DUP (or different parameters)?

2016-02-17 Thread Austin S. Hemmelgarn

On 2016-02-16 23:49, Duncan wrote:

Christian Völker posted on Tue, 16 Feb 2016 20:25:47 +0100 as excerpted:


sorry for the simple question and I assume every developer here laughs
about this question.

Anyway:

I have read loads of documents but did not find an answer for sure. Even
though I assume I am right.

On a btrfs filesystem created; is it possible to have subvolumes with
data duplication and another subvolume without (resp. with just metadata
duplication)?

I have some large filesystems currently with ext4 and I am thinking of
changing to btrfs. Some of the data is more important than others. So I
want to have data duplication on the important files (sorted in a mount
point) and without for the other subvolume.

So I want to have the advantage of redundancy of important files
combined with the flexibility of the volume manager and shared disk
space.


As Hugo mentions, that's not possible at this point, tho the plan is to
make replication mode a per-subvolume or even per-file property at some
still-future point.  Given the rate of progress, however, that future
point is extremely likely to be at least two years out and could well be 5
+ years out -- IOW, it's nothing you could plan for at this point.

However, it's quite possible to do multiple separate btrfs, with each one
having its own replication mode.  That is, in fact, what I do here, tho
in my case it's more to keep all my data eggs from being in the same
filesystem basket, in case btrfs decides to eat a filesystem, than it is
for different replication (most of mine are raid1 across partitions on
two devices).  If the filesystem goes, being on a different subvolume
from whatever triggered the problem isn't going to help much, while being
on a different filesystem entirely, which might have been mounted
read-only or not even mounted at all at the time, very likely will, and I
prefer not having all my data eggs in the same filesystem basket,
particularly when that filesystem basket is btrfs, which while
stabilizing, isn't yet fully stable and mature, even if it means a bit
more hands-on administration than would simply shoving everything in the
same basket and hoping the bottom doesn't drop out of it.

Tho that might be just me...

It's not just you. I do so myself, and usually make the same
recommendation to people who ask me about it.  This was in fact the
usual recommendation on most old UNIX systems as well, although that was
just as much because of small disks and a lack of logical volume
management as it was about data safety.  The big difference these days
is where the splits are made.  Traditional UNIX often had /var, /usr,
/home, and / as separate filesystems, often with multiple /home
filesystems on big systems.  Most modern Linux distributions can't
handle /usr and /var being on separate filesystems from /, so people
split things differently when they split things at all.  (The good
installers usually provide options to easily do common splits like a
separate /home, but most make assumptions about how you lay things out.
This is part of the reason I prefer Gentoo: it lets you do things
however you want as long as you can configure it yourself.)  In my case,
for example, I split out stuff that's trivial to recreate (/usr/portage,
/var/lib/layman, /usr/src), stuff that I don't want backed up for other
reasons (I have a separate filesystem I use for local copies of public
VCS repositories, for example), and stuff that should just be on a
separate filesystem regardless (either because it needs different
replication, or because it needs to be more isolated from the rest of
the system; back-end storage for the GlusterFS storage cluster I'm
building is an excellent example).



[PATCH] btrfs: fix ifnullfree.cocci warnings

2016-02-17 Thread kbuild test robot
fs/btrfs/volumes.c:1886:2-7: WARNING: NULL check before freeing functions like 
kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. 
Maybe consider reorganizing relevant code to avoid passing NULL values.

 NULL check before some freeing functions is not needed.

 Based on checkpatch warning
 "kfree(NULL) is safe this check is probably not required"
 and kfreeaddr.cocci by Julia Lawall.

Generated by: scripts/coccinelle/free/ifnullfree.cocci

CC: Anand Jain 
Signed-off-by: Fengguang Wu 
---

 volumes.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1882,8 +1882,7 @@ int btrfs_rm_device(struct btrfs_root *r
 	}
 
 out:
-	if (dev_name)
-		kfree(dev_name);
+	kfree(dev_name);
 
 	mutex_unlock(&uuid_mutex);
 	return ret;
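
For readers outside the kernel tree: the same rule holds for free() in
userspace C, where passing NULL is defined to do nothing, so the guard
adds no safety. Below is a minimal sketch of the pattern the patch
arrives at. The function lookup_name() and its return codes are
hypothetical stand-ins for illustration, not code from volumes.c.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a function that may or may not have
 * allocated dev_name by the time it reaches its common exit label. */
static int lookup_name(const char *path)
{
	char *dev_name = NULL;
	int ret = 0;

	if (!path) {
		ret = -1;	/* error path: dev_name is still NULL */
		goto out;
	}

	dev_name = malloc(strlen(path) + 1);
	if (!dev_name) {
		ret = -2;	/* allocation failed: dev_name is NULL */
		goto out;
	}
	strcpy(dev_name, path);

out:
	/* free(NULL) is a no-op in standard C, as kfree(NULL) is in the
	 * kernel, so no "if (dev_name)" guard is needed here. */
	free(dev_name);
	return ret;
}
```

Every exit path funnels through the single unconditional free(), which
is what lets the NULL check go away.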


Re: [PATCH v4 12/13] btrfs: introduce device delete by devid

2016-02-17 Thread David Sterba
On Sat, Feb 13, 2016 at 10:01:39AM +0800, Anand Jain wrote:
> + if (vol_args->flags & BTRFS_DEVICE_BY_ID) {
> + ret = btrfs_rm_device(root, NULL, vol_args->devid);
> + } else {
> + vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';

   BTRFS_SUBVOL_NAME_MAX

Spotted by Chris,

fs/btrfs/ioctl.c:2703: warning: array subscript is above array bounds

my gcc version does not report that. Fixed and for-next pushed.
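
To make the overflow concrete, here is a small userspace sketch. It
assumes the two constants have the values given in the related fix on
this list (4087 and 4039), and that the name buffer is declared with
BTRFS_SUBVOL_NAME_MAX + 1 bytes, as the fix implies. The struct and
helper are simplified, hypothetical stand-ins, not the real ioctl
definitions.

```c
#include <stddef.h>
#include <string.h>

/* Values as quoted in the related fix on this list (assumed here). */
#define BTRFS_PATH_NAME_MAX	4087
#define BTRFS_SUBVOL_NAME_MAX	4039

/* Simplified, hypothetical stand-in for the ioctl argument struct;
 * only the size of name[] matters for this illustration. */
struct vol_args_sketch {
	unsigned long long flags;
	char name[BTRFS_SUBVOL_NAME_MAX + 1];
};

/* Terminate name[] at its own last element rather than at a named
 * constant, so the index cannot drift out of bounds.  Returns the
 * index that was written. */
static size_t terminate_name(struct vol_args_sketch *args)
{
	size_t last = sizeof(args->name) - 1;

	args->name[last] = '\0';
	return last;
}
```

With these values the last valid index is 4039, so writing at index
4087 lands 48 bytes past the end of the array. Deriving the index from
sizeof() is one way to keep the two in step; the posted fix simply
names the right constant.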


Re: [PATCH 1/3] generic/304: fix high offset

2016-02-17 Thread Christoph Hellwig
On Wed, Feb 17, 2016 at 10:04:23AM +0000, Filipe Manana wrote:
> > What does btrfs return here?
> 
> -EINVAL, so this makes the test pass on btrfs.

Ok, looks fine to me then:

Reviewed-by: Christoph Hellwig 


Re: [PATCH 1/3] generic/304: fix high offset

2016-02-17 Thread Filipe Manana
On Wed, Feb 17, 2016 at 9:56 AM, Christoph Hellwig wrote:
> On Mon, Feb 15, 2016 at 12:46:14PM -0800, Darrick J. Wong wrote:
>> "Invalid argument" is a better response to an impossibly high offset
>> dedupe request than "extents don't match", so change the test.
>
> What does btrfs return here?

-EINVAL, so this makes the test pass on btrfs.

> --
> To unsubscribe from this list: send the line "unsubscribe fstests" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


Re: [PATCH 3/3] xfs/24[356]: check for -c switch to xfs_io bmap command

2016-02-17 Thread Christoph Hellwig
Looks fine,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 2/3] punch-alternating: use the block size reported by the fs for punching

2016-02-17 Thread Christoph Hellwig
Looks fine,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 1/3] generic/304: fix high offset

2016-02-17 Thread Christoph Hellwig
On Mon, Feb 15, 2016 at 12:46:14PM -0800, Darrick J. Wong wrote:
> "Invalid argument" is a better response to an impossibly high offset
> dedupe request than "extents don't match", so change the test.

What does btrfs return here?