date:20141219

Re: [PATCH v3] xfstests: btrfs: add test case for qgroup account on shared extents

2014-12-19 Thread Liu Bo

On Thu, Dec 18, 2014 at 11:05:30AM +1100, Dave Chinner wrote:
 On Wed, Dec 17, 2014 at 04:30:47PM +0800, Liu Bo wrote:
  This is a regression test of
  'commit fcebe4562dec (Btrfs: rework qgroup accounting)'
  
  It can produce qgroup related warnings.
  
  The fix is https://patchwork.kernel.org/patch/5499981/
  Btrfs: fix a warning of qgroup account on shared extents
 
  +#! /bin/bash
  +# FS QA Test No. 017
  +#
  +# Regression of 'commit fcebe4562dec (Btrfs: rework qgroup accounting)',
  +# this will throw a warning into dmesg.
  +#
  +# For more details, the fix is https://patchwork.kernel.org/patch/5499981/
  +# Btrfs: fix a warning of qgroup account on shared extents
 
 Please describe the test directly.
 
  +
  +_need_to_be_root
  +_supported_fs btrfs
  +_supported_os Linux
  +_require_scratch
  +_require_cloner
  +
  +run_check _scratch_mkfs --nodesize 4096
  +run_check _scratch_mount
 
 No, please don't use run_check like this.
 
 Errors will end up in the output file, and that will cause the test
 to fail.
 
  +run_check $XFS_IO_PROG -f -d -c "pwrite 0 8K" $SCRATCH_MNT/foo
 
 Same - filter the output, and errors will be verbose and cause a
 failure.
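
The point about filtering can be sketched as follows. This is only a minimal illustration, not the xfstests implementation: filter_io_stats and the sample strings are hypothetical stand-ins for _filter_xfs_io and real xfs_io output. The filter masks only the line that varies between runs, so an error message passes through unchanged and would show up as a mismatch against the expected output.

```shell
# Hypothetical filter (an assumption for illustration, not the real
# _filter_xfs_io): mask the timing/throughput line that changes per run.
filter_io_stats() {
    sed -e 's/^[0-9.]* [A-Za-z]*, [0-9]* ops; .*/XXX Bytes, X ops; (timing masked)/'
}

# Made-up sample of what a successful pwrite might print.
sample_pwrite_output() {
    printf '%s\n' \
        'wrote 8192/8192 bytes at offset 0' \
        '8 KiB, 2 ops; 0.0004 sec (19.531 MiB/sec and 5000.0000 ops/sec)'
}

# Successful run: the stable line survives, the variable line is masked.
filtered_ok=$(sample_pwrite_output | filter_io_stats)

# Failed run (made-up error text): the filter leaves it alone, so it
# would differ from the expected output and fail the test.
filtered_err=$(printf '%s\n' 'pwrite: No space left on device' | filter_io_stats)
```

With this shape there is no need to check exit codes by hand: any unexpected line in the output is itself the failure signal.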
 
  +
  +_run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
  +
  +run_check $CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/foo-reflink
  +run_check $CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/snap/foo-reflink
  +run_check $CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/snap/foo-reflink2
 
 Filter the output, not run_check.
 
 If CLONER_PROG is silent when it fails, then it is broken and needs
 fixing because users need to know that something failed and they
 don't check exit codes.
 
  +_run_btrfs_util_prog quota enable $SCRATCH_MNT
  +_run_btrfs_util_prog quota rescan -w $SCRATCH_MNT
  +
  +rm -fr $SCRATCH_MNT/* >/dev/null 2>&1
 
 Don't redirect the output. If an unlink fails, we want to know about
 it.
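
A small sketch of why the redirect hides useful information. It assumes a throwaway directory from mktemp and uses plain rm (without -f, which would itself suppress the missing-file error) so that the failure is easy to trigger; the variable names are illustrative only.

```shell
# Scratch directory for the demonstration; removed at the end.
demo_dir=$(mktemp -d)

# Removing a file that does not exist fails, and rm reports it on stderr.
# '|| true' keeps the sketch going even under 'set -e'.
visible=$(rm "$demo_dir/missing" 2>&1 || true)

# The same failure with output discarded: nothing would reach the test output.
hidden=$( { rm "$demo_dir/missing" >/dev/null 2>&1 || true; } 2>&1 )

rmdir "$demo_dir"
```

Here visible holds rm's error message while hidden is empty, which is the case the review warns about: the failure happened but left no trace in the output.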
 
  +_run_btrfs_util_prog filesystem sync $SCRATCH_MNT
 
 What's wrong with sync?
 
  +$BTRFS_UTIL_PROG qgroup show $SCRATCH_MNT | $SED_PROG -n '/[0-9]/p' | $AWK_PROG '{print $2" "$3}'
 
 You can do regex matches with awk.
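
As a sketch of that suggestion: an awk pattern can do the line selection that sed was doing. The sample text below is a hypothetical stand-in for `btrfs qgroup show` output (the real columns may differ), and sample_qgroup_output is an illustrative helper, not part of xfstests.

```shell
# Hypothetical stand-in for `btrfs qgroup show` output.
sample_qgroup_output() {
    printf '%s\n' \
        'qgroupid rfer excl' \
        '-------- ---- ----' \
        '0/5 4096 4096' \
        '0/257 4096 4096'
}

# Two-stage form: sed keeps lines containing a digit, awk prints two fields.
two_stage=$(sample_qgroup_output | sed -n '/[0-9]/p' | awk '{print $2" "$3}')

# Single-stage form: the awk pattern selects the lines itself.
one_stage=$(sample_qgroup_output | awk '/[0-9]/ {print $2" "$3}')
```

Both variables end up with the same two lines, so the sed stage can simply be dropped.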

Thanks for reviewing this.

Thanks,

-liubo

 
 Cheers,
 
 Dave.
 -- 
 Dave Chinner
 da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v4] xfstests: btrfs: add test case for qgroup account on shared extents

2014-12-19 Thread Liu Bo

This is a regression test of
'commit fcebe4562dec (Btrfs: rework qgroup accounting)',
it's used to verify that removing shared extents can end up incorrect
qgroup accounting.

It can produce qgroup related warnings.

The fix is https://patchwork.kernel.org/patch/5499981/
Btrfs: fix a warning of qgroup account on shared extents

Signed-off-by: Liu Bo bo.li@oracle.com
Tested-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
Reviewed-by: Eryu Guan eg...@redhat.com
---
v4: - remove improper run_check macro and add filter macro for xfs_io
- use awk's regexp directly
- add test case description

v3: - remove trailing whitespace.
- add the fix link for more details of the problem.

v2: - use new seq number 017 instead 080
- use 'cloner' to get shared extents
- use XFS_IO_PROG instead

 tests/btrfs/017 | 82 +
 tests/btrfs/017.out |  5 
 tests/btrfs/group   |  1 +
 3 files changed, 88 insertions(+)
 create mode 100755 tests/btrfs/017
 create mode 100644 tests/btrfs/017.out

diff --git a/tests/btrfs/017 b/tests/btrfs/017
new file mode 100755
index 000..7937607
--- /dev/null
+++ b/tests/btrfs/017
@@ -0,0 +1,82 @@
+#! /bin/bash
+# FS QA Test No. 017
+#
+# Verify that removing shared extents can end up incorrect qgroup accounting.
+#
+# Regression of 'commit fcebe4562dec (Btrfs: rework qgroup accounting)',
+# this will throw a warning into dmesg.
+#
+# The issue is fixed by https://patchwork.kernel.org/patch/5499981/
+# Btrfs: fix a warning of qgroup account on shared extents
+#
+#---
+# Copyright (c) 2014 Liu Bo.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo QA output created by $seq
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+cd /
+rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+
+_need_to_be_root
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_cloner
+
+rm -f $seqres.full
+
+_scratch_mkfs --nodesize 4096
+_scratch_mount
+
+$XFS_IO_PROG -f -d -c "pwrite 0 8K" $SCRATCH_MNT/foo | _filter_xfs_io
+
+_run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
+
+$CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/foo-reflink
+$CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/snap/foo-reflink
+$CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/snap/foo-reflink2
+
+_run_btrfs_util_prog quota enable $SCRATCH_MNT
+_run_btrfs_util_prog quota rescan -w $SCRATCH_MNT
+
+rm -fr $SCRATCH_MNT/foo*
+rm -fr $SCRATCH_MNT/snap/foo*
+
+sync
+
+$BTRFS_UTIL_PROG qgroup show $SCRATCH_MNT | $AWK_PROG '/[0-9]/ {print $2" "$3}'
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/017.out b/tests/btrfs/017.out
new file mode 100644
index 000..7658e2e
--- /dev/null
+++ b/tests/btrfs/017.out
@@ -0,0 +1,5 @@
+QA output created by 017
+wrote 8192/8192 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+4096 4096
+4096 4096
diff --git a/tests/btrfs/group b/tests/btrfs/group
index abb2fe4..e2b 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -19,6 +19,7 @@
 014 auto balance
 015 auto quick snapshot
 016 auto quick send
+017 auto quick qgroup
 018 auto quick subvol
 019 auto quick send
 020 auto quick replace
-- 
1.8.2.1


Re: Oddly slow read performance with near-full largish FS

2014-12-19 Thread Satoru Takeuchi


Hi,

Sorry for the late reply. Let me ask some questions.

On 2014/12/17 11:42, Charles Cazabon wrote:

Hi,

I've been running btrfs for various filesystems for a few years now, and have
recently run into problems with a large filesystem becoming *really* slow for
basic reading.  None of the debugging/testing suggestions I've come across in
the wiki or in the mailing list archives seems to have helped.

Background: this particular filesystem holds backups for various other
machines on the network, a mix of rdiff-backup data (so lots of small files)
and rsync copies of larger files (everything from ~5MB data files to ~60GB VM
HD images).  There's roughly 16TB of data in this filesystem (the filesystem
is ~17TB).  The btrfs filesystem is a simple single volume, no snapshots,
multiple devices, or anything like that.  It's an LVM logical volume on top of
dmcrypt on top of an mdadm RAID set (8 disks in RAID 6).


Q1. Do you mean that your Btrfs file system sits on top of
the following stack of layers?

+---------------+
|Btrfs(single)  |
+---------------+
|LVM(non RAID?) |
+---------------+
|dmcrypt        |
+---------------+
|mdadm RAID set |
+---------------+

# Unfortunately, I don't know how Btrfs behaves on top of
# such a deep stack of layers.

Q2. If Q1 is true, is it possible to reduce the layers as follows?

+---------------+
|Btrfs(*1)      |
+---------------+
|dmcrypt        |
+---------------+

I ask because there are many layers here with the same or
similar features, and a heavily layered setup tends to cause
more trouble than a thinner one, regardless of file system type.

*1) Currently I don't recommend using the RAID56 feature of Btrfs.
So, if RAID6 is mandatory, mdadm RAID6 is also necessary.



The performance:  trying to copy the data off this filesystem to another
(non-btrfs) filesystem with rsync or just cp was taking ages - I found one
suggestion that it could be because updating the atimes required a COW of the
metadata in btrfs, so I mounted the filesystem noatime, but this doesn't
appear to have made any difference.  The speeds I'm seeing (with iotop)
fluctuate a lot.  They spend most of the time in the range of 1-3 MB/s, with
large periods of time where no IO seems to happen at all, and occasional short
spikes to ~25-30 MB/s.  System load seems to sit around 10-12 (with only 2
processes reported as running, everything else sleeping) while this happens.
The server is doing nothing other than this copy at the time.  The only
processes using any noticeable CPU are rsync (source and destination processes,
around 3% CPU each, plus an md0:raid6 process around 2-3%), and a handful of
kworker processes, perhaps one per CPU (there are 8 physical cores in the
server, plus hyperthreading).

Other filesystems on the same physical disks have no trouble exceeding 100MB/s
reads.  The machine is not swapping (16GB RAM, ~8GB swap with 0 swap used).


Q3. Do they also consist of the following layers?

+---------------+
|XFS/ext4       |
+---------------+
|LVM(non RAID?) |
+---------------+
|dmcrypt        |
+---------------+
|mdadm RAID set |
+---------------+

Q4. Are the other filesystems also near-full?

Q5. Are there any error/warning messages about
Btrfs/LVM/dmcrypt/mdadm/hardware?

Thanks,
Satoru



Is there something obvious I'm missing here?  Is there a reason I can only
average ~3MB/s reads from a btrfs filesystem?

kernel is x86_64 linux-stable 3.17.6.  btrfs-progs is v3.17.3-3-g8cb0438.
Output of the various info commands is:

   $ sudo btrfs fi df /media/backup/
   Data, single: total=16.24TiB, used=15.73TiB
   System, DUP: total=8.00MiB, used=1.75MiB
   System, single: total=4.00MiB, used=0.00
   Metadata, DUP: total=35.50GiB, used=34.05GiB
   Metadata, single: total=8.00MiB, used=0.00
   unknown, single: total=512.00MiB, used=0.00

   $ btrfs --version
   Btrfs v3.17.3-3-g8cb0438

   $ sudo btrfs fi show

   Label: 'backup'  uuid: c18dfd04-d931-4269-b999-e94df3b1918c
 Total devices 1 FS bytes used 15.76TiB
 devid1 size 16.37TiB used 16.31TiB path /dev/mapper/vg-backup

Thanks in advance for any suggestions.

Charles




Re: [PATCH v4] xfstests: btrfs: add test case for qgroup account on shared extents

2014-12-19 Thread Satoru Takeuchi

Hi Liu,

On 2014/12/19 17:31, Liu Bo wrote:
 This is a regression test of
 'commit fcebe4562dec (Btrfs: rework qgroup accounting)',
 it's used to verify that removing shared extents can end up incorrect
 qgroup accounting.
 
 It can produce qgroup related warnings.
 
 The fix is https://patchwork.kernel.org/patch/5499981/
 Btrfs: fix a warning of qgroup account on shared extents
 
 Signed-off-by: Liu Bo bo.li@oracle.com
 Tested-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

V4 also passed my test,

Thanks,
Satoru

 Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
 Reviewed-by: Eryu Guan eg...@redhat.com
 ---
 v4: - remove improper run_check macro and add filter macro for xfs_io
  - use awk's regexp directly
  - add test case description
 
 v3: - remove trailing whitespace.
  - add the fix link for more details of the problem.
 
 v2: - use new seq number 017 instead 080
  - use 'cloner' to get shared extents
  - use XFS_IO_PROG instead
 
   tests/btrfs/017 | 82 
 +
   tests/btrfs/017.out |  5 
   tests/btrfs/group   |  1 +
   3 files changed, 88 insertions(+)
   create mode 100755 tests/btrfs/017
   create mode 100644 tests/btrfs/017.out
 
 diff --git a/tests/btrfs/017 b/tests/btrfs/017
 new file mode 100755
 index 000..7937607
 --- /dev/null
 +++ b/tests/btrfs/017
 @@ -0,0 +1,82 @@
 +#! /bin/bash
 +# FS QA Test No. 017
 +#
 +# Verify that removing shared extents can end up incorrect qgroup accounting.
 +#
 +# Regression of 'commit fcebe4562dec (Btrfs: rework qgroup accounting)',
 +# this will throw a warning into dmesg.
 +#
 +# The issue is fixed by https://patchwork.kernel.org/patch/5499981/
 +# Btrfs: fix a warning of qgroup account on shared extents
 +#
 +#---
 +# Copyright (c) 2014 Liu Bo.  All Rights Reserved.
 +#
 +# This program is free software; you can redistribute it and/or
 +# modify it under the terms of the GNU General Public License as
 +# published by the Free Software Foundation.
 +#
 +# This program is distributed in the hope that it would be useful,
 +# but WITHOUT ANY WARRANTY; without even the implied warranty of
 +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 +# GNU General Public License for more details.
 +#
 +# You should have received a copy of the GNU General Public License
 +# along with this program; if not, write the Free Software Foundation,
 +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
 +#---
 +#
 +
 +seq=`basename $0`
 +seqres=$RESULT_DIR/$seq
 +echo QA output created by $seq
 +
 +here=`pwd`
 +tmp=/tmp/$$
 +status=1 # failure is the default!
 +trap "_cleanup; exit \$status" 0 1 2 3 15
 +
 +_cleanup()
 +{
 +cd /
 +rm -f $tmp.*
 +}
 +
 +# get standard environment, filters and checks
 +. ./common/rc
 +. ./common/filter
 +
 +# real QA test starts here
 +
 +_need_to_be_root
 +_supported_fs btrfs
 +_supported_os Linux
 +_require_scratch
 +_require_cloner
 +
 +rm -f $seqres.full
 +
 +_scratch_mkfs --nodesize 4096
 +_scratch_mount
 +
 +$XFS_IO_PROG -f -d -c "pwrite 0 8K" $SCRATCH_MNT/foo | _filter_xfs_io
 +
 +_run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
 +
 +$CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/foo-reflink
 +$CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/snap/foo-reflink
 +$CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo 
 $SCRATCH_MNT/snap/foo-reflink2
 +
 +_run_btrfs_util_prog quota enable $SCRATCH_MNT
 +_run_btrfs_util_prog quota rescan -w $SCRATCH_MNT
 +
 +rm -fr $SCRATCH_MNT/foo*
 +rm -fr $SCRATCH_MNT/snap/foo*
 +
 +sync
 +
 +$BTRFS_UTIL_PROG qgroup show $SCRATCH_MNT | $AWK_PROG '/[0-9]/ {print $2" "$3}'
 +
 +# success, all done
 +status=0
 +exit
 diff --git a/tests/btrfs/017.out b/tests/btrfs/017.out
 new file mode 100644
 index 000..7658e2e
 --- /dev/null
 +++ b/tests/btrfs/017.out
 @@ -0,0 +1,5 @@
 +QA output created by 017
 +wrote 8192/8192 bytes at offset 0
 +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 +4096 4096
 +4096 4096
 diff --git a/tests/btrfs/group b/tests/btrfs/group
 index abb2fe4..e2b 100644
 --- a/tests/btrfs/group
 +++ b/tests/btrfs/group
 @@ -19,6 +19,7 @@
   014 auto balance
   015 auto quick snapshot
   016 auto quick send
 +017 auto quick qgroup
   018 auto quick subvol
   019 auto quick send
   020 auto quick replace
 


Re: [PATCH 0/5] Cleanup warnings from clang

2014-12-19 Thread Satoru Takeuchi

Hi,

On 2014/12/19 15:13, Qu Wenruo wrote:
 Cleanup warning when compile btrfs-progs with clang.
 
 Clang analyser also reports about 60+ errors, but it will be another
 patchset fixing it later.
 
 Qu Wenruo (5):
btrfs-progs: Makefile: Move linker only option to LDFLAGS
btrfs-progs: Fix a clang dead-judgement warning in disk-io.c.
btrfs-progs: Remove a unused function root_gtp_mask().
btrfs-progs: Remove a unused function offset_to_bitmap()
btrfs-progs: Remove deprecated _BSD_SOURCE macro.

All these patches look good to me.

Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

Thanks,
Satoru

 
   Makefile   |  3 ++-
   cmds-receive.c |  2 +-
   disk-io.c  |  9 ++---
   free-space-cache.c | 16 
   radix-tree.c   |  5 -
   5 files changed, 9 insertions(+), 26 deletions(-)
 


Re: [PATCH v4] xfstests: btrfs: add test case for qgroup account on shared extents

2014-12-19 Thread Liu Bo

Hi Satoru san,
On Fri, Dec 19, 2014 at 06:21:30PM +0900, Satoru Takeuchi wrote:
 Hi Liu,
 
 On 2014/12/19 17:31, Liu Bo wrote:
  This is a regression test of
  'commit fcebe4562dec (Btrfs: rework qgroup accounting)',
  it's used to verify that removing shared extents can end up incorrect
  qgroup accounting.
  
  It can produce qgroup related warnings.
  
  The fix is https://patchwork.kernel.org/patch/5499981/
  Btrfs: fix a warning of qgroup account on shared extents
  
  Signed-off-by: Liu Bo bo.li@oracle.com
  Tested-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
 
 V4 also passed my test,

Thanks for your active testing!

Thanks,

-liubo

 
 Thanks,
 Satoru
 
  Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
  Reviewed-by: Eryu Guan eg...@redhat.com
  ---
  v4: - remove improper run_check macro and add filter macro for xfs_io
   - use awk's regexp directly
   - add test case description
  
  v3: - remove trailing whitespace.
   - add the fix link for more details of the problem.
  
  v2: - use new seq number 017 instead 080
   - use 'cloner' to get shared extents
   - use XFS_IO_PROG instead
  
tests/btrfs/017 | 82 
  +
tests/btrfs/017.out |  5 
tests/btrfs/group   |  1 +
3 files changed, 88 insertions(+)
create mode 100755 tests/btrfs/017
create mode 100644 tests/btrfs/017.out
  
  diff --git a/tests/btrfs/017 b/tests/btrfs/017
  new file mode 100755
  index 000..7937607
  --- /dev/null
  +++ b/tests/btrfs/017
  @@ -0,0 +1,82 @@
  +#! /bin/bash
  +# FS QA Test No. 017
  +#
  +# Verify that removing shared extents can end up incorrect qgroup 
  accounting.
  +#
  +# Regression of 'commit fcebe4562dec (Btrfs: rework qgroup accounting)',
  +# this will throw a warning into dmesg.
  +#
  +# The issue is fixed by https://patchwork.kernel.org/patch/5499981/
  +# Btrfs: fix a warning of qgroup account on shared extents
  +#
  +#---
  +# Copyright (c) 2014 Liu Bo.  All Rights Reserved.
  +#
  +# This program is free software; you can redistribute it and/or
  +# modify it under the terms of the GNU General Public License as
  +# published by the Free Software Foundation.
  +#
  +# This program is distributed in the hope that it would be useful,
  +# but WITHOUT ANY WARRANTY; without even the implied warranty of
  +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  +# GNU General Public License for more details.
  +#
  +# You should have received a copy of the GNU General Public License
  +# along with this program; if not, write the Free Software Foundation,
  +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
  +#---
  +#
  +
  +seq=`basename $0`
  +seqres=$RESULT_DIR/$seq
  +echo QA output created by $seq
  +
  +here=`pwd`
  +tmp=/tmp/$$
  +status=1   # failure is the default!
  +trap "_cleanup; exit \$status" 0 1 2 3 15
  +
  +_cleanup()
  +{
  +cd /
  +rm -f $tmp.*
  +}
  +
  +# get standard environment, filters and checks
  +. ./common/rc
  +. ./common/filter
  +
  +# real QA test starts here
  +
  +_need_to_be_root
  +_supported_fs btrfs
  +_supported_os Linux
  +_require_scratch
  +_require_cloner
  +
  +rm -f $seqres.full
  +
  +_scratch_mkfs --nodesize 4096
  +_scratch_mount
  +
  +$XFS_IO_PROG -f -d -c "pwrite 0 8K" $SCRATCH_MNT/foo | _filter_xfs_io
  +
  +_run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
  +
  +$CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/foo-reflink
  +$CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo 
  $SCRATCH_MNT/snap/foo-reflink
  +$CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo 
  $SCRATCH_MNT/snap/foo-reflink2
  +
  +_run_btrfs_util_prog quota enable $SCRATCH_MNT
  +_run_btrfs_util_prog quota rescan -w $SCRATCH_MNT
  +
  +rm -fr $SCRATCH_MNT/foo*
  +rm -fr $SCRATCH_MNT/snap/foo*
  +
  +sync
  +
  +$BTRFS_UTIL_PROG qgroup show $SCRATCH_MNT | $AWK_PROG '/[0-9]/ {print $2" "$3}'
  +
  +# success, all done
  +status=0
  +exit
  diff --git a/tests/btrfs/017.out b/tests/btrfs/017.out
  new file mode 100644
  index 000..7658e2e
  --- /dev/null
  +++ b/tests/btrfs/017.out
  @@ -0,0 +1,5 @@
  +QA output created by 017
  +wrote 8192/8192 bytes at offset 0
  +XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  +4096 4096
  +4096 4096
  diff --git a/tests/btrfs/group b/tests/btrfs/group
  index abb2fe4..e2b 100644
  --- a/tests/btrfs/group
  +++ b/tests/btrfs/group
  @@ -19,6 +19,7 @@
014 auto balance
015 auto quick snapshot
016 auto quick send
  +017 auto quick qgroup
018 auto quick subvol
019 auto quick send
020 auto quick replace
  
 

Re: [PATCH 3/6] btrfs-progs: fi usage, update manpage

2014-12-19 Thread Satoru Takeuchi

Hi David,

 On 2014/12/18 23:27, David Sterba wrote:
 Signed-off-by: David Sterba dste...@suse.cz
 ---
Documentation/btrfs-filesystem.txt | 28 
1 file changed, 28 insertions(+)

 diff --git a/Documentation/btrfs-filesystem.txt 
 b/Documentation/btrfs-filesystem.txt
 index a8f2972a0e1a..85a94eb52569 100644
 --- a/Documentation/btrfs-filesystem.txt
 +++ b/Documentation/btrfs-filesystem.txt
 @@ -123,6 +123,34 @@ Show or update the label of a filesystem.
If a newlabel optional argument is passed, the label is changed.
NOTE: the maximum allowable length shall be less than 256 chars

 +*usage* [options] path [path...]::
 +Show detailed information about internal filesystem usage.
 
 Options from -b to -t are the completely same as btrfs fi df's ones.
 So how about pointing the df's options as follows?
 
 ===
 ...
 +
 `Options`
 +
 -T
 Show data in tabular format
 +
 There are some option to set unit. See description of *df*'s options
 from '-b' to '-t'.
 +
 If conflicting options are passed, the last one takes precedence.
 ...
 ===
 
 I consider it can prevent mistakes caused by further changes.

This patch seems to already be in devel/integration-20141218.
Here is the patch example.


---
From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
Subject: [PATCH] btrfs-progs: Cleanup: Fix the redundancy of btrfs-filesystem.

Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

---
 Documentation/btrfs-filesystem.txt | 21 +++--
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/Documentation/btrfs-filesystem.txt b/Documentation/btrfs-filesystem.txt
index 85a94eb..a8e7431 100644
--- a/Documentation/btrfs-filesystem.txt
+++ b/Documentation/btrfs-filesystem.txt
@@ -128,27 +128,12 @@ Show detailed information about internal filesystem usage.
 +
 `Options`
 +
--b|--raw
-raw numbers in bytes, without the 'B' suffix
--h
-print human friendly numbers, base 1024, this is the default
--H
-print human friendly numbers, base 1000
---iec
-select the 1024 base for the following options, according to the IEC standard
---si
-select the 1000 base for the following options, according to the SI standard
--k|--kbytes
-show sizes in KiB, or kB with --si
--m|--mbytes
-show sizes in MiB, or MB with --si
--g|--gbytes
-show sizes in GiB, or GB with --si
--t|--tbytes
-show sizes in TiB, or TB with --si
 -T
 show data in tabular format
 
+There are some options to set unit. See the description of *df* subcommand
+from '-b' option to '-t' option.
+
 If conflicting options are passed, the last one takes precedence.
 
 EXIT STATUS
-- 
1.8.3.1


[PATCH] Btrfs: generic checksum framework

2014-12-19 Thread Liu Bo

This changes the original crc32c specific checksum functions into more generic
ones, so that converting to a new checksum algorithm can be transparent to btrfs
internal code.

Note that file names' lookup and extent_data_ref's hashing use crc32c with their
own seed instead of the default ~0, so they remain unchanged.

Signed-off-by: Liu Bo bo.li@oracle.com
---
 fs/btrfs/check-integrity.c  |  10 +++--
 fs/btrfs/compression.c  |  30 ++---
 fs/btrfs/ctree.h|   5 ++-
 fs/btrfs/disk-io.c  | 100 +++-
 fs/btrfs/disk-io.h  |   2 -
 fs/btrfs/file-item.c|  25 +--
 fs/btrfs/free-space-cache.c |   8 ++--
 fs/btrfs/hash.c |  49 ++
 fs/btrfs/hash.h |   9 +++-
 fs/btrfs/inode.c|  21 ++
 fs/btrfs/ioctl.c|   1 +
 fs/btrfs/ordered-data.c |  10 +++--
 fs/btrfs/ordered-data.h |   9 ++--
 fs/btrfs/scrub.c|  55 +---
 14 files changed, 216 insertions(+), 118 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index d897ef8..4c26cfa 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -1790,8 +1790,8 @@ static int btrfsic_test_for_metadata(struct btrfsic_state *state,
 {
        struct btrfs_header *h;
        u8 csum[BTRFS_CSUM_SIZE];
-       u32 crc = ~(u32)0;
        unsigned int i;
+       SHASH_DESC_ON_STACK(shash, state->root->fs_info->csum_tfm);
 
        if (num_pages * PAGE_CACHE_SIZE < state->metablock_size)
                return 1; /* not metadata */
@@ -1801,14 +1801,18 @@ static int btrfsic_test_for_metadata(struct btrfsic_state *state,
        if (memcmp(h->fsid, state->root->fs_info->fsid, BTRFS_UUID_SIZE))
                return 1;
 
+       shash->tfm = state->root->fs_info->csum_tfm;
+       shash->flags = 0;
+       crypto_shash_init(shash);
+
        for (i = 0; i < num_pages; i++) {
                u8 *data = i ? datav[i] : (datav[i] + BTRFS_CSUM_SIZE);
                size_t sublen = i ? PAGE_CACHE_SIZE :
                                    (PAGE_CACHE_SIZE - BTRFS_CSUM_SIZE);
 
-               crc = btrfs_crc32c(crc, data, sublen);
+               crypto_shash_update(shash, data, sublen);
        }
-       btrfs_csum_final(crc, csum);
+       crypto_shash_final(shash, csum);
        if (memcmp(csum, h->csum, state->csum_size))
                return 1;
 
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index e9df886..42c2435 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -78,7 +78,7 @@ struct compressed_bio {
         * the start of a variable length array of checksums only
         * used by reads
         */
-       u32 sums;
+       u8 sums[];
 };
 
 static int btrfs_decompress_biovec(int type, struct page **pages_in,
@@ -111,31 +111,29 @@ static int check_compressed_csum(struct inode *inode,
        struct page *page;
        unsigned long i;
        char *kaddr;
-       u32 csum;
-       u32 *cb_sum = &cb->sums;
+       u8 csum[BTRFS_CSUM_SIZE];
+       u8 *cb_sum = cb->sums;
+       struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info;
+       u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
 
        if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
                return 0;
 
        for (i = 0; i < cb->nr_pages; i++) {
                page = cb->compressed_pages[i];
-               csum = ~(u32)0;
 
                kaddr = kmap_atomic(page);
-               csum = btrfs_csum_data(kaddr, csum, PAGE_CACHE_SIZE);
-               btrfs_csum_final(csum, (char *)&csum);
+               btrfs_csum(fs_info, kaddr, PAGE_CACHE_SIZE, csum);
                kunmap_atomic(kaddr);
 
-               if (csum != *cb_sum) {
+               if (memcmp(csum, cb_sum, csum_size)) {
                        btrfs_info(BTRFS_I(inode)->root->fs_info,
-                          "csum failed ino %llu extent %llu csum %u wanted %u mirror %d",
-                          btrfs_ino(inode), disk_start, csum, *cb_sum,
-                          cb->mirror_num);
+                          "csum failed ino %llu extent %llu mirror %d",
+                          btrfs_ino(inode), disk_start, cb->mirror_num);
                        ret = -EIO;
                        goto fail;
                }
-               cb_sum++;
-
+               cb_sum += csum_size;
        }
        ret = 0;
 fail:
@@ -584,7 +582,8 @@ int btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
        struct extent_map *em;
        int ret = -ENOMEM;
        int faili = 0;
-       u32 *sums;
+       u8 *sums;
+       u16 csum_size = btrfs_super_csum_size(root->fs_info->super_copy);
 
        tree = &BTRFS_I(inode)->io_tree;
        em_tree = &BTRFS_I(inode)->extent_tree;
@@ -607,7 +606,7 @@ int btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
        cb->errors = 0;
        cb->inode = inode;
        cb->mirror_num = mirror_num;
-       sums =

Re: [PATCH 3/6] btrfs-progs: fi usage, update manpage

2014-12-19 Thread David Sterba

On Fri, Dec 19, 2014 at 06:56:43PM +0900, Satoru Takeuchi wrote:
 +There are some options to set unit. See the description of *df* subcommand
 +from '-b' option to '-t' option.

The unit options exist only for very few subcommands so I found it more
convenient to list all of them near to the command itself rather than
pointing somewhere else.

Re: btrfs-progs: integration-20141218 possible corruption test regression

2014-12-19 Thread David Sterba

On Fri, Dec 19, 2014 at 02:23:12PM +0800, Qu Wenruo wrote:
 In fact, it's not a regression.
 
 The 013 testcase is a special case that uses a script to corrupt the 
 image and then do the btrfsck test.
 
 There is a patch before the commit, to allow btrfs-progs test script 
 call corruption script.
 But since there is still some discussion about the corruption script and 
 maybe later verify script,
 the previous patch is not picked.

Yes, I'm waiting for final version of the patch, but wanted to add all
test images that were available.

Re: [PATCH 0/5] Cleanup warnings from clang

2014-12-19 Thread David Sterba

On Fri, Dec 19, 2014 at 06:27:36PM +0900, Satoru Takeuchi wrote:
  Qu Wenruo (5):
 btrfs-progs: Makefile: Move linker only option to LDFLAGS
 btrfs-progs: Fix a clang dead-judgement warning in disk-io.c.
 btrfs-progs: Remove a unused function root_gtp_mask().
 btrfs-progs: Remove a unused function offset_to_bitmap()
 btrfs-progs: Remove deprecated _BSD_SOURCE macro.
 
 All these patches look good to me.
 
 Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

All added.

[GIT PULL] Btrfs pull part two

2014-12-19 Thread Chris Mason


Hi Linus,

Please pull my for-linus branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus

It has part two of our merge window patches.  These are all from Filipe,
and fix some really hard to find races that can cause corruptions.  Most
of them involved block group removal (balance) or discard.

Filipe Manana (4) commits (+20/-25):
Btrfs: fix fs corruption on transaction abort if device supports discard (+6/-10)
Btrfs: always clear a block group node when removing it from the tree (+3/-0)
Btrfs: remove non-sense btrfs_error_discard_extent() function (+6/-12)
Btrfs: ensure deletion from pinned_chunks list is protected (+5/-3)

Total: (4) commits (+20/-25)

 fs/btrfs/ctree.h|  4 ++--
 fs/btrfs/disk-io.c  |  6 --
 fs/btrfs/extent-tree.c  | 23 +++
 fs/btrfs/free-space-cache.c | 12 +++-
 4 files changed, 20 insertions(+), 25 deletions(-)

[PATCH 0/6] Btrfs progs, coverity fixes for 3.18

2014-12-19 Thread David Sterba

A few straightforward fixes.

David Sterba (6):
  btrfs-progs: corrupt block, add missing break to option I
  btrfs-progs: corrupt block, add break after option U
  btrfs-progs: fragments, close output file on error
  btrfs-progs: check result of first_cache_extent
  btrfs-progs: check allocation result in add_clone_source
  btrfs-progs: let btrfs_free_path accept NULL

 btrfs-corrupt-block.c |  2 ++
 btrfs-fragments.c |  7 +--
 cmds-check.c  |  2 ++
 cmds-send.c   | 25 +
 ctree.c   |  2 ++
 5 files changed, 32 insertions(+), 6 deletions(-)

-- 
2.1.3


[PATCH 1/6] btrfs-progs: corrupt block, add missing break to option I

2014-12-19 Thread David Sterba

Using -I would imply -d.

Resolves-Coverity-CID: 1258792
Signed-off-by: David Sterba dste...@suse.cz
---
 btrfs-corrupt-block.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
index af9ae4d4047c..aeeb1b298f66 100644
--- a/btrfs-corrupt-block.c
+++ b/btrfs-corrupt-block.c
@@ -1096,6 +1096,7 @@ int main(int ac, char **av)
 				break;
 			case 'I':
 				corrupt_item = 1;
+				break;
 			case 'd':
 				delete = 1;
 				break;
-- 
2.1.3
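
For readers unfamiliar with the bug class, here is a minimal sketch of the
fall-through that the patch above closes. The struct and function names are
hypothetical and only loosely modeled on the flags in the patch; this is not
the btrfs-corrupt-block code itself.

```c
/* Hypothetical option state, loosely modeled on the flags in the patch. */
struct opts {
	int corrupt_item;
	int do_delete;
};

/* Buggy variant: case 'I' has no break, so it also runs case 'd'. */
void parse_buggy(int c, struct opts *o)
{
	switch (c) {
	case 'I':
		o->corrupt_item = 1;
		/* fall through - the missing break is the bug */
	case 'd':
		o->do_delete = 1;
		break;
	}
}

/* Fixed variant: every case ends with its own break. */
void parse_fixed(int c, struct opts *o)
{
	switch (c) {
	case 'I':
		o->corrupt_item = 1;
		break;
	case 'd':
		o->do_delete = 1;
		break;
	}
}
```

With the buggy variant, passing 'I' sets both flags, which is the "-I would
imply -d" behaviour described in the changelog; the fixed variant sets only
corrupt_item.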


[PATCH 2/6] btrfs-progs: corrupt block, add break after option U

2014-12-19 Thread David Sterba

Resolves-Coverity-CID: 1258793
Signed-off-by: David Sterba dste...@suse.cz
---
 btrfs-corrupt-block.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
index aeeb1b298f66..b477e878376b 100644
--- a/btrfs-corrupt-block.c
+++ b/btrfs-corrupt-block.c
@@ -1068,6 +1068,7 @@ int main(int ac, char **av)
 				break;
 			case 'U':
 				chunk_tree = 1;
+				break;
 			case 'i':
 				inode = arg_strtou64(optarg);
 				break;
-- 
2.1.3


[PATCH 4/6] btrfs-progs: check result of first_cache_extent

2014-12-19 Thread David Sterba

If the tree's empty, we'll get NULL and dereference it.

Resolves-Coverity-CID: 1248828
Signed-off-by: David Sterba dste...@suse.cz
---
 cmds-check.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/cmds-check.c b/cmds-check.c
index 6eea36c2f52c..3e7a4ebdce44 100644
--- a/cmds-check.c
+++ b/cmds-check.c
@@ -8075,6 +8075,8 @@ static void free_roots_info_cache(void)
 		struct root_item_info *rii;
 
 		entry = first_cache_extent(roots_info_cache);
+		if (!entry)
+			break;
 		remove_cache_extent(roots_info_cache, entry);
 		rii = container_of(entry, struct root_item_info, cache_extent);
 		free(rii);
-- 
2.1.3


[PATCH 6/6] btrfs-progs: let btrfs_free_path accept NULL

2014-12-19 Thread David Sterba

Same in kernel and matches semantics of free().

Resolves-Coverity-CID: 1054881
Signed-off-by: David Sterba dste...@suse.cz
---
 ctree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/ctree.c b/ctree.c
index bd6cb125b2a2..589efa3db17e 100644
--- a/ctree.c
+++ b/ctree.c
@@ -48,6 +48,8 @@ struct btrfs_path *btrfs_alloc_path(void)
 
 void btrfs_free_path(struct btrfs_path *p)
 {
+	if (!p)
+		return;
 	btrfs_release_path(p);
 	kfree(p);
 }
-- 
2.1.3
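
As an illustration of the free()-like convention the changelog refers to, a
small userspace sketch follows. struct path, path_alloc() and path_free() are
hypothetical stand-ins, not the real struct btrfs_path or its helpers.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a path object; not the real struct btrfs_path. */
struct path {
	int *slots;
};

/* Allocates a path with room for 8 slots, or returns NULL on failure. */
struct path *path_alloc(void)
{
	struct path *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	p->slots = calloc(8, sizeof(*p->slots));
	if (!p->slots) {
		free(p);
		return NULL;
	}
	return p;
}

/* Like free(): passing NULL is a harmless no-op. */
void path_free(struct path *p)
{
	if (!p)
		return;
	free(p->slots);
	free(p);
}
```

The benefit shows up in error paths: a caller can jump to a single cleanup
label and call path_free(p) whether or not the allocation ever succeeded.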


[PATCH 5/6] btrfs-progs: check allocation result in add_clone_source

2014-12-19 Thread David Sterba

Resolves-Coverity-CID: 1054894
Signed-off-by: David Sterba dste...@suse.cz
---
 cmds-send.c | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/cmds-send.c b/cmds-send.c
index b17b5e2ca666..9b32c1f0e624 100644
--- a/cmds-send.c
+++ b/cmds-send.c
@@ -172,11 +172,16 @@ out:
 	return ret;
 }
 
-static void add_clone_source(struct btrfs_send *s, u64 root_id)
+static int add_clone_source(struct btrfs_send *s, u64 root_id)
 {
 	s->clone_sources = realloc(s->clone_sources,
 		sizeof(*s->clone_sources) * (s->clone_sources_count + 1));
+
+	if (!s->clone_sources)
+		return -ENOMEM;
 	s->clone_sources[s->clone_sources_count++] = root_id;
+
+	return 0;
 }
 
 static int write_buf(int fd, const void *buf, int size)
@@ -475,7 +480,11 @@ int cmd_send(int argc, char **argv)
 				goto out;
 			}
 
-			add_clone_source(&send, root_id);
+			ret = add_clone_source(&send, root_id);
+			if (ret < 0) {
+				fprintf(stderr, "ERROR: not enough memory\n");
+				goto out;
+			}
 			subvol_uuid_search_finit(&send.sus);
 			free(subvol);
 			subvol = NULL;
@@ -575,7 +584,11 @@ int cmd_send(int argc, char **argv)
 			goto out;
 		}
 
-		add_clone_source(&send, parent_root_id);
+		ret = add_clone_source(&send, parent_root_id);
+		if (ret < 0) {
+			fprintf(stderr, "ERROR: not enough memory\n");
+			goto out;
+		}
 	}
 
 	for (i = optind; i < argc; i++) {
@@ -671,7 +684,11 @@ int cmd_send(int argc, char **argv)
 			goto out;
 
 		/* done with this subvol, so add it to the clone sources */
-		add_clone_source(&send, root_id);
+		ret = add_clone_source(&send, root_id);
+		if (ret < 0) {
+			fprintf(stderr, "ERROR: not enough memory\n");
+			goto out;
+		}
 
 		parent_root_id = 0;
 		full_send = 0;
-- 
2.1.3
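
A side note on the realloc() pattern, offered as a sketch rather than as a
change request: when realloc() fails it returns NULL and leaves the old block
allocated, so assigning the result straight back to the only pointer loses the
old buffer. A variant with a temporary pointer could look like the following;
struct clone_list and clone_list_add() are hypothetical names, not the real
struct btrfs_send or add_clone_source().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container; not the real struct btrfs_send. */
struct clone_list {
	uint64_t *sources;
	size_t count;
};

/* Appends root_id; returns 0 on success or -ENOMEM on allocation failure. */
int clone_list_add(struct clone_list *l, uint64_t root_id)
{
	uint64_t *tmp;

	tmp = realloc(l->sources, sizeof(*l->sources) * (l->count + 1));
	if (!tmp)
		return -ENOMEM;	/* l->sources is untouched and still valid */
	l->sources = tmp;
	l->sources[l->count++] = root_id;
	return 0;
}
```

On failure the caller still owns the original array and can free it on its
normal cleanup path.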


[PATCH 3/6] btrfs-progs: fragments, close output file on error

2014-12-19 Thread David Sterba

Resolves-Coverity-CID: 1258794
Signed-off-by: David Sterba dste...@suse.cz
---
 btrfs-fragments.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/btrfs-fragments.c b/btrfs-fragments.c
index d03c2c3e7319..360f10f87bfa 100644
--- a/btrfs-fragments.c
+++ b/btrfs-fragments.c
@@ -233,7 +233,7 @@ list_fragments(int fd, u64 flags, char *dir)
 		ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
 		if (ret < 0) {
 			fprintf(stderr, "ERROR: can't perform the search\n");
-			return ret;
+			goto out_close;
 		}
 		/* the ioctl returns the number of item it found in nr_items */
 		if (sk->nr_items == 0)
@@ -373,7 +373,10 @@ skip:;
 		fprintf(html, "</p>");
 	}
 	fprintf(html, "</body></html>\n");
-	
+
+out_close:
+	fclose(html);
+
 	return ret;
 }
 
 
-- 
2.1.3
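
The single-exit cleanup idiom used by this patch can be sketched in isolation
as follows. write_report() and its fail parameter are hypothetical and exist
only to simulate an error after the file has been opened.

```c
#include <stdio.h>

/*
 * Hypothetical helper: writes a tiny report to path.
 * A non-zero fail simulates an error that happens after fopen().
 * Returns 0 on success, -1 on error.
 */
int write_report(const char *path, int fail)
{
	FILE *out;
	int ret = 0;

	out = fopen(path, "w");
	if (!out)
		return -1;	/* nothing to clean up yet */

	if (fail) {
		ret = -1;
		goto out_close;	/* a plain return here would leak 'out' */
	}
	fprintf(out, "<html><body></body></html>\n");

out_close:
	fclose(out);
	return ret;
}
```

Both the success path and the error path pass through out_close, so the stream
is closed exactly once in either case.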


Re: btrfs receive being very slow

2014-12-19 Thread Nick Dimov

Hello.

So I split the job into two tasks, as you suggested. I create the
differential snapshot with btrfs send and save it on the SSD. So far this
is very efficient, and the send runs at almost full SSD speed.

When I try to receive the snapshot on the HDD, the speed is just as
low as before (as with the ionice'd pipe), even though no ionice is used.

The hdd raw speed is, according to hdparm:
 Timing cached reads:   15848 MB in  2.00 seconds = 7928.82 MB/sec
 Timing buffered disk reads: 310 MB in  3.01 seconds = 103.02 MB/sec

And I have more than 100 GB of free space on it, but the speed is still low.

So, as you mentioned, I might be dealing with a very fragmented
system. Here are my conclusions and questions:

 1. btrfs send is not the problem - it works great with or without
    ionice.
 2. The receive is slow no matter what I do, even when run alone. (As for
    what kind of data is being sent: I sent snapshots of / and
    /home, and both are slow for btrfs receive.)
 3. How can I check how fragmented the filesystem is? (I.e. I want to know
    whether this is the real cause.)
 4. How can I defragment all those read-only snapshots without breaking
    compatibility with differential btrfs send? (If I understand
    correctly, the parent snapshot must be the same on source and
    destination; is this correct?)
 5. Will making those snapshots writable, defragmenting them and
    re-snapshotting them as read-only break compatibility with
    differential btrfs send? E.g. will I still be able to btrfs receive a
    differential snapshot after defragmentation?

As for your suggestion to do it during a break - I would, but
it sometimes takes hours to sync; that's why I tried to ionice it, so I
can work while it runs.

Thank you a lot for your explanations and effort!

On 15.12.2014 10:49, Robert White wrote:
 On 12/14/2014 11:41 PM, Nick Dimov wrote:
 Hi, thanks for the answer, I will answer between the lines.

 On 15.12.2014 08:45, Robert White wrote:
 On 12/14/2014 08:50 PM, Nick Dimov wrote:
 Hello everyone!

 First, thanks for amazing work on btrfs filesystem!

 Now the problem:
 I use a ssd as my system drive (/dev/sda2) and use daily snapshots on
 it. Then, from time to time, i sync those on HDD (/dev/sdb4) by using
 btrfs send / receive like this:

 ionice -c3 btrfs send -p /ssd/previously_synced_snapshot /ssd/snapshot-X \
   | pv | btrfs receive /hdd/snapshots

 I use pv to measure speed and I get ridiculous speeds like 5-200 KiB/s!
 (Rarely it goes over 1 MiB/s.) However, if I replace the btrfs receive
 with cat > /dev/null, the speed is 400-500 MiB/s (almost full SSD speed),
 so I understand that the problem is the fs on the HDD... Do you have any
 idea how to trace this problem down?


 You have _lots_ of problems with that above...

 (1) your ionice is causing the SSD to stall the send every time the
 receiver does _anything_.
 I will try to remove ionice completely - but then my system becomes
 unresponsive :(

 Yep, see below.

 Then again if it only goes bad for a minute or two, then just launch
 the backup right as you go for a break.


 (1a) The ionice doesn't apply to the pipeline, it only applies to the
 command it precedes. So it's ionice -c3 btrfs send ..., then the pipe,
 then btrfs receive at the default IO scheduling class. You need to
 specify it twice, or wrap it all in a script.

 ionice -c 3 btrfs send -p /ssd/parent /ssd/snapshot-X |
 ionice -c 3 btrfs receive /hdd/snapshots
 This is usually what I do, but I wanted to show that there is no throttle
 on the receiver. (I tested it with and without - the result is the same.)

 (1b) Your comparison case is flawed because cat > /dev/null results in
 no actual IO (e.g. writing to /dev/null doesn't transfer any data
 anywhere, it just gets rubber-stamped okay at the kernel method level).
 This was only intended to show that the sender itself is OK.

 I understood why you did it, I was just trying to point out that since
 there was no other IO competing with the btrfs send, it would give you
 a really outrageously false positive. Particularly if you always
 used ionice.


 (2) You probably get negative-to-no value from using ionice on the
 sending side, particularly since SSDs don't have physical heads to
 seek around.
 Yeah, in theory it should be like this, but in practice on my system,
 when I use no ionice, it becomes very unresponsive (Ubuntu 14.10).

 What all is in the snapshot? Is it your whole system or just /home or
 what? e.g. what are your subvolume boundaries if any?

 btrfs send is very efficient, but that efficiency means that it can
 rifle through a heck of a lot of the parent snapshot and decide it
 doesn't need sending, and it can do so very fast, and that can be a
 huge hit on other activities. If most of your system doesn't change
 between snapshots the send will plow through your disk yelling "nope"
 and "skip this" like a shopper in a Black Friday riot.


 (2a) The value of nicing your IO is trivial on the

Re: [PATCH 2/6] btrfs-progs: corrupt block, add break after option U

2014-12-19 Thread Eric Sandeen

On 12/19/14 10:06 AM, David Sterba wrote:
 Resolves-Coverity-CID: 1258793
 Signed-off-by: David Sterba dste...@suse.cz

Reviewed-by: Eric Sandeen sand...@redhat.com

 ---
  btrfs-corrupt-block.c | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
 index aeeb1b298f66..b477e878376b 100644
 --- a/btrfs-corrupt-block.c
 +++ b/btrfs-corrupt-block.c
 @@ -1068,6 +1068,7 @@ int main(int ac, char **av)
   break;
   case 'U':
   chunk_tree = 1;
 + break;
   case 'i':
   inode = arg_strtou64(optarg);
   break;
 


Re: [PATCH 1/6] btrfs-progs: corrupt block, add missing break to option I

2014-12-19 Thread Eric Sandeen

On 12/19/14 10:06 AM, David Sterba wrote:
 Using -I would imply -d.
 
 Resolves-Coverity-CID: 1258792
 Signed-off-by: David Sterba dste...@suse.cz

Reviewed-by: Eric Sandeen sand...@redhat.com

 ---
  btrfs-corrupt-block.c | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/btrfs-corrupt-block.c b/btrfs-corrupt-block.c
 index af9ae4d4047c..aeeb1b298f66 100644
 --- a/btrfs-corrupt-block.c
 +++ b/btrfs-corrupt-block.c
 @@ -1096,6 +1096,7 @@ int main(int ac, char **av)
   break;
   case 'I':
   corrupt_item = 1;
 + break;
   case 'd':
   delete = 1;
   break;
 


Re: [PATCH 3/6] btrfs-progs: fragments, close output file on error

2014-12-19 Thread Eric Sandeen

On 12/19/14 10:06 AM, David Sterba wrote:
 Resolves-Coverity-CID: 1258794
 Signed-off-by: David Sterba dste...@suse.cz

Reviewed-by: Eric Sandeen sand...@redhat.com

 ---
  btrfs-fragments.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)
 
 diff --git a/btrfs-fragments.c b/btrfs-fragments.c
 index d03c2c3e7319..360f10f87bfa 100644
 --- a/btrfs-fragments.c
 +++ b/btrfs-fragments.c
 @@ -233,7 +233,7 @@ list_fragments(int fd, u64 flags, char *dir)
  		ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
  		if (ret < 0) {
  			fprintf(stderr, "ERROR: can't perform the search\n");
 -			return ret;
 +			goto out_close;
  		}
  		/* the ioctl returns the number of item it found in nr_items */
  		if (sk->nr_items == 0)
 @@ -373,7 +373,10 @@ skip:;
  		fprintf(html, "</p>");
  	}
  	fprintf(html, "</body></html>\n");
 -	
 +
 +out_close:
 +	fclose(html);
 +
  	return ret;
  }
  
 


Re: Oddly slow read performance with near-full largish FS

2014-12-19 Thread Charles Cazabon

Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com wrote:
 
 Let me ask some questions.

Sure - thanks for taking an interest.

 On 2014/12/17 11:42, Charles Cazabon wrote:
  There's roughly 16TB of data in this filesystem (the filesystem is ~17TB).
  The btrfs filesystem is a simple single volume, no snapshots, multiple
  devices, or anything like that.  It's an LVM logical volume on top of
  dmcrypt on top of an mdadm RAID set (8 disks in RAID 6).
 
 Q1. Do you mean your Btrfs file system sits on top of
 the following deep stack of layers?
 
 +---+
 |Btrfs(single)  |
 +---+
 |LVM(non RAID?) |
 +---+
 |dmcrypt|
 +---+
 |mdadm RAID set |
 +---+

Yes, precisely.  mdadm is used to make a large RAID6 device, which is
encrypted with LUKS, on top of which is layered LVM (for ease of management),
and the btrfs filesystem sits on that.

 Q2. If Q1 is true, is it possible to reduce those layers as follows?
 
 +---+
 |Btrfs(*1)  |
 +---+
 |dmcrypt|
 +---+

I don't see how I could do that - I simply have far too much data for a single
disk (not to mention I don't want to risk loss of data from a single disk
failing).  This filesystem has 16.x TB of data in it at present.

 I ask because there are too many layers, several of them provide
 the same or similar features, and a heavily layered storage stack
 tends to cause more trouble than a thinner one,
 regardless of file system type.

This configuration is one I've been using for many years.  It's only recently
that I've noticed it being particularly slow with btrfs -- I don't know if
that's because the filesystem has filled up past some critical point, or due
to something else entirely.  That's why I'm trying to figure this out.

 *1) Currently I don't recommend you to use RAID56 of Btrfs.
 So, if RAID6 is mandatory, mdadm RAID6 is also necessary.

Yes, exactly.  That's why I use mdadm.

  The speeds I'm seeing (with iotop) fluctuate a lot.  They spend most of
  the time in the range of 1-3 MB/s, with large periods of time where no IO
  seems to happen at all, and occasional short spikes to ~25-30 MB/s.
  System load seems to sit around 10-12 (with only 2 processes reported as
  running, everything else sleeping) while this happens.
[...]
  Other filesystems on the same physical disks have no trouble exceeding
  100MB/s reads.  The machine is not swapping (16GB RAM, ~8GB swap with 0
  swap used).
 
 Q3. Do they also consist of the following layers?

Yes, exactly the same configuration.  The fact that I don't see any speed
problems with other filesystems (even in the same LVM volume group) leads me
in the direction of suspecting something to do with btrfs.

 Q4. Are other filesystems also near-full?

No, not particularly.  Now, the btrfs volume in question isn't exactly close
to full - there's more than 500 GB free.  It's just *relatively* full.

 Q5. Is there any error/warning message about
 Btrfs/LVM/dmcrypt/mdadm/hardwares?

No, no errors or warnings in logs related to the disks, LVM, or btrfs.  I have
historically, with previous kernels, gotten the "task blocked for more than
120 seconds" warnings fairly often, but I haven't seen those lately.

Is there any other info I can collect on this that would help?

Thanks,

Charles
-- 
---
Charles Cazabon
GPL'ed software available at:   http://pyropus.ca/software/
---

Re: [PATCH 4/6] btrfs-progs: check result of first_cache_extent

2014-12-19 Thread Eric Sandeen

On 12/19/14 10:06 AM, David Sterba wrote:
 If the tree's empty, we'll get NULL and dereference it.

Hm, but this is under an explicit check for not empty:

while (!cache_tree_empty(roots_info_cache)) {

sooo?  Maybe it's just defensive?  Nothing really wrong
with being defensive, I suppose, so:

Reviewed-by: Eric Sandeen sand...@redhat.com

 Resolves-Coverity-CID: 1248828
 Signed-off-by: David Sterba dste...@suse.cz
 ---
  cmds-check.c | 2 ++
  1 file changed, 2 insertions(+)
 
 diff --git a/cmds-check.c b/cmds-check.c
 index 6eea36c2f52c..3e7a4ebdce44 100644
 --- a/cmds-check.c
 +++ b/cmds-check.c
 @@ -8075,6 +8075,8 @@ static void free_roots_info_cache(void)
   struct root_item_info *rii;
  
   entry = first_cache_extent(roots_info_cache);
 + if (!entry)
 + break;
   remove_cache_extent(roots_info_cache, entry);
   rii = container_of(entry, struct root_item_info, cache_extent);
   free(rii);
 


Re: [PATCH 5/6] btrfs-progs: check allocation result in add_clone_source

2014-12-19 Thread Eric Sandeen

On 12/19/14 10:06 AM, David Sterba wrote:
 Resolves-Coverity-CID: 1054894
 Signed-off-by: David Sterba dste...@suse.cz

Reviewed-by: Eric Sandeen sand...@redhat.com

 ---
  cmds-send.c | 25 +
  1 file changed, 21 insertions(+), 4 deletions(-)
 
 diff --git a/cmds-send.c b/cmds-send.c
 index b17b5e2ca666..9b32c1f0e624 100644
 --- a/cmds-send.c
 +++ b/cmds-send.c
 @@ -172,11 +172,16 @@ out:
  	return ret;
  }
  
 -static void add_clone_source(struct btrfs_send *s, u64 root_id)
 +static int add_clone_source(struct btrfs_send *s, u64 root_id)
  {
  	s->clone_sources = realloc(s->clone_sources,
  		sizeof(*s->clone_sources) * (s->clone_sources_count + 1));
 +
 +	if (!s->clone_sources)
 +		return -ENOMEM;
  	s->clone_sources[s->clone_sources_count++] = root_id;
 +
 +	return 0;
  }
  
  static int write_buf(int fd, const void *buf, int size)
 @@ -475,7 +480,11 @@ int cmd_send(int argc, char **argv)
  				goto out;
  			}
  
 -			add_clone_source(&send, root_id);
 +			ret = add_clone_source(&send, root_id);
 +			if (ret < 0) {
 +				fprintf(stderr, "ERROR: not enough memory\n");
 +				goto out;
 +			}
  			subvol_uuid_search_finit(&send.sus);
  			free(subvol);
  			subvol = NULL;
 @@ -575,7 +584,11 @@ int cmd_send(int argc, char **argv)
  			goto out;
  		}
  
 -		add_clone_source(&send, parent_root_id);
 +		ret = add_clone_source(&send, parent_root_id);
 +		if (ret < 0) {
 +			fprintf(stderr, "ERROR: not enough memory\n");
 +			goto out;
 +		}
  	}
  
  	for (i = optind; i < argc; i++) {
 @@ -671,7 +684,11 @@ int cmd_send(int argc, char **argv)
  			goto out;
  
  		/* done with this subvol, so add it to the clone sources */
 -		add_clone_source(&send, root_id);
 +		ret = add_clone_source(&send, root_id);
 +		if (ret < 0) {
 +			fprintf(stderr, "ERROR: not enough memory\n");
 +			goto out;
 +		}
  
  		parent_root_id = 0;
  		full_send = 0;
 


Re: [PATCH 6/6] btrfs-progs: let btrfs_free_path accept NULL

2014-12-19 Thread Eric Sandeen

On 12/19/14 10:06 AM, David Sterba wrote:
 Same in kernel and matches semantics of free().
 
 Resolves-Coverity-CID: 1054881
 Signed-off-by: David Sterba dste...@suse.cz

Reviewed-by: Eric Sandeen sand...@redhat.com

 ---
  ctree.c | 2 ++
  1 file changed, 2 insertions(+)
 
 diff --git a/ctree.c b/ctree.c
 index bd6cb125b2a2..589efa3db17e 100644
 --- a/ctree.c
 +++ b/ctree.c
 @@ -48,6 +48,8 @@ struct btrfs_path *btrfs_alloc_path(void)
  
  void btrfs_free_path(struct btrfs_path *p)
  {
 + if (!p)
 + return;
   btrfs_release_path(p);
   kfree(p);
  }
 


Re: btrfs-progs: integration-20141218 possible corruption test regression

2014-12-19 Thread WorMzy Tykashi

On 19 December 2014 at 13:48, David Sterba dste...@suse.cz wrote:
 On Fri, Dec 19, 2014 at 02:23:12PM +0800, Qu Wenruo wrote:
 In fact, it's not a regression.

 The 013 testcase is a special case that uses a script to corrupt the
 image and then do the btrfsck test.

 There is a patch before the commit that allows the btrfs-progs test script
 to call the corruption script.
 But since there is still some discussion about the corruption script, and
 maybe a later verify script,
 the previous patch has not been picked.

 Yes, I'm waiting for final version of the patch, but wanted to add all
 test images that were available.

Thanks for the clarification, Qu, David. I've disabled test 013 in my
local builds for the meantime.

Re: [PATCH 4/6] btrfs-progs: check result of first_cache_extent

2014-12-19 Thread David Sterba

On Fri, Dec 19, 2014 at 10:56:41AM -0600, Eric Sandeen wrote:
 On 12/19/14 10:06 AM, David Sterba wrote:
  If the tree's empty, we'll get NULL and dereference it.
 
 Hm, but this is under an explicit check for not empty:
 
 while (!cache_tree_empty(roots_info_cache)) {
 
 sooo?  Maybe it's just defensive?  Nothing really wrong
 with being defensive, I suppose, so:

Well, mostly to shut up the warning with a minimal change. It could be
rewritten to

while ((entry = ...)) { ... }

as in other places.
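
To make the suggested loop shape concrete, here is a small userspace sketch.
The singly linked cache and all of its helpers are hypothetical stand-ins for
the cache tree API and are not taken from btrfs-progs.

```c
#include <stdlib.h>

/* Hypothetical singly linked cache standing in for the cache tree. */
struct entry {
	struct entry *next;
};

struct cache {
	struct entry *head;
};

/* Returns the first entry, or NULL when the cache is empty. */
struct entry *cache_first(struct cache *c)
{
	return c->head;
}

/* Adds a new entry at the front; returns 0 on success, -1 on failure. */
int cache_push(struct cache *c)
{
	struct entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->next = c->head;
	c->head = e;
	return 0;
}

/* Unlinks e, which must be the current first entry. */
void cache_remove_first(struct cache *c, struct entry *e)
{
	c->head = e->next;
}

/* Frees every entry and returns how many were freed. */
int cache_free_all(struct cache *c)
{
	struct entry *e;
	int freed = 0;

	/* Fetch and NULL-check in one step; no separate emptiness test. */
	while ((e = cache_first(c))) {
		cache_remove_first(c, e);
		free(e);
		freed++;
	}
	return freed;
}
```

Because the loop condition is the lookup itself, a static checker can see that
the entry is never NULL inside the body, without relying on a separate
emptiness predicate.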

Re: Oddly slow read performance with near-full largish FS

2014-12-19 Thread Duncan

Charles Cazabon posted on Fri, 19 Dec 2014 10:58:49 -0600 as excerpted:

 This configuration is one I've been using for many years.  It's only
 recently that I've noticed it being particularly slow with btrfs -- I
 don't know if that's because the filesystem has filled up past some
 critical point, or due to something else entirely.  That's why I'm
 trying to figure this out.

Not recommending at this point, just saying these are options...

Btrfs raid56 mode should, I believe, be pretty close to done with the 
latest patches.  That would be 3.19, however, which isn't out yet of 
course.

There's also raid10, if you have enough devices or little enough data to 
do it.  That's much more mature than raid56 mode and should be about as 
mature and stable as btrfs in single-device mode, which is what you are 
using now.  But it'll require more devices than a raid56 would...

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman


[PULL] [PATCH 0/4] Updates in message levels

2014-12-19 Thread David Sterba

This is motivated by the ERR level used for the skinny metadata message, which
has been reported several times. That patch is tagged for stable. The rest is
taken from a SLES patch that I forgot to forward upstream.

You can pull the branch from

  git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git fix/message-levels

based on current master (d790be3863b28fd22e0)

David Sterba (4):
  btrfs: update message levels for errors
  btrfs: update message levels during failed mount
  btrfs: update message levels after checksum errors
  btrfs: set proper message level for skinny metadata

 fs/btrfs/disk-io.c | 29 +++--
 fs/btrfs/inode.c   |  4 ++--
 2 files changed, 17 insertions(+), 16 deletions(-)

-- 
2.1.3


[PATCH 4/4] btrfs: set proper message level for skinny metadata

2014-12-19 Thread David Sterba

This has been confusing people for too long, the message is really just
informative.

CC: sta...@vger.kernel.org # 3.10+
Signed-off-by: David Sterba dste...@suse.cz
---
 fs/btrfs/disk-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6e986a34f9a1..d5e95ec60e12 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2499,7 +2499,7 @@ int open_ctree(struct super_block *sb,
 		features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
 
 	if (features & BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA)
-		printk(KERN_ERR "BTRFS: has skinny extents\n");
+		printk(KERN_INFO "BTRFS: has skinny extents\n");
 
 	/*
 	 * flag our filesystem as having big metadata blocks if
-- 
2.1.3


[PATCH 3/4] btrfs: update message levels after checksum errors

2014-12-19 Thread David Sterba

The errors are worth noting and might get missed with INFO level.

Signed-off-by: David Sterba dste...@suse.cz
---
 fs/btrfs/disk-io.c | 2 +-
 fs/btrfs/inode.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3b694192a4a9..6e986a34f9a1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -318,7 +318,7 @@ static int csum_tree_block(struct btrfs_root *root, struct extent_buffer *buf,
 			memcpy(&found, result, csum_size);
 
 			read_extent_buffer(buf, &val, 0, csum_size);
-			printk_ratelimited(KERN_INFO
+			printk_ratelimited(KERN_WARNING
 				"BTRFS: %s checksum verify failed on %llu "
 				"wanted %X found %X "
 				"level %d\n",
 				root->fs_info->sb->s_id, buf->start,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ff6d98d8dc20..a91d9ff3293b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2945,7 +2945,7 @@ static int __readpage_endio_check(struct inode *inode,
 	return 0;
 zeroit:
 	if (__ratelimit(&_rs))
-		btrfs_info(BTRFS_I(inode)->root->fs_info,
+		btrfs_warn(BTRFS_I(inode)->root->fs_info,
 			   "csum failed ino %llu off %llu csum %u expected csum %u",
 			   btrfs_ino(inode), start, csum, csum_expected);
 	memset(kaddr + pgoff, 1, len);
-- 
2.1.3


[PATCH 1/4] btrfs: update message levels for errors

2014-12-19 Thread David Sterba

Several messages that point to some internal problem, level INFO is
wrong here.

Signed-off-by: David Sterba dste...@suse.cz
---
 fs/btrfs/disk-io.c | 9 +
 fs/btrfs/inode.c   | 2 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 30965120772b..8beb74ffb075 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -367,7 +367,8 @@ static int verify_parent_transid(struct extent_io_tree *io_tree,
 		ret = 0;
 		goto out;
 	}
-	printk_ratelimited(KERN_INFO "BTRFS (device %s): parent transid verify failed on %llu wanted %llu found %llu\n",
+	printk_ratelimited(KERN_ERR
+	    "BTRFS (device %s): parent transid verify failed on %llu wanted %llu found %llu\n",
 			eb->fs_info->sb->s_id, eb->start,
 			parent_transid, btrfs_header_generation(eb));
 	ret = 1;
@@ -633,21 +634,21 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio,
 
 	found_start = btrfs_header_bytenr(eb);
 	if (found_start != eb->start) {
-		printk_ratelimited(KERN_INFO "BTRFS (device %s): bad tree block start "
+		printk_ratelimited(KERN_ERR "BTRFS (device %s): bad tree block start "
 			       "%llu %llu\n",
 			       eb->fs_info->sb->s_id, found_start, eb->start);
 		ret = -EIO;
 		goto err;
 	}
 	if (check_tree_block_fsid(root, eb)) {
-		printk_ratelimited(KERN_INFO "BTRFS (device %s): bad fsid on block %llu\n",
+		printk_ratelimited(KERN_ERR "BTRFS (device %s): bad fsid on block %llu\n",
 			       eb->fs_info->sb->s_id, eb->start);
 		ret = -EIO;
 		goto err;
 	}
 	found_level = btrfs_header_level(eb);
 	if (found_level >= BTRFS_MAX_LEVEL) {
-		btrfs_info(root->fs_info, "bad tree block level %d",
+		btrfs_err(root->fs_info, "bad tree block level %d",
 			   (int)btrfs_header_level(eb));
 		ret = -EIO;
 		goto err;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e687bb0dc73a..ff6d98d8dc20 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3407,7 +3407,7 @@ int btrfs_orphan_cleanup(struct btrfs_root *root)
 
 out:
 	if (ret)
-		btrfs_crit(root->fs_info,
+		btrfs_err(root->fs_info,
 			"could not do orphan cleanup %d", ret);
 	btrfs_free_path(path);
 	return ret;
-- 
2.1.3


[PATCH 2/4] btrfs: update message levels during failed mount

2014-12-19 Thread David Sterba

All error conditions from open_ctree shall be ERR. Warning would
suggest that something's wrong and we can continue.

Signed-off-by: David Sterba dste...@suse.cz
---
 fs/btrfs/disk-io.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8beb74ffb075..3b694192a4a9 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2523,7 +2523,7 @@ int open_ctree(struct super_block *sb,
 	 */
 	if ((features & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) &&
 	    (sectorsize != nodesize)) {
-		printk(KERN_WARNING "BTRFS: unequal leaf/node/sector sizes "
+		printk(KERN_ERR "BTRFS: unequal leaf/node/sector sizes "
 				"are not allowed for mixed block groups on %s\n",
 				sb->s_id);
 		goto fail_alloc;
@@ -2631,12 +2631,12 @@ int open_ctree(struct super_block *sb,
 	sb->s_blocksize_bits = blksize_bits(sectorsize);
 
 	if (btrfs_super_magic(disk_super) != BTRFS_MAGIC) {
-		printk(KERN_INFO "BTRFS: valid FS not found on %s\n", sb->s_id);
+		printk(KERN_ERR "BTRFS: valid FS not found on %s\n", sb->s_id);
 		goto fail_sb_buffer;
 	}
 
 	if (sectorsize != PAGE_SIZE) {
-		printk(KERN_WARNING "BTRFS: Incompatible sector size(%lu) "
+		printk(KERN_ERR "BTRFS: incompatible sector size (%lu) "
 		       "found on %s\n", (unsigned long)sectorsize, sb->s_id);
 		goto fail_sb_buffer;
 	}
@@ -2645,7 +2645,7 @@ int open_ctree(struct super_block *sb,
 	ret = btrfs_read_sys_array(tree_root);
 	mutex_unlock(&fs_info->chunk_mutex);
 	if (ret) {
-		printk(KERN_WARNING "BTRFS: failed to read the system "
+		printk(KERN_ERR "BTRFS: failed to read the system "
 		       "array on %s\n", sb->s_id);
 		goto fail_sb_buffer;
 	}
@@ -2660,7 +2660,7 @@ int open_ctree(struct super_block *sb,
 					   generation);
 	if (!chunk_root->node ||
 	    !test_bit(EXTENT_BUFFER_UPTODATE, &chunk_root->node->bflags)) {
-		printk(KERN_WARNING "BTRFS: failed to read chunk root on %s\n",
+		printk(KERN_ERR "BTRFS: failed to read chunk root on %s\n",
 		       sb->s_id);
 		goto fail_tree_roots;
 	}
@@ -2672,7 +2672,7 @@ int open_ctree(struct super_block *sb,
 
 	ret = btrfs_read_chunk_tree(chunk_root);
 	if (ret) {
-		printk(KERN_WARNING "BTRFS: failed to read chunk tree on %s\n",
+		printk(KERN_ERR "BTRFS: failed to read chunk tree on %s\n",
 		       sb->s_id);
 		goto fail_tree_roots;
 	}
@@ -2684,7 +2684,7 @@ int open_ctree(struct super_block *sb,
 	btrfs_close_extra_devices(fs_info, fs_devices, 0);
 
 	if (!fs_devices->latest_bdev) {
-		printk(KERN_CRIT "BTRFS: failed to read devices on %s\n",
+		printk(KERN_ERR "BTRFS: failed to read devices on %s\n",
 		       sb->s_id);
 		goto fail_tree_roots;
 	}
@@ -2768,7 +2768,7 @@ retry_root_backup:
 
ret = btrfs_recover_balance(fs_info);
if (ret) {
-   printk(KERN_WARNING BTRFS: failed to recover balance\n);
+   printk(KERN_ERR BTRFS: failed to recover balance\n);
goto fail_block_groups;
}
 
-- 
2.1.3


Btrfs progs pre-release 3.18-rc2

2014-12-19 Thread David Sterba

Hi,

another step towards 3.18: the changes are limited in scope and are mostly
cleanups or docs. I'd like to see more test images for the new fsck code;
there are 2 new ones that I missed earlier, so they don't count.

The timing at the end of the year is not good, so if I'm not confident
that the release is in good shape I'd rather postpone it.
Alternatively, I can do a release on Monday and then use the following
weeks to prepare a .1 bugfix release.

Known problems: the image 013 fails

[PATCH] Btrfs: track dirty block groups on their own list V2

2014-12-19 Thread Josef Bacik

Currently any time we try to update the block groups on disk we will walk _all_
block groups and check for the -dirty flag to see if it is set.  This function
can get called several times during a commit.  So if you have several terabytes
of data you will be a very sad panda as we will loop through _all_ of the block
groups several times, which makes the commit take a while which slows down the
rest of the file system operations.

This patch introduces a dirty list for the block groups that we get added to
when we dirty the block group for the first time.  Then we simply update any
block groups that have been dirtied since the last time we called
btrfs_write_dirty_block_groups.  This allows us to clean up how we write the
free space cache out so it is much cleaner.  Thanks,

Signed-off-by: Josef Bacik jba...@fb.com
---
V1->V2: Don't unconditionally take the dirty bg list lock in update_block_group,
only do it if our dirty_bg list is empty.

 fs/btrfs/ctree.h|   5 +-
 fs/btrfs/extent-tree.c  | 169 ++--
 fs/btrfs/free-space-cache.c |   8 ++-
 fs/btrfs/transaction.c  |  14 ++--
 fs/btrfs/transaction.h  |   2 +
 5 files changed, 74 insertions(+), 124 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b62315b..e5bc509 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1237,7 +1237,6 @@ enum btrfs_disk_cache_state {
 	BTRFS_DC_ERROR		= 1,
 	BTRFS_DC_CLEAR		= 2,
 	BTRFS_DC_SETUP		= 3,
-	BTRFS_DC_NEED_WRITE	= 4,
 };
 
 struct btrfs_caching_control {
@@ -1275,7 +1274,6 @@ struct btrfs_block_group_cache {
 	unsigned long full_stripe_len;
 
 	unsigned int ro:1;
-	unsigned int dirty:1;
 	unsigned int iref:1;
 
 	int disk_cache_state;
@@ -1309,6 +1307,9 @@ struct btrfs_block_group_cache {
 
 	/* For read-only block groups */
 	struct list_head ro_list;
+
+	/* For dirty block groups */
+	struct list_head dirty_list;
 };
 
 /* delayed seq elem */
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 74eb29d..71a9752 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -74,8 +74,9 @@ enum {
 	RESERVE_ALLOC_NO_ACCOUNT = 2,
 };
 
-static int update_block_group(struct btrfs_root *root,
-			      u64 bytenr, u64 num_bytes, int alloc);
+static int update_block_group(struct btrfs_trans_handle *trans,
+			      struct btrfs_root *root, u64 bytenr,
+			      u64 num_bytes, int alloc);
 static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 				struct btrfs_root *root,
 				u64 bytenr, u64 num_bytes, u64 parent,
@@ -3315,120 +3316,42 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans,
 				   struct btrfs_root *root)
 {
 	struct btrfs_block_group_cache *cache;
-	int err = 0;
+	struct btrfs_transaction *cur_trans = trans->transaction;
+	int ret = 0;
 	struct btrfs_path *path;
-	u64 last = 0;
+
+	if (list_empty(&cur_trans->dirty_bgs))
+		return 0;
 
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
 
-again:
-	while (1) {
-		cache = btrfs_lookup_first_block_group(root->fs_info, last);
-		while (cache) {
-			if (cache->disk_cache_state == BTRFS_DC_CLEAR)
-				break;
-			cache = next_block_group(root, cache);
-		}
-		if (!cache) {
-			if (last == 0)
-				break;
-			last = 0;
-			continue;
-		}
-		err = cache_save_setup(cache, trans, path);
-		last = cache->key.objectid + cache->key.offset;
-		btrfs_put_block_group(cache);
-	}
-
-	while (1) {
-		if (last == 0) {
-			err = btrfs_run_delayed_refs(trans, root,
-						     (unsigned long)-1);
-			if (err) /* File system offline */
-				goto out;
-		}
-
-		cache = btrfs_lookup_first_block_group(root->fs_info, last);
-		while (cache) {
-			if (cache->disk_cache_state == BTRFS_DC_CLEAR) {
-				btrfs_put_block_group(cache);
-				goto again;
-			}
-
-			if (cache->dirty)
-				break;
-			cache = next_block_group(root, cache);
-		}
-		if (!cache) {
-			if (last == 0)
-				break;
-			last = 0;
-			continue;
-		}
-
-
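
The idea behind the dirty list in the patch above can be sketched outside
the kernel. The following Python is only an illustration under stated
assumptions: the names BlockGroup, Transaction, mark_dirty and demo are
hypothetical stand-ins, not the patch's C code, and a plain Python list
stands in for the kernel's list_head. It shows the one property the commit
message describes: a block group is queued the first time it is dirtied, and
the write-out pass visits only queued groups instead of walking all of them.

```python
class BlockGroup:
    def __init__(self, start):
        self.start = start
        # Hypothetical flag; stands in for "is this group on the dirty list?"
        self.on_dirty_list = False


class Transaction:
    def __init__(self):
        # Hypothetical stand-in for the per-transaction dirty_bgs list.
        self.dirty_bgs = []

    def mark_dirty(self, bg):
        # Queue a block group only the first time it is dirtied.
        if not bg.on_dirty_list:
            bg.on_dirty_list = True
            self.dirty_bgs.append(bg)

    def write_dirty_block_groups(self):
        # Visit only the queued groups and return how many were written.
        written = 0
        while self.dirty_bgs:
            bg = self.dirty_bgs.pop(0)
            bg.on_dirty_list = False
            written += 1  # real code would update the on-disk item here
        return written


def demo():
    groups = [BlockGroup(i) for i in range(10000)]
    trans = Transaction()
    for i in (3, 42, 42, 9999):  # 42 is dirtied twice but queued once
        trans.mark_dirty(groups[i])
    first = trans.write_dirty_block_groups()
    second = trans.write_dirty_block_groups()  # nothing dirtied since
    return first, second
```

In this sketch the cost of a write-out pass depends on the number of dirtied
groups (3 here) rather than on the total number of groups (10000), and a
second call with nothing newly dirtied returns immediately.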

Re: btrfs is using 25% more disk than it should

2014-12-19 Thread Phillip Susi


On 12/18/2014 9:59 AM, Daniele Testa wrote:
> As seen above, I have a 410GB SSD mounted at /opt/drives/ssd. On
> that partition, I have one single sparse file, taking 302GB of
> space (max 315GB). The snapshots directory is completely empty.

So you don't have any snapshots or other subvolumes?

> However, for some weird reason, btrfs seems to think it takes
> 404GB. The big file is a disk that I use in a virtual server and
> when I write stuff inside that virtual server, the disk-usage of
> the btrfs partition on the host keeps increasing even if the
> sparse-file is constant at 302GB. I even have 100GB of free
> disk-space inside that virtual disk-file. Writing 1GB inside the
> virtual disk-file seems to increase the usage about 4-5GB on the
> outside.

Did you flag the file as nodatacow?

> Does anyone have a clue on what is going on? How can the
> difference and behaviour be like this when I just have one single
> file? Is it also normal to have 672MB of metadata for a single
> file?

You probably have the data checksums enabled and that isn't
unreasonable for checksums on 302g of data.



Re: btrfs is using 25% more disk than it should

2014-12-19 Thread Daniele Testa

No, I don't have any snapshots or subvolumes. Only that single file.

The file has both checksums and datacow on it. I will do chattr +C
on the parent dir and re-create the file to make sure all files are
marked as nodatacow.

Should I also turn off checksums with the mount flags if this
filesystem only contains big VM files? Or is that not needed if I put +C
on the parent dir?

2014-12-20 2:53 GMT+08:00 Phillip Susi ps...@ubuntu.com:
> On 12/18/2014 9:59 AM, Daniele Testa wrote:
>> As seen above, I have a 410GB SSD mounted at /opt/drives/ssd. On
>> that partition, I have one single sparse file, taking 302GB of
>> space (max 315GB). The snapshots directory is completely empty.
>
> So you don't have any snapshots or other subvolumes?
>
>> However, for some weird reason, btrfs seems to think it takes
>> 404GB. The big file is a disk that I use in a virtual server and
>> when I write stuff inside that virtual server, the disk-usage of
>> the btrfs partition on the host keeps increasing even if the
>> sparse-file is constant at 302GB. I even have 100GB of free
>> disk-space inside that virtual disk-file. Writing 1GB inside the
>> virtual disk-file seems to increase the usage about 4-5GB on the
>> outside.
>
> Did you flag the file as nodatacow?
>
>> Does anyone have a clue on what is going on? How can the
>> difference and behaviour be like this when I just have one single
>> file? Is it also normal to have 672MB of metadata for a single
>> file?
>
> You probably have the data checksums enabled and that isn't
> unreasonable for checksums on 302g of data.


Re: btrfs is using 25% more disk than it should

2014-12-19 Thread Phillip Susi


On 12/19/2014 2:59 PM, Daniele Testa wrote:
> No, I don't have any snapshots or subvolumes. Only that single
> file.
>
> The file has both checksums and datacow on it. I will do chattr
> +C on the parent dir and re-create the file to make sure all files
> are marked as nodatacow.
>
> Should I also turn off checksums with the mount-flags if this
> filesystem only contains big VM-files? Or is it not needed if I put
> +C on the parent dir?

If you don't want the overhead of those checksums, then yea.  Also I
would question why you are using btrfs to hold only big vm files in
the first place.  You would be better off using lvm thinp volumes
instead of files, though personally I prefer to just use regular lvm
volumes and manually allocate enough space.  It avoids the
fragmentation you get from thin provisioning ( or qcow2 ) at the cost
of a bit of overallocated space and the need to do some manual
resizing to add more if and when it is needed.


Re: btrfs is using 25% more disk than it should

2014-12-19 Thread Josef Bacik


On 12/18/2014 09:59 AM, Daniele Testa wrote:
> Hey,
>
> I am hoping you guys can shed some light on my issue. I know that it's
> a common question that people see differences in the disk used when
> running different calculations, but I still think that my issue is
> weird.
>
> root@s4 / # mount
> /dev/md3 on /opt/drives/ssd type btrfs
> (rw,noatime,compress=zlib,discard,nospace_cache)
>
> root@s4 / # btrfs filesystem df /opt/drives/ssd
> Data: total=407.97GB, used=404.08GB
> System, DUP: total=8.00MB, used=52.00KB
> System: total=4.00MB, used=0.00
> Metadata, DUP: total=1.25GB, used=672.21MB
> Metadata: total=8.00MB, used=0.00
>
> root@s4 /opt/drives/ssd # ls -alhs
> total 302G
> 4.0K drwxr-xr-x 1 root         root           42 Dec 18 14:34 .
> 4.0K drwxr-xr-x 4 libvirt-qemu libvirt-qemu 4.0K Dec 18 14:31 ..
> 302G -rw-r--r-- 1 libvirt-qemu libvirt-qemu 315G Dec 18 14:49 disk_208.img
>    0 drwxr-xr-x 1 libvirt-qemu libvirt-qemu    0 Dec 18 10:08 snapshots
>
> root@s4 /opt/drives/ssd # du -h
> 0       ./snapshots
> 302G    .
>
> As seen above, I have a 410GB SSD mounted at /opt/drives/ssd. On
> that partition, I have one single sparse file, taking 302GB of space
> (max 315GB). The snapshots directory is completely empty.
>
> However, for some weird reason, btrfs seems to think it takes 404GB.
> The big file is a disk that I use in a virtual server and when I write
> stuff inside that virtual server, the disk-usage of the btrfs
> partition on the host keeps increasing even if the sparse-file is
> constant at 302GB. I even have 100GB of free disk-space inside that
> virtual disk-file. Writing 1GB inside the virtual disk-file seems to
> increase the usage about 4-5GB on the outside.
>
> Does anyone have a clue on what is going on? How can the difference
> and behaviour be like this when I just have one single file? Is it
> also normal to have 672MB of metadata for a single file?



Hello and welcome to the wonderful world of btrfs, where COW can really
suck hard without being super clear why!  It's 4pm on a Friday right
before I'm gone for 2 weeks so I'm a bit happy and drunk so I'm going to
use pretty pictures.  You have this case to start with

file offset 0                                      offset 302g
[-------------------prealloced 302g extent-------------------]

(man it's impressive I got all that lined up right)

On disk you have 2 things.  First your file, which has a file extent that
says

inode 256, file offset 0, size 302g, offset 0, disk bytenr 123, disklen 302g

and then the extent tree, which keeps track of actual allocated space,
has this

extent bytenr 123, len 302g, refs 1

Now say you boot up your virt image and it writes one 4k block to offset
0.  Now you have this

[4k][------------------------302g-4k-------------------------]

And for your inode you now have this

inode 256, file offset 0, size 4k, offset 0, disk bytenr (123+302g), disklen 4k
inode 256, file offset 4k, size 302g-4k, offset 4k, disk bytenr 123, disklen 302g

and in your extent tree you have

extent bytenr 123, len 302g, refs 1
extent bytenr whatever, len 4k, refs 1

See that?  Your file is still the same size, it is still 302g.  If you
cp'ed it right now it would copy 302g of information.  But what have you
actually allocated on disk?  Well that's now 302g + 4k.  Now let's say
your virt thing decides to write to the middle, let's say at offset 12k,
now you have this

inode 256, file offset 0, size 4k, offset 0, disk bytenr (123+302g), disklen 4k
inode 256, file offset 4k, size 8k, offset 4k, disk bytenr 123, disklen 302g
inode 256, file offset 12k, size 4k, offset 0, disk bytenr whatever, disklen 4k
inode 256, file offset 16k, size 302g-16k, offset 16k, disk bytenr 123, disklen 302g

and in the extent tree you have this

extent bytenr 123, len 302g, refs 2
extent bytenr whatever, len 4k, refs 1
extent bytenr notimportant, len 4k, refs 1

See that refs 2 change?  We split the original extent, so we have 2 file
extents pointing to the same physical extent, so we bumped the ref
count.  This will happen over and over again until we have completely
overwritten the original extent, at which point your space usage will go
back down to ~302g.

We split big extents with cow, so unless you've got lots of space to
spare or are going to use nodatacow you should probably not pre-allocate
virt images.  Thanks,


Josef
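
The accounting described in the message above can be reproduced with a toy
model. This is a minimal sketch, assuming the simplified picture from the
message and nothing more: one large original extent, 4k blocks, every
overwritten block landing in its own new one-block extent, and the original
extent staying fully allocated until no file extent points into it. The
function name allocated_after_writes is hypothetical, and this is not how
btrfs itself computes usage.

```python
def allocated_after_writes(file_blocks, written_offsets):
    """Blocks allocated on disk after COW-overwriting the given block offsets.

    file_blocks: size of the file (and of the original extent) in blocks.
    written_offsets: iterable of block offsets that were overwritten.
    """
    overwritten = set()
    for off in written_offsets:
        if not 0 <= off < file_blocks:
            raise ValueError("offset outside the file")
        overwritten.add(off)

    # Each overwritten block now lives in its own new one-block extent.
    # Rewriting the same block again replaces that small extent, so it
    # is counted once.
    new_extents = len(overwritten)

    # The original extent is released only when nothing references it,
    # that is, once every block of the file has been overwritten.
    if len(overwritten) == file_blocks:
        original = 0
    else:
        original = file_blocks

    return original + new_extents
```

With an 8-block file, overwriting blocks 0 and 3 gives 8 + 2 = 10 allocated
blocks while the file itself is still 8 blocks long. Overwriting 7 of the 8
blocks gives 15, and overwriting all 8 drops usage back to 8.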

Re: btrfs is using 25% more disk than it should

2014-12-19 Thread Josef Bacik


On 12/19/2014 02:59 PM, Daniele Testa wrote:

> No, I don't have any snapshots or subvolumes. Only that single file.
>
> The file has both checksums and datacow on it. I will do chattr +C
> on the parent dir and re-create the file to make sure all files are
> marked as nodatacow.
>
> Should I also turn off checksums with the mount-flags if this
> filesystem only contains big VM-files? Or is it not needed if I put +C
> on the parent dir?


Please God don't turn off checksums.  Checksums are tracked in
metadata anyway, they won't show up in the data accounting.  Our csums
are 8 bytes per block, so basic math says you are going to max out at
604 megabytes for that big of a file.


Please people try to only take advice from people who know what they are 
talking about.  So unless it's from somebody who has commits in 
btrfs/btrfs-progs take their feedback with a grain of salt.  Thanks,


Josef
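
The "basic math" in the message above can be written out. This is a minimal
sketch that takes the figures exactly as quoted there (8 bytes of checksum
per block) and assumes a 4 KiB block size and binary units; the function
name csum_bytes is hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def csum_bytes(file_bytes, block_size=4096, csum_size=8):
    """Total checksum bytes for a file, one csum_size entry per block.

    block_size and csum_size default to the figures assumed above.
    """
    blocks = -(-file_bytes // block_size)  # ceiling division
    return blocks * csum_size


total = csum_bytes(302 * GIB)
print(total // MIB, "MiB of checksums for a 302 GiB file")
```

Under those assumptions a fully written 302 GiB file needs 79,167,488
checksum entries, which is 604 MiB, the ceiling quoted in the message.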


Re: btrfs is using 25% more disk than it should

2014-12-19 Thread Josef Bacik


On 12/19/2014 04:10 PM, Josef Bacik wrote:

> We split big extents with cow, so unless you've got lots of space to
> spare or are going to use nodatacow you should probably not pre-allocate
> virt images.  Thanks,



Sorry, I should have added a

tl;dr: COW means that in the worst case you can end up using 2 * filesize -
blocksize of data on disk while the file still appears to be filesize.  Thanks,


Josef
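
The bound in the tl;dr above can be checked against the numbers reported
earlier in the thread. This is a minimal sketch, assuming a 4 KiB block size
and treating the reported "GB" values as GiB; worst_case_cow_usage and
within_bound are hypothetical helper names.

```python
GIB = 1024 ** 3


def worst_case_cow_usage(file_size, block_size=4096):
    """Upper bound from the tl;dr: 2 * filesize - blocksize, in bytes."""
    return 2 * file_size - block_size


def within_bound(used_bytes, file_size, block_size=4096):
    """True if the usage lies between the file size and the worst case."""
    return file_size <= used_bytes <= worst_case_cow_usage(file_size, block_size)


file_size = 302 * GIB              # size reported by ls and du
used = int(404.08 * GIB)           # "used" from btrfs filesystem df
print(within_bound(used, file_size))
```

For a 302 GiB file the bound is just under 604 GiB, so the reported 404.08
sits inside the range this model allows.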


Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!

2014-12-19 Thread Josef Bacik


On 12/12/2014 09:37 AM, Tomasz Chmielewski wrote:
> FYI, still seeing this with 3.18 (scrub passes fine on this filesystem).
>
> # time btrfs balance start /mnt/lxc2
> Segmentation fault

OK, now I remember why I haven't fixed this yet: the images you gave me
restore, but then they don't mount because the extent tree is corrupted
for some reason.  Could you re-image this fs and send it to me?  I
promise to spend all of my time on the problem until it's fixed.  Thanks,


Josef


Re: btrfs is using 25% more disk than it should

2014-12-19 Thread Phillip Susi


On 12/19/2014 4:15 PM, Josef Bacik wrote:
> Please God don't turn off checksums.  Checksums are tracked in
> metadata anyway, they won't show up in the data accounting.  Our
> csums are 8 bytes per block, so basic math says you are going to
> max out at 604 megabytes for that big of a file.

Yes, and it is exactly that metadata space he is complaining about.
So if you don't want to use up all of that space ( and have no use for
the checksums ), then you turn them off.

> Please people try to only take advice from people who know what
> they are talking about.  So unless it's from somebody who has
> commits in btrfs/btrfs-progs take their feedback with a grain of
> salt.  Thanks,

Well that is rather arrogant and rude.  For that matter, I *do* have
commits in btrfs-progs.



Re: btrfs is using 25% more disk than it should

2014-12-19 Thread Josef Bacik


On 12/19/2014 04:53 PM, Phillip Susi wrote:
> On 12/19/2014 4:15 PM, Josef Bacik wrote:
>> Please God don't turn off checksums.  Checksums are tracked in
>> metadata anyway, they won't show up in the data accounting.  Our
>> csums are 8 bytes per block, so basic math says you are going to
>> max out at 604 megabytes for that big of a file.
>
> Yes, and it is exactly that metadata space he is complaining about.
> So if you don't want to use up all of that space ( and have no use for
> the checksums ), then you turn them off.
>
>> Please people try to only take advice from people who know what
>> they are talking about.  So unless it's from somebody who has
>> commits in btrfs/btrfs-progs take their feedback with a grain of
>> salt.  Thanks,
>
> Well that is rather arrogant and rude.  For that matter, I *do* have
> commits in btrfs-progs.



root@destiny ~/btrfs-progs# git log --oneline --author="Phillip Susi"
c65345d btrfs-progs: document --rootdir mkfs switch
f6b6e93 btrfs-progs: removed extraneous whitespace from mkfs man page

Sorry, I should have qualified that statement better.

So unless it's from somebody who has had commits to meaningful portions
of btrfs/btrfs-progs, take their feedback with a grain of salt.


There are too many people on this list who give random horribly wrong 
advice to users that can result in data loss or corruption.  Now I'll 
admit I read her question wrong so what you said wasn't incorrect, I'm 
sorry for that.  I've seen a lot of people responding to questions 
recently that I don't recognize that have been completely full of crap, 
I just assumed you were in that camp as well.  Thanks,


Josef

Re: 3.18.0: kernel BUG at fs/btrfs/relocation.c:242!

2014-12-19 Thread Tomasz Chmielewski


On 2014-12-19 22:47, Josef Bacik wrote:
> On 12/12/2014 09:37 AM, Tomasz Chmielewski wrote:
>> FYI, still seeing this with 3.18 (scrub passes fine on this
>> filesystem).
>>
>> # time btrfs balance start /mnt/lxc2
>> Segmentation fault
>
> Ok now I remember why I haven't fixed this yet, the images you gave me
> restore but then they don't mount because the extent tree is corrupted
> for some reason.  Could you re-image this fs and send it to me and I
> promise to spend all of my time on the problem until it's fixed.


(un)fortunately one filesystem stopped crashing on balance with some 
kernel update, and the other one I had crashing on balance was fixed 
with btrfs - so I'm not able to reproduce anymore / produce an image 
which is crashing.


--
Tomasz Chmielewski
http://www.sslrack.com

kernel BUG at /home/apw/COD/linux/fs/btrfs/inode.c:3123!

2014-12-19 Thread Tomasz Chmielewski


I got this BUG with 3.18.1 (pasted at the bottom of the email).
Below are all the actions from creating the fs to the BUG. I did not
attempt to reproduce it.


# mkfs.btrfs /dev/vdb
Btrfs v3.17.3
See http://btrfs.wiki.kernel.org for more information.

Turning ON incompat feature 'extref': increased hardlink limit per file 
to 65536

fs created label (null) on /dev/vdb
nodesize 16384 leafsize 16384 sectorsize 4096 size 256.00GiB

# mount -o noatime /dev/vdb /mnt/test/
# cd /mnt/test
# btrfs sub cre subvolume
Create subvolume './subvolume'
# dd if=/dev/urandom of=bigfile.img bs=64k
^C91758+0 records in
91757+0 records out
6013386752 bytes (6.0 GB) copied, 374.777 s, 16.0 MB/s
# btrfs sub list /mnt/test/
ID 257 gen 16 top level 5 path subvolume

# btrfs quota enable /mnt/test

# btrfs qgroup show /mnt/test
qgroupid rfer       excl
-------- ----       ----
0/5      16384      16384
0/257    6013403136 6013403136

# dd if=/dev/urandom of=bigfile2.img bs=64k
^C47721+0 records in
47720+0 records out
3127377920 bytes (3.1 GB) copied, 194.641 s, 16.1 MB/s

# btrfs qgroup show /mnt/test
qgroupid rfer       excl
-------- ----       ----
0/5      16384      16384
0/257    8704049152 8704049152
root@srv2:/mnt/test/subvolume# sync
root@srv2:/mnt/test/subvolume# btrfs qgroup show /mnt/test
qgroupid rfer       excl
-------- ----       ----
0/5      16384      16384
0/257    9140781056 9140781056

# dd if=/dev/urandom of=bigfile3.img bs=64k
^C3617580+0 records in
3617579+0 records out
237081657344 bytes (237 GB) copied, 14796 s, 16.0 MB/s

# df -h
Filesystem      Size  Used Avail Use% Mounted on
(...)
/dev/vdb        256G  230G   25G  91% /mnt/test


# btrfs qgroup show /mnt/test
qgroupid rfer         excl
-------- ----         ----
0/5      16384        16384
0/257    245960245248 245960245248

# ls -l
total 240451584
-rw-r--r-- 1 root root   3127377920 Dec 19 20:06 bigfile2.img
-rw-r--r-- 1 root root 237081657344 Dec 20 00:15 bigfile3.img
-rw-r--r-- 1 root root   6013386752 Dec 19 20:02 bigfile.img

# rm bigfile3.img

# sync

# dmesg
(...)
[   95.055420] BTRFS: device fsid 97f98279-21e7-4822-89be-3aed9dc05f2c 
devid 1 transid 3 /dev/vdb

[  118.446509] BTRFS info (device vdb): disk space caching is enabled
[  118.446518] BTRFS: flagging fs with big metadata feature
[  118.452176] BTRFS: creating UUID tree
[  575.189412] BTRFS info (device vdb): qgroup scan completed
[15948.234826] ------------[ cut here ]------------
[15948.234883] kernel BUG at /home/apw/COD/linux/fs/btrfs/inode.c:3123!
[15948.234906] invalid opcode:  [#1] SMP
[15948.234925] Modules linked in: nf_log_ipv6 ip6t_REJECT nf_reject_ipv6 
nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_log_ipv4 
nf_log_common xt_LOG ipt_REJECT nf_reject_ipv4 xt_tcpudp 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack 
iptable_filter ip_tables x_tables dm_crypt btrfs xor crct10dif_pclmul 
crc32_pclmul ghash_clmulni_intel aesni_intel ppdev aes_x86_64 lrw 
raid6_pq gf128mul glue_helper ablk_helper cryptd serio_raw mac_hid 
pvpanic 8250_fintek parport_pc i2c_piix4 lp parport psmouse qxl ttm 
floppy drm_kms_helper drm
[15948.235172] CPU: 0 PID: 3274 Comm: btrfs-cleaner Not tainted 
3.18.1-031801-generic #201412170637
[15948.235193] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 
04/01/2014
[15948.235222] task: 880036708a00 ti: 88007b97c000 task.ti: 
88007b97c000
[15948.235240] RIP: 0010:[c0458ec9]  [c0458ec9] 
btrfs_orphan_add+0x1a9/0x1c0 [btrfs]

[15948.235305] RSP: 0018:88007b97fc98  EFLAGS: 00010286
[15948.235318] RAX: ffe4 RBX: 88007b80a800 RCX: 

[15948.235333] RDX: 219e RSI: 0004 RDI: 
880079418138
[15948.235349] RBP: 88007b97fcd8 R08: 88007fc1cae0 R09: 
88007ad272d0
[15948.235366] R10:  R11: 0010 R12: 
88007a2d9500
[15948.235381] R13: 8800027d60e0 R14: 88007b80ac58 R15: 
0001
[15948.235401] FS:  () GS:88007fc0() 
knlGS:

[15948.235418] CS:  0010 DS:  ES:  CR0: 80050033
[15948.235432] CR2: 7f0489ff CR3: 7a5e CR4: 
001407f0

[15948.235464] Stack:
[15948.235473]  88007b97fcd8 c0497acf 88007b809800 
88003c207400
[15948.235498]  88007b809800 88007ad272d0 88007a2d9500 
0001
[15948.235521]  88007b97fd58 c04412e0 880079418000 
0004c0427fea

[15948.235551] Call Trace:
[15948.235601]  [c0497acf] ? 
lookup_free_space_inode+0x4f/0x100 [btrfs]
[15948.235642]  [c04412e0] 
btrfs_remove_block_group+0x140/0x490 [btrfs]
[15948.235693]  [c047bde5] btrfs_remove_chunk+0x245/0x380 
[btrfs]
[15948.235731]  [c0441866] btrfs_delete_unused_bgs+0x236/0x270 
[btrfs]

[15948.235771]  [c044ad6c] cleaner_kthread+0x12c/0x190 [btrfs]
[15948.235806]

Re: btrfs is using 25% more disk than it should

2014-12-19 Thread Duncan

Daniele Testa posted on Sat, 20 Dec 2014 03:59:42 +0800 as excerpted:

 The file has both checksums and datacow on it. I will do chattr +C
 on the parent dir and re-create the file to make sure all files are
 marked as nodatacow.
 
 Should I also turn off checksums with the mount-flags if this filesystem
 only contains big VM-files? Or is it not needed if I put +C on the parent
 dir?

FWIW...

Turning off datacow, whether by chattr +C on the parent dir before 
creating the file, or via mount option, turns off checksumming as well.  
(For completeness, it also turns off compression, but I don't think that 
applies in your case.)

In general, active VM images (and database files) with default flags tend 
to get very highly fragmented very fast, due to btrfs' default COW on a 
file with a heavy internal rewrite pattern (as opposed to append-only 
or full rename/replace on rewrite).  For relatively small files with this 
rewrite pattern, think typical desktop firefox sqlite database files of a 
quarter GiB or less, the btrfs autodefrag mount option can be helpful, 
but because it triggers a rewrite of the entire file, as filesize goes 
up, the viability of autodefrag goes down, and at somewhere around half a 
gig, autodefrag doesn't work so well any more, particularly on very 
active files where the incoming rewrite stream may be faster than btrfs 
can rewrite the entire file.

Making heavy-internal-rewrite-pattern files of over say half a GiB in 
size nocow is one suggested solution.  However, snapshots lock in place 
the existing version, so the first write to each block after a snapshot 
is still COWed, once, even on a nocow file.  If people are doing frequent 
automated snapshots (say once an hour), this can be a big problem, as the 
file ends up fragmenting pretty badly from these one-time-COW writes as 
well.

There are ways to work around the problem (put the files in question on a 
subvolume and don't snapshot it as often as the parent, set up a cron job 
to do say weekly defrag on the files in question, etc), but since you 
don't have snapshots going anyway, that's not a concern for you except as 
a preventative -- consider it if you /do/ start doing snapshots.

So anyway, as I said, creating the file nocow (whether by mount option or 
chattr) will turn off checksumming too.  But on something that's frequently 
internally rewritten, where corruption will very likely break the VM 
anyway and there are already mechanisms in place to deal with that (either 
VM integrity mechanisms, or backups, or simply disposable VMs, fire up a 
new one when necessary), at least with btrfs single-mode-data where 
there's no second copy to restore from if the checksum /does/ fail, 
turning off checksumming isn't necessarily as bad as it may seem anyway.

And it /should/ save you some on the metadata... tho I'd not consider 
that savings worth turning off checksumming if that were the /only/ 
reason, on its own.  The metadata difference is more a nice side-effect 
of an already commonly recommended practice for large VM image files, 
than something you'd turn off checksumming for in the first place.  
Certainly, on most files I'd prefer the checksums, and in fact am running 
btrfs raid1 mode here specifically to get the benefit of having a second 
copy to retrieve from if the first attempted copy fails checksum.  But VM 
images and database files are a bit of an exception.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: btrfs is using 25% more disk than it should

2014-12-19 Thread Duncan

Josef Bacik posted on Fri, 19 Dec 2014 16:17:08 -0500 as excerpted:

 tl;dr: Cow means you can in the worst case end up using 2 * filesize -
 blocksize of data on disk and the file will appear to be filesize.

Thanks for the tl;dr /and/ the very sensible longer explanation.  That's 
a very nice thing to know and to file away for further reference. =:^)
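
To make the tl;dr bound concrete, here is a minimal Python sketch of the arithmetic.  It assumes a file stored as one single extent, with no snapshots or reflinks involved; the function name worst_case_cow_bytes and the sample sizes are illustrative and not taken from the thread.  The reasoning is that the original extent stays fully allocated as long as one block of it is still referenced (filesize), while every other block may already have been rewritten into new extents (filesize - blocksize).

```python
def worst_case_cow_bytes(file_size: int, block_size: int) -> int:
    """Worst-case bytes allocated for a file stored as ONE extent (assumption).

    The old extent stays fully allocated while any block still points at it,
    and all remaining blocks may have been rewritten elsewhere.
    """
    if block_size <= 0 or file_size < block_size:
        raise ValueError("need 0 < block_size <= file_size")
    pinned_original = file_size          # old extent, kept whole
    rewritten = file_size - block_size   # every block but one, copied anew
    return pinned_original + rewritten   # == 2 * file_size - block_size


GIB = 1024 ** 3
size = 300 * GIB   # hypothetical image size
block = 4096       # hypothetical block size
extra = worst_case_cow_bytes(size, block) - size
print(f"apparent size {size} bytes, worst-case extra {extra} bytes")
```

The "extra" figure is what tools like df would count on top of what ls reports for the file, under the single-extent assumption above.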


Can BTRFS handle XATTRs larger than 4K?

2014-12-19 Thread Richard Sharpe

Hi folks,

I need a Linux file system that supports XATTRs up to 64K.

Can BTRFS support that or is XFS the only Linux file system with such support?

-- 
Regards,
Richard Sharpe
(何以解憂？唯有杜康。--曹操)

Re: btrfs is using 25% more disk than it should

2014-12-19 Thread Zygo Blaxell

On Fri, Dec 19, 2014 at 04:17:08PM -0500, Josef Bacik wrote:
 And for your inode you now have this
 
 inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g),
 disklen 4k
 inode 256, file offset 4k, size 302g-4k, offset 4k, diskbytenr 123,
 disklen 302g
 
 and in your extent tree you have
 
 extent bytenr 123, len 302g, refs 1
 extent bytenr whatever, len 4k, refs 1
 
 See that?  Your file is still the same size, it is still 302g.  If you
 cp'ed it right now it would copy 302g of information.  But what have you
 actually allocated on disk?  Well, that's now 302g + 4k.  Now let's say
 your virt thing decides to write to the middle, let's say at offset 12k,
 now you have this
 
 inode 256, file offset 0, size 4k, offset 0, diskbytenr (123+302g),
 disklen 4k
 inode 256, file offset 4k, size 8k, offset 4k, diskbytenr 123, disklen 302g
 inode 256, file offset 12k, size 4k, offset 0, diskbytenr whatever,
 disklen 4k
 inode 256, file offset 16k, size 302g - 16k, offset 16k, diskbytenr 123,
 disklen 302g
 
 and in the extent tree you have this
 
 extent bytenr 123, len 302g, refs 2
 extent bytenr whatever, len 4k, refs 1
 extent bytenr notimportant, len 4k, refs 1
 
 See that refs 2 change?  We split the original extent, so we have 2 file
 extents pointing to the same physical extents, so we bumped the ref
 count.  This will happen over and over again until we have completely
 overwritten the original extent, at which point your space usage will go
 back down to ~302g.

Wait, *what*?

OK, I did a small experiment, and found that btrfs actually does do
something like this.  Can't argue with fact, though it would be nice if
btrfs could be smarter and drop unused portions of the original extent
sooner.  :-P

The above quoted scenario is a little oversimplified.  Chances are that
302G file is made of much smaller extents (128M..256M).  If the VM is
writing 4K randomly everywhere then those 128M+ extents are not going
away any time soon.  Even the extents that are dropped stick around for
a few btrfs transaction commits before they go away.

I couldn't reproduce this behavior until I realized the extents I was
overwriting in my tests were exactly the same size and position as
the extents on disk.  I changed the offset slightly and found that
partially-overwritten extents do in fact stick around in their entirety.

There seems to be an unexpected benefit for compression here:  compression
keeps the extents small, so many small updates will be less likely to
leave big mostly-unused extents lying around the filesystem.
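
The behavior described above can be sketched with a toy model in Python.  This is not btrfs code: the classes FileExtent and CowFile, the helper wasted_after_write, and all sizes are hypothetical, and the only rule modeled is the one from the thread, that an on-disk extent stays allocated in full until no file extent references any part of it.  The 128 KiB extent size stands in for "small extents, as with compression"; that figure is an assumption of this sketch.

```python
from dataclasses import dataclass

K = 1024
M = 1024 * K


@dataclass
class FileExtent:
    file_offset: int  # where this piece starts within the file
    length: int       # how many file bytes it covers
    disk_id: int      # which on-disk extent backs it


class CowFile:
    """Toy model: an on-disk extent is freed only when nothing references it."""

    def __init__(self, size: int, extent_size: int):
        self.disk_len = {}   # disk_id -> bytes allocated on disk
        self.items = []      # file extents, sorted by file_offset
        self._next_id = 0
        for off in range(0, size, extent_size):
            length = min(extent_size, size - off)
            self.items.append(FileExtent(off, length, self._alloc(length)))

    def _alloc(self, length: int) -> int:
        disk_id = self._next_id
        self._next_id += 1
        self.disk_len[disk_id] = length
        return disk_id

    def write(self, offset: int, length: int) -> None:
        """COW overwrite of [offset, offset+length), assumed inside the file."""
        end = offset + length
        kept = []
        for it in self.items:
            it_end = it.file_offset + it.length
            if it_end <= offset or it.file_offset >= end:
                kept.append(it)  # untouched by this write
                continue
            if it.file_offset < offset:  # surviving head still uses old extent
                kept.append(FileExtent(it.file_offset,
                                       offset - it.file_offset, it.disk_id))
            if it_end > end:             # surviving tail still uses old extent
                kept.append(FileExtent(end, it_end - end, it.disk_id))
        kept.append(FileExtent(offset, length, self._alloc(length)))
        kept.sort(key=lambda e: e.file_offset)
        self.items = kept
        live = {it.disk_id for it in self.items}
        # Drop only those on-disk extents that have no references left.
        self.disk_len = {d: n for d, n in self.disk_len.items() if d in live}

    def allocated(self) -> int:
        return sum(self.disk_len.values())


def wasted_after_write(extent_size: int) -> int:
    """Bytes allocated beyond the 4 MiB file size after one shifted 1 MiB write."""
    f = CowFile(size=4 * M, extent_size=extent_size)
    f.write(4 * K, 1 * M)
    return f.allocated() - 4 * M


aligned = CowFile(size=4 * M, extent_size=1 * M)
aligned.write(1 * M, 1 * M)  # exactly replaces the second extent
print("aligned overwrite, 1 MiB extents:", aligned.allocated() - 4 * M)
print("shifted overwrite, one 4 MiB extent:", wasted_after_write(4 * M))
print("shifted overwrite, 128 KiB extents:", wasted_after_write(128 * K))
```

In this model the aligned overwrite wastes nothing, the shifted write into one large extent leaves the full 1 MiB of overwritten data pinned, and the same write over 128 KiB extents leaves only 128 KiB pinned, because the fully covered small extents are released.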



Re: btrfs is using 25% more disk than it should

2014-12-19 Thread Daniele Testa

But I read somewhere that compression should be turned off on mounts
that just store large VM-images. Is that wrong?

Btw, I am not pre-allocating space for the images. I use sparse files with:

dd if=/dev/zero of=drive.img bs=1 count=1 seek=300G

It creates the file in a few ms.
Is it better to use fallocate with btrfs?

If I use sparse files, it adds a benefit when I want to copy/move the
image-file to another server.
Like if the 300GB sparse file just has 10GB of data in it, I only need
to copy 10GB when moving it to another server.
Would the same be true with fallocate?
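
The difference between the two approaches can be checked on any Linux filesystem with a small Python sketch.  The helper names make_sparse, make_preallocated and sizes are made up for illustration, and the 1 MiB size is just a stand-in for a real image.  It compares the apparent size (what ls shows) with the allocated size (what du shows) for a sparse file and for one preallocated with posix_fallocate.

```python
import os
import tempfile

SIZE = 1024 * 1024  # 1 MiB stand-in for a real image size


def make_sparse(path: str, size: int) -> None:
    """Create a file of the given apparent size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size)


def make_preallocated(path: str, size: int) -> None:
    """Create a file and ask the filesystem to reserve space for all of it."""
    with open(path, "wb") as f:
        os.posix_fallocate(f.fileno(), 0, size)


def sizes(path: str):
    """Return (apparent bytes, allocated bytes); st_blocks is 512-byte units on Linux."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    sparse_path = os.path.join(tmp, "sparse.img")
    prealloc_path = os.path.join(tmp, "prealloc.img")
    make_sparse(sparse_path, SIZE)
    make_preallocated(prealloc_path, SIZE)
    sparse_size, sparse_alloc = sizes(sparse_path)
    prealloc_size, prealloc_alloc = sizes(prealloc_path)

print(f"sparse:       apparent={sparse_size} allocated={sparse_alloc}")
print(f"preallocated: apparent={prealloc_size} allocated={prealloc_alloc}")
```

Both files report the same apparent size; what differs is how much space is actually reserved.  Whether a copy tool can skip the preallocated-but-unwritten ranges depends on the tool and the filesystem, which this sketch does not test.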

Anyways, would disabling CoW (by putting +C on the parent dir) prevent
the performance issues and 2*filesize issue?


Re: btrfs is using 25% more disk than it should

2014-12-19 Thread Duncan

Daniele Testa posted on Sat, 20 Dec 2014 14:18:31 +0800 as excerpted:

 Anyways, would disabling CoW (by putting +C on the parent dir) prevent
 the performance issues and 2*filesize issue?

It should, provided you don't then start snapshotting the file (which I 
don't believe you intend to do but just in case...).

