Re: [PATCH 00/10] Qgroup fixes for kdave/for-next-20161125 branch

2017-02-16 Thread Qu Wenruo



At 02/17/2017 09:04 AM, Qu Wenruo wrote:

Hi David,

Any update about this patchset?

As I see a lot of qgroup-related cleanups have been submitted to the
mailing list, should I rebase this patchset onto your latest for-next to
handle the conflicts?

Or should I rebase them only after the cleanups are merged into mainline?


Anyway, I rebased only the qgroup fixes onto v4.10-rc8 and uploaded them to
https://github.com/adam900710/linux/tree/qgroup_fixes.

Several conflicts were detected, mainly due to Wang's reserve-type patch
to fix the compression ENOSPC issue.


Xfstests detects no failures in the qgroup test group.

Thanks,
Qu


Thanks,
Qu

At 12/09/2016 10:28 AM, Qu Wenruo wrote:

The branch can be fetched from github:
https://github.com/adam900710/linux.git for-david-next-qgroup-fixes

If David wants to push these fixes to 4.10, then I can rebase these
patches to Chris' for-linus branch.

Recent qgroup fixes for several problems:
1) Qgroup reserved space underflow
   Caused by several issues, from buffered writes happening before qgroup
   enable, to freeing qgroup space that was not reserved by the caller.

2) inode_cache mount option corruption

3) Enhanced qgroup trace points
   Used for debugging the above problems.

All patches have already been submitted to the mailing list.
The 1st patch is the diff between the V4 and V5 patches, which adds a
WARN_ON() for underflowing qgroup reserved space.

Qu Wenruo (10):
  btrfs: qgroup: Fix wrong qgroup passed to reserved space error report
  btrfs: qgroup: Add trace point for qgroup reserved space
  btrfs: qgroup: Re-arrange tracepoint timing to co-operate with
reserved space tracepoint
  btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount
option
  btrfs: qgroup: Add quick exit for non-fs extents
  btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function
  btrfs: qgroup: Return actually freed bytes for qgroup release or free
data
  btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered
write and quota enable
  btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  btrfs: qgroup: Fix qgroup reserved space underflow by only freeing
reserved ranges

 fs/btrfs/ctree.h |  14 +--
 fs/btrfs/extent-tree.c   |  32 ---
 fs/btrfs/extent_io.h |  24 +-
 fs/btrfs/file.c  |  45 ++
 fs/btrfs/inode-map.c |   4 +-
 fs/btrfs/inode.c |  69 +--
 fs/btrfs/ioctl.c |   9 +-
 fs/btrfs/qgroup.c| 197 +++
 fs/btrfs/qgroup.h|  14 ++-
 fs/btrfs/relocation.c|  12 +--
 fs/btrfs/transaction.c   |  20 ++---
 include/trace/events/btrfs.h |  43 ++
 12 files changed, 341 insertions(+), 142 deletions(-)




--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] btrfs: test decompression in the middle of large extents

2017-02-16 Thread Omar Sandoval
From: Omar Sandoval 

This is a regression test for "Btrfs: fix btrfs_decompress_buf2page()".
It fails for zlib on v4.10-rc[1-7].

Signed-off-by: Omar Sandoval 
---
This runs in <60 seconds on my test VM, which I think qualifies for the
quick group?

 common/btrfs|  8 +
 tests/btrfs/137 | 96 +
 tests/btrfs/137.out |  2 ++
 tests/btrfs/group   |  1 +
 4 files changed, 107 insertions(+)
 create mode 100755 tests/btrfs/137
 create mode 100644 tests/btrfs/137.out

diff --git a/common/btrfs b/common/btrfs
index 96c3635b..015ce4d2 100644
--- a/common/btrfs
+++ b/common/btrfs
@@ -364,3 +364,11 @@ _reload_btrfs_ko()
modprobe -r btrfs || _fail "btrfs unload failed"
modprobe btrfs || _fail "btrfs load failed"
 }
+
+_btrfs_compression_algos()
+{
+   echo zlib
+   if [ -e /sys/fs/btrfs/features/compress_lzo ]; then
+   echo lzo
+   fi
+}
diff --git a/tests/btrfs/137 b/tests/btrfs/137
new file mode 100755
index ..c3f28cc0
--- /dev/null
+++ b/tests/btrfs/137
@@ -0,0 +1,96 @@
+#! /bin/bash
+# FS QA Test 137
+#
+# Test decompression in the middle of large extents. Regression test for Linux
+# kernel commit 6e78b3f7a193 ("Btrfs: fix btrfs_decompress_buf2page()").
+#
+#---
+# Copyright (c) 2017 Facebook.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+
+algos=($(_btrfs_compression_algos))
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+# Need 1GB for the uncompressed file plus <1GB for each compressed file.
+_require_fs_space $SCRATCH_MNT $((1024 * 1024 * (1 + ${#algos[@]})))
+
+# Checksum a piece in the middle of the file. This hits the unaligned case that
+# caused the original bug.
+do_csum()
+{
+	dd if="$1" bs=4K skip=555 count=100 2>>$seqres.full | md5sum | cut -d ' ' -f 1
+}
+
+# Create a large, uncompressed (but compressible) file.
+touch "${SCRATCH_MNT}/uncompressed"
+$BTRFS_UTIL_PROG property set "${SCRATCH_MNT}/uncompressed" compression ""
+_ddt of="${SCRATCH_MNT}/uncompressed" bs=1M count=1K 2>&1 | _filter_dd
+
+csum="$(do_csum "${SCRATCH_MNT}/uncompressed")"
+
+for algo in ${algos[@]}; do
+   echo "Testing ${algo}" >> $seqres.full
+
+   # Copy the same data, but with compression enabled.
+   touch "${SCRATCH_MNT}/${algo}"
+	$BTRFS_UTIL_PROG property set "${SCRATCH_MNT}/${algo}" compression "${algo}"
+	dd if="${SCRATCH_MNT}/uncompressed" of="${SCRATCH_MNT}/${algo}" bs=1M 2>&1 | _filter_dd
+
+   # The correct data is likely still cached. Cycle the mount to drop the
+   # cache and start fresh.
+   _scratch_cycle_mount
+
+   # Check the checksum.
+   compressed_csum="$(do_csum "${SCRATCH_MNT}/${algo}")"
+   if [ "${compressed_csum}" != "${csum}" ]; then
+		echo "Checksum mismatch for ${algo} (expected ${csum}, got ${compressed_csum})"
+   fi
+done
+
+echo "Silence is golden"
+
+status=0
+exit
diff --git a/tests/btrfs/137.out b/tests/btrfs/137.out
new file mode 100644
index ..eb48dd21
--- /dev/null
+++ b/tests/btrfs/137.out
@@ -0,0 +1,2 @@
+QA output created by 137
+Silence is golden
diff --git a/tests/btrfs/group b/tests/btrfs/group
index ea88ba4e..0a119e4b 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -139,3 +139,4 @@
 134 auto quick send
 135 auto quick send
 136 auto convert
+137 auto quick compress
-- 
2.11.1
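A note on why do_csum() in the test above hits the unaligned case: with
bs=4K and skip=555, the checksummed window starts at an offset that is not
a multiple of 128 KiB, which is btrfs' usual maximum compressed extent size
(an assumption here, not stated in the patch), so decompression has to start
in the middle of an extent. The arithmetic can be checked directly:

```shell
# Start offset of the window do_csum() reads: 555 blocks of 4 KiB.
offset=$(( 555 * 4096 ))
echo "$offset"                     # 2273280
# Offset into the containing 128 KiB compressed extent: non-zero, so the
# read begins partway into the extent rather than at its start.
echo $(( offset % (128 * 1024) ))  # 45056
```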



[PATCH 1/2] common/rc: remove unnecessary cat in _ddt

2017-02-16 Thread Omar Sandoval
From: Omar Sandoval 

Signed-off-by: Omar Sandoval 
---
 common/rc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/common/rc b/common/rc
index 76515e2b..6304d690 100644
--- a/common/rc
+++ b/common/rc
@@ -2417,7 +2417,7 @@ _die()
 # convert urandom incompressible data to compressible text data
 _ddt()
 {
-   cat /dev/urandom | od | dd iflag=fullblock ${*}
+   od /dev/urandom | dd iflag=fullblock ${*}
 }
 
 #takes files, randomdata
-- 
2.11.1
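As a side note on why the od trick in _ddt() works: od renders
incompressible random bytes as plain octal text over a small alphabet,
which compresses well. A quick sanity check of that property (gzip is used
here only as a convenient compressibility probe; the file names are
arbitrary):

```shell
# Render 64 KiB of random bytes as od text, then compare gzip sizes.
head -c 65536 /dev/urandom > /tmp/ddt-demo.bin
od /tmp/ddt-demo.bin > /tmp/ddt-demo.txt
gzip -kf /tmp/ddt-demo.bin /tmp/ddt-demo.txt
bin_gz=$(wc -c < /tmp/ddt-demo.bin.gz)
txt=$(wc -c < /tmp/ddt-demo.txt)
txt_gz=$(wc -c < /tmp/ddt-demo.txt.gz)
# The raw random data barely compresses; the od text shrinks substantially.
echo "random: 65536 -> $bin_gz; od text: $txt -> $txt_gz"
```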



Re: [PATCH 00/10] Qgroup fixes for kdave/for-next-20161125 branch

2017-02-16 Thread Qu Wenruo

Hi David,

Any update about this patchset?

As I see a lot of qgroup-related cleanups have been submitted to the
mailing list, should I rebase this patchset onto your latest for-next to
handle the conflicts?

Or should I rebase them only after the cleanups are merged into mainline?

Thanks,
Qu

At 12/09/2016 10:28 AM, Qu Wenruo wrote:

The branch can be fetched from github:
https://github.com/adam900710/linux.git for-david-next-qgroup-fixes

If David wants to push these fixes to 4.10, then I can rebase these patches to
Chris' for-linus branch.

Recent qgroup fixes for several problems:
1) Qgroup reserved space underflow
   Caused by several issues, from buffered writes happening before qgroup
   enable, to freeing qgroup space that was not reserved by the caller.

2) inode_cache mount option corruption

3) Enhanced qgroup trace points
   Used for debugging the above problems.

All patches have already been submitted to the mailing list.
The 1st patch is the diff between the V4 and V5 patches, which adds a
WARN_ON() for underflowing qgroup reserved space.

Qu Wenruo (10):
  btrfs: qgroup: Fix wrong qgroup passed to reserved space error report
  btrfs: qgroup: Add trace point for qgroup reserved space
  btrfs: qgroup: Re-arrange tracepoint timing to co-operate with
reserved space tracepoint
  btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount
option
  btrfs: qgroup: Add quick exit for non-fs extents
  btrfs: qgroup: Cleanup btrfs_qgroup_prepare_account_extents function
  btrfs: qgroup: Return actually freed bytes for qgroup release or free
data
  btrfs: qgroup: Fix qgroup reserved space underflow caused by buffered
write and quota enable
  btrfs: qgroup: Introduce extent changeset for qgroup reserve functions
  btrfs: qgroup: Fix qgroup reserved space underflow by only freeing
reserved ranges

 fs/btrfs/ctree.h |  14 +--
 fs/btrfs/extent-tree.c   |  32 ---
 fs/btrfs/extent_io.h |  24 +-
 fs/btrfs/file.c  |  45 ++
 fs/btrfs/inode-map.c |   4 +-
 fs/btrfs/inode.c |  69 +--
 fs/btrfs/ioctl.c |   9 +-
 fs/btrfs/qgroup.c| 197 +++
 fs/btrfs/qgroup.h|  14 ++-
 fs/btrfs/relocation.c|  12 +--
 fs/btrfs/transaction.c   |  20 ++---
 include/trace/events/btrfs.h |  43 ++
 12 files changed, 341 insertions(+), 142 deletions(-)






Re: [PATCH v2] btrfs-progs: misc-tests: Superblock corruption and recovery using backup.

2017-02-16 Thread Lakshmipathi.G
On Thu, Feb 16, 2017 at 09:05:02AM +0800, Qu Wenruo wrote:
> 
> 
> At 02/16/2017 04:50 AM, Lakshmipathi.G wrote:
> >Signed-off-by: Lakshmipathi.G 
> 
> Looks good to me.
> 
> Reviewed-by: Qu Wenruo 
> 
> BTW, it would be better to have some commit message.

ok, sent the v3 patch with commit message.

Cheers.
Lakshmipathi.G


Re: [PATCH] btrfs: Handle btrfs_reloc_clone_csums error correctly to avoid deadlock

2017-02-16 Thread Qu Wenruo



At 02/16/2017 06:03 PM, Filipe Manana wrote:

On Thu, Feb 16, 2017 at 12:39 AM, Qu Wenruo  wrote:



At 02/15/2017 11:59 PM, Filipe Manana wrote:


On Wed, Feb 15, 2017 at 8:49 AM, Qu Wenruo 
wrote:


If btrfs/125 is run with the nospace_cache or space_cache=v2 mount option,
btrfs will block with the following backtrace:

Call Trace:
 __schedule+0x2d4/0xae0
 schedule+0x3d/0x90
 btrfs_start_ordered_extent+0x160/0x200 [btrfs]
 ? wake_atomic_t_function+0x60/0x60
 btrfs_run_ordered_extent_work+0x25/0x40 [btrfs]
 btrfs_scrubparity_helper+0x1c1/0x620 [btrfs]
 btrfs_flush_delalloc_helper+0xe/0x10 [btrfs]
 process_one_work+0x2af/0x720
 ? process_one_work+0x22b/0x720
 worker_thread+0x4b/0x4f0
 kthread+0x10f/0x150
 ? process_one_work+0x720/0x720
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x2e/0x40

The direct cause is that the error handler in run_delalloc_nocow() doesn't
handle errors from btrfs_reloc_clone_csums() well.

The related call path is:
__extent_writepage
|- writepage_delalloc()
|  |- run_delalloc_range()
|     |- run_delalloc_nocow()
|        |- btrfs_add_ordered_extent()
|        |  Now one ordered extent for the file range, e.g. [0, 1M),
|        |  is inserted
|        |
|        |- btrfs_reloc_clone_csums()
|        |  Fails with -EIO, as RAID5/6 doesn't repair some csum tree
|        |  blocks
|        |
|        |- extent_clear_unlock_delalloc()
|           Error routine, unlock and clear page DIRTY, end page
|           writeback.
|           So the remaining 255 pages will not go through writeback
|
|- __extent_writepage_io()
   |- writepage_end_io_hook()
      |- btrfs_dec_test_ordered_pending()
         Reduce ordered_extent->bytes_left by 4K.
         Still have (1M - 4K) to finish.

Since the remaining 255 pages will go through neither I/O nor
writepage_end_io_hook(), the ordered extent for [0, 1M) will never finish,
blocking the current transaction forever.

Although the root cause is still in RAID5/6, it won't hurt to fix the
error routine first.

This patch cleans up the ordered extent in the error routine, so at least
we won't cause a deadlock.
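The accounting above can be modelled with trivial shell arithmetic (a toy
illustration, not kernel code): only the first page's end-io hook runs
after the error path, so bytes_left can never reach zero and anything
waiting on the ordered extent hangs.

```shell
# Ordered extent covers [0, 1M): 256 pages of 4 KiB each.
bytes_left=$(( 1024 * 1024 ))
# Only page 0 completes writeback; the error routine unlocked and cleaned
# pages 1..255 without submitting I/O, so no further end-io hooks fire.
bytes_left=$(( bytes_left - 4096 ))
echo "$bytes_left"   # 1044480 bytes outstanding, never reaches 0
```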

Signed-off-by: Qu Wenruo 
---
 fs/btrfs/extent_io.c|  1 -
 fs/btrfs/inode.c| 10 --
 fs/btrfs/ordered-data.c | 25 +
 fs/btrfs/ordered-data.h | 10 ++
 4 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 4ac383a3a649..a14d1b0840c5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3258,7 +3258,6 @@ static noinline_for_stack int writepage_delalloc(struct inode *inode,
   delalloc_end,
   _started,
   nr_written);
-   /* File system has been set read-only */
if (ret) {
SetPageError(page);
/* fill_delalloc should be return < 0 for error
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1e861a063721..3c3ade58afd7 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1052,8 +1052,11 @@ static noinline int cow_file_range(struct inode *inode,
BTRFS_DATA_RELOC_TREE_OBJECTID) {
ret = btrfs_reloc_clone_csums(inode, start,
  cur_alloc_size);
-   if (ret)
+   if (ret) {
+   btrfs_clean_ordered_extent(inode, start,
+  ram_size);
goto out_drop_extent_cache;
+   }
}

btrfs_dec_block_group_reservations(fs_info,
ins.objectid);
@@ -1538,7 +1541,7 @@ static noinline int run_delalloc_nocow(struct inode *inode,
if (!ret)
ret = err;

-   if (ret && cur_offset < end)
+   if (ret && cur_offset < end) {
extent_clear_unlock_delalloc(inode, cur_offset, end, end,
 locked_page, EXTENT_LOCKED |
 EXTENT_DELALLOC |
EXTENT_DEFRAG |
@@ -1546,6 +1549,9 @@ static noinline int run_delalloc_nocow(struct inode *inode,
 PAGE_CLEAR_DIRTY |
 PAGE_SET_WRITEBACK |
 PAGE_END_WRITEBACK);
+   btrfs_clean_ordered_extent(inode, cur_offset,
+  end - cur_offset + 1);



So this is only partially correct.
First, here you can have 0, 1, or more ordered extents created in the
while loop, so your new function must find and process all ordered extents
within the delalloc range, not just the last one created.



OK, I found that only btrfs_reloc_clone_csums() will need to cleanup 

[PATCH v3] btrfs-progs: misc-tests: Superblock corruption and recovery using backup.

2017-02-16 Thread Lakshmipathi.G
Test script to recover a damaged primary superblock using the backup superblock.

Signed-off-by: Lakshmipathi.G 
---
 .../019-fix-superblock-corruption/test.sh  | 36 ++
 1 file changed, 36 insertions(+)
 create mode 100755 tests/misc-tests/019-fix-superblock-corruption/test.sh

diff --git a/tests/misc-tests/019-fix-superblock-corruption/test.sh 
b/tests/misc-tests/019-fix-superblock-corruption/test.sh
new file mode 100755
index 000..95815e4
--- /dev/null
+++ b/tests/misc-tests/019-fix-superblock-corruption/test.sh
@@ -0,0 +1,36 @@
+#!/bin/bash
+#
+# Corrupt primary superblock and restore it using backup superblock.
+#
+
+source $TOP/tests/common
+
+check_prereq btrfs-select-super
+check_prereq btrfs
+
+setup_root_helper
+prepare_test_dev 512M
+
+FIRST_SUPERBLOCK_OFFSET=65536
+
+test_superblock_restore()
+{
+   run_check $SUDO_HELPER $TOP/mkfs.btrfs -f $TEST_DEV
+
+   # Corrupt superblock checksum
+dd if=/dev/zero of=$TEST_DEV seek=$FIRST_SUPERBLOCK_OFFSET bs=1 \
+count=4  conv=notrunc &> /dev/null
+   # Run btrfs check to detect corruption
+   $TOP/btrfs check $TEST_DEV >& /dev/null && \
+   _fail "btrfs check should detect corruption"
+   # Copy backup superblock to primary
+   run_check $TOP/btrfs-select-super -s 1 $TEST_DEV
+
+   echo "Performing btrfs check" &>> $RESULTS
+   $TOP/btrfs check $TEST_DEV &>> $RESULTS
+if [ $? -ne 0 ]; then
+   _fail "Failed to fix superblock."
+fi
+}
+
+test_superblock_restore
-- 
2.7.4



[PATCH v2] btrfs-progs: RAID5:Inject data stripe corruption and verify scrub fixes it.

2017-02-16 Thread Lakshmipathi.G
Test script to create a file with a specific data-stripe layout and
pre-computed parity values, inject corruption into a data stripe, and
verify that the scrub process fixes the corrupted block.
It also attempts to validate the corresponding parity block value.

Signed-off-by: Lakshmipathi.G 
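As context for the script below: with three devices, RAID5 keeps two data
stripes plus their byte-wise XOR as parity, so for stripes beginning "D0"
and "D1" the expected parity prefix is 'D'^'D' = 0x00 and '0'^'1' = 0x01,
which is where the script's known_parity_value of "0001" comes from. A
minimal sketch of that XOR:

```shell
# XOR the first two bytes of data stripes "D0..." and "D1..." to derive the
# parity prefix the test greps for in the hexdump output.
ord() { printf '%d' "'$1"; }     # ASCII code of a character
p0=$(( $(ord D) ^ $(ord D) ))    # 'D' ^ 'D' = 0
p1=$(( $(ord 0) ^ $(ord 1) ))    # '0' ^ '1' = 1
printf '%02x%02x\n' "$p0" "$p1"  # prints 0001
```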
---
 .../020-raid5-datastripe-corruption/test.sh| 275 +
 1 file changed, 275 insertions(+)
 create mode 100755 tests/misc-tests/020-raid5-datastripe-corruption/test.sh

diff --git a/tests/misc-tests/020-raid5-datastripe-corruption/test.sh 
b/tests/misc-tests/020-raid5-datastripe-corruption/test.sh
new file mode 100755
index 000..3c743eb
--- /dev/null
+++ b/tests/misc-tests/020-raid5-datastripe-corruption/test.sh
@@ -0,0 +1,275 @@
+#!/bin/bash
+#
+# Raid5: Inject data stripe corruption and fix them using scrub.
+# 
+# Script will perform the following:
+# 1) Create Raid5 using 3 loopback devices.
+# 2) Ensure file layout is created in a predictable manner. 
+#Each data stripe(64KB) should uniquely start with 'DN',   
+#where N represents the data stripe number.(ex:D0,D1 etc)
+# 3) Once file is created with specific layout, gather data stripe details 
+#like devicename, position and actual on-disk data.
+# 4) Now use 'dd' to verify the data-stripe against its expected value.
+# 5) At this stage, gather possible parity location for this specific data
+#stripe. The parity values is pre-computed for known data-stripes.
+# 6) Inject corruption into given data-stripe by zero'ing out its first 4 bytes.
+# 7) After injecting corruption, running online-scrub is expected to fix 
+#the corrupted data stripe with the help of parity block and 
+#corresponding data stripe. 
+# 8) If scrub successful, verify the data stripe has original un-corrupted value.
+# 9) If scrub successful, re-generate list of possible parity blocks, this
+#    list must match one prepared in step 5. Otherwise report it as possible
+#    parity bug issue.
+# 10) If no issues found, cleanup files and devices. Repeat the process for 
+#different file size and data-stripe.
+#
+#  Note: This script doesn't handle parity block corruption.
+
+source $TOP/tests/common
+
+check_prereq btrfs
+check_prereq mkfs.btrfs
+
+setup_root_helper
+prepare_test_dev 512M
+
+ndevs=3
+declare -a devs
+device_name=""
+stripe_offset=""
+stripe_content=""
+known_parity_value="0001"
+
+LAYOUT_TMP=$(mktemp --tmpdir btrfs-progs-raid5-file.layoutXX)
+STRIPEINFO_TMP=$(mktemp --tmpdir btrfs-progs-raid5-file.infoXX)
+PARITYINFO1_TMP=$(mktemp --tmpdir btrfs-progs-raid5-initial.parityXX)
+PARITYINFO2_TMP=$(mktemp --tmpdir btrfs-progs-raid5-end.parityXX)
+
+prepare_devices()
+{
+   for i in `seq $ndevs`; do
+   touch img$i
+   chmod a+rw img$i
+   truncate -s0 img$i
+   truncate -s512M img$i
+		devs[$i]=`run_check_stdout $SUDO_HELPER losetup --find --show img$i`
+   done
+}
+
+cleanup_devices()
+{
+   for dev in ${devs[@]}; do
+   run_check $SUDO_HELPER losetup -d $dev
+   done
+   for i in `seq $ndevs`; do
+   truncate -s0 img$i
+   done
+   run_check $SUDO_HELPER losetup --all
+}
+
+test_do_mkfs()
+{
+   run_check $SUDO_HELPER $TOP/mkfs.btrfs -f   \
+   $@
+}
+
+test_mkfs_multi()
+{
+   test_do_mkfs $@ ${devs[@]}
+}
+
+#$1 Filename
+#$2 Expected no.of data stripes for the file.
+create_layout(){
+   fname=$1
+   size=$(( $2 * 65536 ))
+   n=0
+   bs_value=1
+   stripe=0
+   while (( $n < $size ))
+   do
+   if [ $(( $n % 65536 )) -eq 0 ]; then
+   val='D'$stripe
+   echo -n $val
+   stripe=$(( $stripe+1 ))
+   # ensure proper value   
+   bs_value=`echo "${#val}"` 
+   else
+   echo -n 'x'
+   bs_value=1
+   fi
+n=$(( $n+$bs_value ))
+   done | dd of="$TEST_MNT"/$fname bs=$bs_value conv=notrunc &> /dev/null
+}
+
+find_data_stripe_details(){
+   for dev in ${devs[@]}; do
+   echo $dev >> $LAYOUT_TMP
+		$SUDO_HELPER cat $dev | hexdump -e '"%010_ad|" 16/1 "%_p" "|\n"' |
+   grep -P 'D[0-9]+xx'  >> $LAYOUT_TMP
+   done
+}
+#$1 - file to save possible parity locations
+find_parity_stripe_count(){
+   PARITY_COUNT=$1
+   for dev in ${devs[@]}; do
+		echo "Possible parity stripes locations on device: $dev" >> $PARITY_COUNT
+		$SUDO_HELPER cat $dev | hexdump -e '"%010_ad|" 16/1 "%02X" "\n"' |
+   grep $known_parity_value >> $PARITY_COUNT
+   done
+}
+
+#Collect data stripe information in a readable manner.
+save_data_stripe_details(){
+   devname=""
+   for entry in `cat $LAYOUT_TMP`; do  
+   

Re: FS gives kernel UPS on attempt to create snapshot and after running balance it's unmountable.

2017-02-16 Thread Tomasz Kusmierz
Thanks Qu,

Just before I go and accidentally mess up this FS more: I mentioned
originally that this problem started with the FS not being able to create
a snapshot (it would get remounted RO automatically) for about a month,
and when I realised there was a problem like that I attempted a full FS
balance, which left this FS unmountable. Is there any other debug output
you would require before I proceed (I've got a lot i

On 16 Feb 2017, at 01:26, Qu Wenruo  wrote:



At 02/15/2017 10:11 PM, Tomasz Kusmierz wrote:

So guys, any help here ? I’m kinda stuck now with system just idling
and doing nothing while I wait for some feedback ...


Sorry for the late reply.

I was busy debugging a kernel bug.

On 14 Feb 2017, at 19:38, Tomasz Kusmierz  wrote:

[root@server ~]#  btrfs-show-super -af /dev/sdc
superblock: bytenr=65536, device=/dev/sdc
-
csum_type   0 (crc32c)
csum_size   4
csum0x17d56ce0 [match]


This superblock is good.

bytenr  65536
flags   0x1
  ( WRITTEN )
magic   _BHRfS_M [match]
fsid0576d577-8954-4a60-a02b-9492b3c29318
label   main_pool
generation  150682
root5223857717248
sys_array_size  321
chunk_root_generation   150678
root_level  1
chunk_root  8669488005120
chunk_root_level1
log_root0
log_root_transid0
log_root_level  0
total_bytes 16003191472128
bytes_used  6411278503936
sectorsize  4096
nodesize16384
leafsize16384
stripesize  4096
root_dir6
num_devices 8
compat_flags0x0
compat_ro_flags 0x0
incompat_flags  0x161
  ( MIXED_BACKREF |
BIG_METADATA |
EXTENDED_IREF |
SKINNY_METADATA )
cache_generation150682
uuid_tree_generation150679
dev_item.uuid   46abffa8-7afe-451f-93c6-abb8e589c4e8
dev_item.fsid   0576d577-8954-4a60-a02b-9492b3c29318 [match]
dev_item.type   0
dev_item.total_bytes2000398934016
dev_item.bytes_used 1647136735232
dev_item.io_align   4096
dev_item.io_width   4096
dev_item.sector_size4096
dev_item.devid  1
dev_item.dev_group  0
dev_item.seek_speed 0
dev_item.bandwidth  0
dev_item.generation 0
sys_chunk_array[2048]:
  item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 8669487824896)
  length 67108864 owner 2 stripe_len 65536 type SYSTEM|RAID10
  io_align 65536 io_width 65536 sector_size 4096
  num_stripes 8 sub_stripes 2
  stripe 0 devid 7 offset 1083674984448
  dev_uuid 566fb8a3-d6de-4230-8b70-a5fda0a120f6
  stripe 1 devid 8 offset 1083674984448
  dev_uuid 845aefb2-e0a6-479a-957b-a82fb7207d6c
  stripe 2 devid 1 offset 1365901312
  dev_uuid 46abffa8-7afe-451f-93c6-abb8e589c4e8
  stripe 3 devid 3 offset 1345978368
  dev_uuid 95921633-2fc1-479f-a3ba-e6e5a1989755
  stripe 4 devid 4 offset 1345978368
  dev_uuid 20828f0e-4661-4987-ac11-72814c1e423a
  stripe 5 devid 5 offset 1345978368
  dev_uuid 2c3cd71f-5178-48e7-8032-6b6eec023197
  stripe 6 devid 6 offset 1345978368
  dev_uuid 806a47e5-cac4-41c9-abb9-5c49506459e1
  stripe 7 devid 2 offset 1345978368
  dev_uuid e1358e0e-edaf-4505-9c71-ed0862c45841


And I didn't see anything wrong in sys_chunk_array.


Would you please try to mount the fs with the latest kernel?
Preferably later than v4.9, as newer versions introduce extra kernel
messages that give more details about what's going wrong.

Thanks,
Qu

backup_roots[4]:
  backup 0:
  backup_tree_root:   5223857717248   gen: 150680 level: 1
  backup_chunk_root:  8669488005120   gen: 150678 level: 1
  backup_extent_root: 5223867383808   gen: 150680 level: 2
  backup_fs_root: 0   gen: 0  level: 0
  backup_dev_root:5224791523328   gen: 150680 level: 1
  backup_csum_root:   5224802140160   gen: 150680 level: 3
  backup_total_bytes: 16003191472128
  backup_bytes_used:  6411278503936
  backup_num_devices: 8

  backup 1:
  backup_tree_root:   5224155807744   gen: 150681 level: 1
  backup_chunk_root:  8669488005120   gen: 150678 level: 1
  backup_extent_root: 

Re: [PATCH 2/2] btrfs: test decompression in the middle of large extents

2017-02-16 Thread Omar Sandoval
On Fri, Feb 17, 2017 at 01:58:49PM +0800, Eryu Guan wrote:
> On Thu, Feb 16, 2017 at 06:32:49PM -0800, Omar Sandoval wrote:
> > From: Omar Sandoval 
> > 
> > This is a regression test for "Btrfs: fix btrfs_decompress_buf2page()".
> > It fails for zlib on v4.10-rc[1-7].
> > 
> > Signed-off-by: Omar Sandoval 
> > ---
> > This runs in <60 seconds on my test VM, which I think qualifies for the
> > quick group?
> 
> quick test usually takes <30s, so probably it's not quick enough :)
> (it takes ~80s on my test vm, which has 4vcpus and 8G memory)

Ok, I'll remove it from the quick group :)

> > 
> >  common/btrfs|  8 +
> >  tests/btrfs/137 | 96 +
> >  tests/btrfs/137.out |  2 ++
> >  tests/btrfs/group   |  1 +
> >  4 files changed, 107 insertions(+)
> >  create mode 100755 tests/btrfs/137
> >  create mode 100644 tests/btrfs/137.out
> > 
> > diff --git a/common/btrfs b/common/btrfs
> > index 96c3635b..015ce4d2 100644
> > --- a/common/btrfs
> > +++ b/common/btrfs
> > @@ -364,3 +364,11 @@ _reload_btrfs_ko()
> > modprobe -r btrfs || _fail "btrfs unload failed"
> > modprobe btrfs || _fail "btrfs load failed"
> >  }
> > +
> > +_btrfs_compression_algos()
> > +{
> > +   echo zlib
> > +   if [ -e /sys/fs/btrfs/features/compress_lzo ]; then
> > +   echo lzo
> 
> If new compression algorithms are added, are there going to be new
> features/compress_$algo entry added? If so I'd suggest that take a more
> generic way to query supported compression algos, rather than
> "hard-coded" zlib and lzo.

I think that's safe to assume. I'll adapt this.

> > +   fi
> > +}
> > diff --git a/tests/btrfs/137 b/tests/btrfs/137
> > new file mode 100755
> > index ..c3f28cc0
> > --- /dev/null
> > +++ b/tests/btrfs/137
> > @@ -0,0 +1,96 @@
> > +#! /bin/bash
> > +# FS QA Test 137
> > +#
> > +# Test decompression in the middle of large extents. Regression test for Linux
> > +# kernel commit 6e78b3f7a193 ("Btrfs: fix btrfs_decompress_buf2page()").
> > +#
> > +#---
> > +# Copyright (c) 2017 Facebook.  All Rights Reserved.
> > +#
> > +# This program is free software; you can redistribute it and/or
> > +# modify it under the terms of the GNU General Public License as
> > +# published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope that it would be useful,
> > +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +# GNU General Public License for more details.
> > +#
> > +# You should have received a copy of the GNU General Public License
> > +# along with this program; if not, write the Free Software Foundation,
> > +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> > +#---
> > +#
> > +
> > +seq=`basename $0`
> > +seqres=$RESULT_DIR/$seq
> > +echo "QA output created by $seq"
> > +
> > +here=`pwd`
> > +tmp=/tmp/$$
> > +status=1   # failure is the default!
> > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > +
> > +_cleanup()
> > +{
> > +   cd /
> > +   rm -f $tmp.*
> > +}
> > +
> > +# get standard environment, filters and checks
> > +. ./common/rc
> > +. ./common/filter
> > +
> > +# remove previous $seqres.full before test
> > +rm -f $seqres.full
> > +
> > +# real QA test starts here
> > +
> > +_supported_fs btrfs
> > +_supported_os Linux
> > +_require_scratch
> 
> Better to have
> 
> _require_btrfs_command "property"
> 
> Not a big deal, but nice to have :)

Will fix. Thanks for the review.


Re: [PATCH 2/2] btrfs: test decompression in the middle of large extents

2017-02-16 Thread Eryu Guan
On Thu, Feb 16, 2017 at 06:32:49PM -0800, Omar Sandoval wrote:
> From: Omar Sandoval 
> 
> This is a regression test for "Btrfs: fix btrfs_decompress_buf2page()".
> It fails for zlib on v4.10-rc[1-7].
> 
> Signed-off-by: Omar Sandoval 
> ---
> This runs in <60 seconds on my test VM, which I think qualifies for the
> quick group?

quick test usually takes <30s, so probably it's not quick enough :)
(it takes ~80s on my test vm, which has 4vcpus and 8G memory)

> 
>  common/btrfs|  8 +
>  tests/btrfs/137 | 96 +
>  tests/btrfs/137.out |  2 ++
>  tests/btrfs/group   |  1 +
>  4 files changed, 107 insertions(+)
>  create mode 100755 tests/btrfs/137
>  create mode 100644 tests/btrfs/137.out
> 
> diff --git a/common/btrfs b/common/btrfs
> index 96c3635b..015ce4d2 100644
> --- a/common/btrfs
> +++ b/common/btrfs
> @@ -364,3 +364,11 @@ _reload_btrfs_ko()
>   modprobe -r btrfs || _fail "btrfs unload failed"
>   modprobe btrfs || _fail "btrfs load failed"
>  }
> +
> +_btrfs_compression_algos()
> +{
> + echo zlib
> + if [ -e /sys/fs/btrfs/features/compress_lzo ]; then
> + echo lzo

If new compression algorithms are added, are there going to be new
features/compress_$algo entry added? If so I'd suggest that take a more
generic way to query supported compression algos, rather than
"hard-coded" zlib and lzo.

> + fi
> +}
> diff --git a/tests/btrfs/137 b/tests/btrfs/137
> new file mode 100755
> index ..c3f28cc0
> --- /dev/null
> +++ b/tests/btrfs/137
> @@ -0,0 +1,96 @@
> +#! /bin/bash
> +# FS QA Test 137
> +#
> +# Test decompression in the middle of large extents. Regression test for Linux
> +# kernel commit 6e78b3f7a193 ("Btrfs: fix btrfs_decompress_buf2page()").
> +#
> +#---
> +# Copyright (c) 2017 Facebook.  All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU General Public License as
> +# published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope that it would be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write the Free Software Foundation,
> +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
> +#---
> +#
> +
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + cd /
> + rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_scratch

Better to have

_require_btrfs_command "property"

Not a big deal, but nice to have :)

Thanks,
Eryu

> +
> +algos=($(_btrfs_compression_algos))
> +
> +_scratch_mkfs >>$seqres.full 2>&1
> +_scratch_mount
> +# Need 1GB for the uncompressed file plus <1GB for each compressed file.
> +_require_fs_space $SCRATCH_MNT $((1024 * 1024 * (1 + ${#algos[@]})))
> +
> +# Checksum a piece in the middle of the file. This hits the unaligned case
> +# that caused the original bug.
> +do_csum()
> +{
> + dd if="$1" bs=4K skip=555 count=100 2>>$seqres.full | md5sum | cut -d ' ' -f 1
> +}
> +
> +# Create a large, uncompressed (but compressible) file.
> +touch "${SCRATCH_MNT}/uncompressed"
> +$BTRFS_UTIL_PROG property set "${SCRATCH_MNT}/uncompressed" compression ""
> +_ddt of="${SCRATCH_MNT}/uncompressed" bs=1M count=1K 2>&1 | _filter_dd
> +
> +csum="$(do_csum "${SCRATCH_MNT}/uncompressed")"
> +
> +for algo in ${algos[@]}; do
> + echo "Testing ${algo}" >> $seqres.full
> +
> + # Copy the same data, but with compression enabled.
> + touch "${SCRATCH_MNT}/${algo}"
> + $BTRFS_UTIL_PROG property set "${SCRATCH_MNT}/${algo}" compression "${algo}"
> + dd if="${SCRATCH_MNT}/uncompressed" of="${SCRATCH_MNT}/${algo}" bs=1M 2>&1 | _filter_dd
> +
> + # The correct data is likely still cached. Cycle the mount to drop the
> + # cache and start fresh.
> + _scratch_cycle_mount
> +
> + # Check the checksum.
> + compressed_csum="$(do_csum "${SCRATCH_MNT}/${algo}")"
> + if [ "${compressed_csum}" != "${csum}" ]; then
> + echo "Checksum mismatch for ${algo} (expected ${csum}, got ${compressed_csum})"
> + fi
> +done
> +
> +echo "Silence is golden"
> +
> +status=0
> +exit
> diff --git 

Re: 4.9/4.10 Experiences

2017-02-16 Thread Adam Borowski
On Thu, Feb 16, 2017 at 01:37:53PM +0200, Imran Geriskovan wrote:
> What are your experiences for btrfs regarding 4.10 and 4.11 kernels?
> I'm still on 4.8.x. I'd be happy to hear from anyone using 4.1x for
> a very typical single disk setup. Are they reasonably stable/good
> enough for this case?

Somehow, not one disk has spontaneously burst into flame on 4.9 nor 4.10-rc
for me yet, so I guess so.

Also, 4.9 is an LTS that's going to be maintained for 5½ years, so it is
supposed to be stable.  And there's no evidence that it is not.  Experimental
features continue to be in bad shape, but you don't appear to be planning to
use them.


Meow!
-- 
Autotools hint: to do a zx-spectrum build on a pdp11 host, type:
  ./configure --host=zx-spectrum --build=pdp11
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] btrfs: qgroup: Move half of the qgroup accounting time out of commit trans

2017-02-16 Thread Nikolay Borisov
Just a minor nit.

On 15.02.2017 04:43, Qu Wenruo wrote:
> Just as Filipe pointed out, the most time consuming parts of qgroup are
> btrfs_qgroup_account_extents() and
> btrfs_qgroup_prepare_account_extents().
> Which both call btrfs_find_all_roots() to get old_roots and new_roots
> ulist.
> 
> What makes things worse is, we're calling that expensive
> btrfs_find_all_roots() at transaction committing time with
> TRANS_STATE_COMMIT_DOING, which will blocks all incoming transaction.
> 
> Such behavior is necessary for @new_roots search as current
> btrfs_find_all_roots() can't do it correctly so we do call it just
> before switch commit roots.
> 
> However for @old_roots search, it's not necessary as such search is
> based on commit_root, so it will always be correct and we can move it
> out of transaction committing.
> 
> This patch moves the @old_roots search part out of
> commit_transaction(), so in theory we can half the time qgroup time
> consumption at commit_transaction().
> 
> But please note that, this won't speedup qgroup overall, the total time
> consumption is still the same, just reduce the performance stall.
> 
> Cc: Filipe Manana 
> Signed-off-by: Qu Wenruo 
> ---
> v2:
>   Update commit message to make it more clear.
>   Don't call btrfs_find_all_roots() before insert qgroup extent record,
>   so we can avoid wasting CPU for already inserted qgroup extent record.
> 
> PS:
> If and only if we have fixed and proved btrfs_find_all_roots() can
> get correct result with current root, then we can move all
> expensive btrfs_find_all_roots() out of transaction committing.
> 
> However I strongly doubt if it's possible with current delayed ref,
> and without such prove, move btrfs_find_all_roots() for @new_roots
> out of transaction committing will just screw up qgroup accounting.
> 
> ---
>  fs/btrfs/delayed-ref.c | 20 
>  fs/btrfs/qgroup.c  | 33 +
>  fs/btrfs/qgroup.h  | 33 ++---
>  3 files changed, 75 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index ef724a5fc30e..0ee927ef5a71 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -550,13 +550,14 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
>struct btrfs_delayed_ref_node *ref,
>struct btrfs_qgroup_extent_record *qrecord,
>u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved,
> -  int action, int is_data)
> +  int action, int is_data, int *qrecord_inserted_ret)
>  {
>   struct btrfs_delayed_ref_head *existing;
>   struct btrfs_delayed_ref_head *head_ref = NULL;
>   struct btrfs_delayed_ref_root *delayed_refs;
>   int count_mod = 1;
>   int must_insert_reserved = 0;
> + int qrecord_inserted = 0;

Why use an integer when you only require a boolean value? I doubt it
will make much difference at the asm level if you switch to bool, but at
least it will make it more apparent that it's being used as a true|false
variable. The same comment applies to the other functions where you've
used similar variables.



Re: [PATCH v2] btrfs: qgroup: Move half of the qgroup accounting time out of commit trans

2017-02-16 Thread Filipe Manana
On Wed, Feb 15, 2017 at 2:43 AM, Qu Wenruo  wrote:
> Just as Filipe pointed out, the most time consuming parts of qgroup are
> btrfs_qgroup_account_extents() and
> btrfs_qgroup_prepare_account_extents().
> Which both call btrfs_find_all_roots() to get old_roots and new_roots
> ulist.
>
> What makes things worse is, we're calling that expensive
> btrfs_find_all_roots() at transaction committing time with
> TRANS_STATE_COMMIT_DOING, which will blocks all incoming transaction.
>
> Such behavior is necessary for @new_roots search as current
> btrfs_find_all_roots() can't do it correctly so we do call it just
> before switch commit roots.
>
> However for @old_roots search, it's not necessary as such search is
> based on commit_root, so it will always be correct and we can move it
> out of transaction committing.
>
> This patch moves the @old_roots search part out of
> commit_transaction(), so in theory we can half the time qgroup time
> consumption at commit_transaction().
>
> But please note that, this won't speedup qgroup overall, the total time
> consumption is still the same, just reduce the performance stall.
>
> Cc: Filipe Manana 
> Signed-off-by: Qu Wenruo 

Reviewed-by: Filipe Manana 

thanks

> ---
> v2:
>   Update commit message to make it more clear.
>   Don't call btrfs_find_all_roots() before insert qgroup extent record,
>   so we can avoid wasting CPU for already inserted qgroup extent record.
>
> PS:
> If and only if we have fixed and proved btrfs_find_all_roots() can
> get correct result with current root, then we can move all
> expensive btrfs_find_all_roots() out of transaction committing.
>
> However I strongly doubt if it's possible with current delayed ref,
> and without such prove, move btrfs_find_all_roots() for @new_roots
> out of transaction committing will just screw up qgroup accounting.
>
> ---
>  fs/btrfs/delayed-ref.c | 20 
>  fs/btrfs/qgroup.c  | 33 +
>  fs/btrfs/qgroup.h  | 33 ++---
>  3 files changed, 75 insertions(+), 11 deletions(-)
>
> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
> index ef724a5fc30e..0ee927ef5a71 100644
> --- a/fs/btrfs/delayed-ref.c
> +++ b/fs/btrfs/delayed-ref.c
> @@ -550,13 +550,14 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
>  struct btrfs_delayed_ref_node *ref,
>  struct btrfs_qgroup_extent_record *qrecord,
>  u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved,
> -int action, int is_data)
> +int action, int is_data, int *qrecord_inserted_ret)
>  {
> struct btrfs_delayed_ref_head *existing;
> struct btrfs_delayed_ref_head *head_ref = NULL;
> struct btrfs_delayed_ref_root *delayed_refs;
> int count_mod = 1;
> int must_insert_reserved = 0;
> +   int qrecord_inserted = 0;
>
> /* If reserved is provided, it must be a data extent. */
> BUG_ON(!is_data && reserved);
> @@ -623,6 +624,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
> if(btrfs_qgroup_trace_extent_nolock(fs_info,
> delayed_refs, qrecord))
> kfree(qrecord);
> +   else
> +   qrecord_inserted = 1;
> }
>
> spin_lock_init(&head_ref->lock);
> @@ -650,6 +653,8 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
> atomic_inc(&delayed_refs->num_entries);
> trans->delayed_ref_updates++;
> }
> +   if (qrecord_inserted_ret)
> +   *qrecord_inserted_ret = qrecord_inserted;
> return head_ref;
>  }
>
> @@ -779,6 +784,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info 
> *fs_info,
> struct btrfs_delayed_ref_head *head_ref;
> struct btrfs_delayed_ref_root *delayed_refs;
> struct btrfs_qgroup_extent_record *record = NULL;
> +   int qrecord_inserted;
>
> BUG_ON(extent_op && extent_op->is_data);
> ref = kmem_cache_alloc(btrfs_delayed_tree_ref_cachep, GFP_NOFS);
> @@ -806,12 +812,15 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info 
> *fs_info,
>  * the spin lock
>  */
> head_ref = add_delayed_ref_head(fs_info, trans, &head_ref->node, record,
> -   bytenr, num_bytes, 0, 0, action, 0);
> +   bytenr, num_bytes, 0, 0, action, 0,
> +   &qrecord_inserted);
>
> add_delayed_tree_ref(fs_info, trans, head_ref, &ref->node, bytenr,
>  num_bytes, parent, ref_root, level, action);
> spin_unlock(&delayed_refs->lock);
>
> +   if (qrecord_inserted)
> +   return btrfs_qgroup_trace_extent_post(fs_info, record);
> return 0;
>
>  free_head_ref:
> @@ -836,6 +845,7 

Re: [PATCH v2] btrfs: qgroup: Move half of the qgroup accounting time out of commit trans

2017-02-16 Thread Qu Wenruo



At 02/16/2017 04:45 PM, Nikolay Borisov wrote:

Just a minor nit.

On 15.02.2017 04:43, Qu Wenruo wrote:

Just as Filipe pointed out, the most time consuming parts of qgroup are
btrfs_qgroup_account_extents() and
btrfs_qgroup_prepare_account_extents().
Which both call btrfs_find_all_roots() to get old_roots and new_roots
ulist.

What makes things worse is, we're calling that expensive
btrfs_find_all_roots() at transaction committing time with
TRANS_STATE_COMMIT_DOING, which will blocks all incoming transaction.

Such behavior is necessary for @new_roots search as current
btrfs_find_all_roots() can't do it correctly so we do call it just
before switch commit roots.

However for @old_roots search, it's not necessary as such search is
based on commit_root, so it will always be correct and we can move it
out of transaction committing.

This patch moves the @old_roots search part out of
commit_transaction(), so in theory we can half the time qgroup time
consumption at commit_transaction().

But please note that, this won't speedup qgroup overall, the total time
consumption is still the same, just reduce the performance stall.

Cc: Filipe Manana 
Signed-off-by: Qu Wenruo 
---
v2:
  Update commit message to make it more clear.
  Don't call btrfs_find_all_roots() before insert qgroup extent record,
  so we can avoid wasting CPU for already inserted qgroup extent record.

PS:
If and only if we have fixed and proved btrfs_find_all_roots() can
get correct result with current root, then we can move all
expensive btrfs_find_all_roots() out of transaction committing.

However I strongly doubt if it's possible with current delayed ref,
and without such prove, move btrfs_find_all_roots() for @new_roots
out of transaction committing will just screw up qgroup accounting.

---
 fs/btrfs/delayed-ref.c | 20 
 fs/btrfs/qgroup.c  | 33 +
 fs/btrfs/qgroup.h  | 33 ++---
 3 files changed, 75 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index ef724a5fc30e..0ee927ef5a71 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -550,13 +550,14 @@ add_delayed_ref_head(struct btrfs_fs_info *fs_info,
 struct btrfs_delayed_ref_node *ref,
 struct btrfs_qgroup_extent_record *qrecord,
 u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved,
-int action, int is_data)
+int action, int is_data, int *qrecord_inserted_ret)
 {
struct btrfs_delayed_ref_head *existing;
struct btrfs_delayed_ref_head *head_ref = NULL;
struct btrfs_delayed_ref_root *delayed_refs;
int count_mod = 1;
int must_insert_reserved = 0;
+   int qrecord_inserted = 0;


Why use an integer when you only require a boolean value? I doubt it
will make much difference at asm level if you switch to bool but at
least it will make it more apparent it's being used as a true|false
variable. The same comment applies to other functions where you've used
similar variable.


In fact I don't see the point of using bool in most cases.
And we are using int as bool almost everywhere, just like
must_insert_reserved and is_data.


Or is there some coding style documentation prohibiting us from using
int as bool?


Thanks,
Qu




Re: [PATCH v2] btrfs: qgroup: Move half of the qgroup accounting time out of commit trans

2017-02-16 Thread Qu Wenruo



At 02/16/2017 04:56 PM, Nikolay Borisov wrote:



On 16.02.2017 10:50, Qu Wenruo wrote:



At 02/16/2017 04:45 PM, Nikolay Borisov wrote:

Just a minor nit.

On 15.02.2017 04:43, Qu Wenruo wrote:

Just as Filipe pointed out, the most time consuming parts of qgroup are
btrfs_qgroup_account_extents() and
btrfs_qgroup_prepare_account_extents().
Which both call btrfs_find_all_roots() to get old_roots and new_roots
ulist.

What makes things worse is, we're calling that expensive
btrfs_find_all_roots() at transaction committing time with
TRANS_STATE_COMMIT_DOING, which will blocks all incoming transaction.

Such behavior is necessary for @new_roots search as current
btrfs_find_all_roots() can't do it correctly so we do call it just
before switch commit roots.

However for @old_roots search, it's not necessary as such search is
based on commit_root, so it will always be correct and we can move it
out of transaction committing.

This patch moves the @old_roots search part out of
commit_transaction(), so in theory we can half the time qgroup time
consumption at commit_transaction().

But please note that, this won't speedup qgroup overall, the total time
consumption is still the same, just reduce the performance stall.

Cc: Filipe Manana 
Signed-off-by: Qu Wenruo 
---
v2:
  Update commit message to make it more clear.
  Don't call btrfs_find_all_roots() before insert qgroup extent record,
  so we can avoid wasting CPU for already inserted qgroup extent record.

PS:
If and only if we have fixed and proved btrfs_find_all_roots() can
get correct result with current root, then we can move all
expensive btrfs_find_all_roots() out of transaction committing.

However I strongly doubt if it's possible with current delayed ref,
and without such prove, move btrfs_find_all_roots() for @new_roots
out of transaction committing will just screw up qgroup accounting.

---
 fs/btrfs/delayed-ref.c | 20 
 fs/btrfs/qgroup.c  | 33 +
 fs/btrfs/qgroup.h  | 33 ++---
 3 files changed, 75 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index ef724a5fc30e..0ee927ef5a71 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -550,13 +550,14 @@ add_delayed_ref_head(struct btrfs_fs_info
*fs_info,
  struct btrfs_delayed_ref_node *ref,
  struct btrfs_qgroup_extent_record *qrecord,
  u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved,
- int action, int is_data)
+ int action, int is_data, int *qrecord_inserted_ret)
 {
 struct btrfs_delayed_ref_head *existing;
 struct btrfs_delayed_ref_head *head_ref = NULL;
 struct btrfs_delayed_ref_root *delayed_refs;
 int count_mod = 1;
 int must_insert_reserved = 0;
+int qrecord_inserted = 0;


Why use an integer when you only require a boolean value? I doubt it
will make much difference at asm level if you switch to bool but at
least it will make it more apparent it's being used as a true|false
variable. The same comment applies to other functions where you've used
similar variable.


In fact I didn't see the point to use bool in most case.
And we are using int as bool almost everywhere, just like
must_insert_reserved and is_data.

Or is there some coding style documentation prohibiting us to use int as
bool?


There most certainly isn't. It's just more of a personal preference so
that's why I don't have a very strong opinion on this. In this case then
it's better to use int for the sake of consistency.


I'll keep it in mind when creating new functions.

Although bool makes sense for readability, in this case I prefer
consistency.


Thanks,
Qu





Thanks,
Qu





Opps.. Should be 4.9/4.10 Experiences

2017-02-16 Thread Imran Geriskovan
Opps.. I mean 4.9/4.10 Experiences

On 2/16/17, Imran Geriskovan  wrote:
> What are your experiences for btrfs regarding 4.10 and 4.11 kernels?
> I'm still on 4.8.x. I'd be happy to hear from anyone using 4.1x for
> a very typical single disk setup. Are they reasonably stable/good
> enough for this case?


Re: [PATCH] btrfs: Handle btrfs_reloc_clone_csums error correctly to avoid deadlock

2017-02-16 Thread Filipe Manana
On Thu, Feb 16, 2017 at 12:39 AM, Qu Wenruo  wrote:
>
>
> At 02/15/2017 11:59 PM, Filipe Manana wrote:
>>
>> On Wed, Feb 15, 2017 at 8:49 AM, Qu Wenruo 
>> wrote:
>>>
>>> If run btrfs/125 with nospace_cache or space_cache=v2 mount option,
>>> btrfs will block with the following backtrace:
>>>
>>> Call Trace:
>>>  __schedule+0x2d4/0xae0
>>>  schedule+0x3d/0x90
>>>  btrfs_start_ordered_extent+0x160/0x200 [btrfs]
>>>  ? wake_atomic_t_function+0x60/0x60
>>>  btrfs_run_ordered_extent_work+0x25/0x40 [btrfs]
>>>  btrfs_scrubparity_helper+0x1c1/0x620 [btrfs]
>>>  btrfs_flush_delalloc_helper+0xe/0x10 [btrfs]
>>>  process_one_work+0x2af/0x720
>>>  ? process_one_work+0x22b/0x720
>>>  worker_thread+0x4b/0x4f0
>>>  kthread+0x10f/0x150
>>>  ? process_one_work+0x720/0x720
>>>  ? kthread_create_on_node+0x40/0x40
>>>  ret_from_fork+0x2e/0x40
>>>
>>> The direct cause is the error handler in run_delalloc_nocow() doesn't
>>> handle error from btrfs_reloc_clone_csums() well.
>>>
>>> The related part call path will be:
>>> __extent_writepage
>>> |- writepage_delalloc()
>>> |  |- run_delalloc_range()
>>> | |- run_delalloc_nocow()
>>> ||- btrfs_add_ordered_extent()
>>> ||  Now one ordered extent for file range, e.g [0, 1M) is
>>> inserted
>>> ||
>>> ||- btrfs_reloc_clone_csums()
>>> ||  Fails with -EIO, as RAID5/6 doesn't repair some csum tree
>>> ||  blocks
>>> ||
>>> ||- extent_clear_unlock_delalloc()
>>> |   Error routine, unlock and clear page DIRTY, end page
>>> writeback
>>> |   So the remaining 255 pages will not go through writeback
>>> |
>>> |- __extent_writepage_io()
>>>|- writepage_end_io_hook()
>>>   |- btrfs_dev_test_ordered_pending()
>>>  Reduce ordered_extent->bytes_left by 4K.
>>>  Still have (1M - 4K) to finish.
>>>
>>> While the remaining 255 pages will not go through IO nor trigger
>>> writepage_end_io_hook(), the ordered extent for [0, 1M) will
>>> never finish, and blocking current transaction forever.
>>>
>>> Although the root cause is still in RAID5/6, it won't hurt to fix the
>>> error routine first.
>>>
>>> This patch will cleanup the ordered extent in error routine, so at least
>>> we won't cause deadlock.
>>>
>>> Signed-off-by: Qu Wenruo 
>>> ---
>>>  fs/btrfs/extent_io.c|  1 -
>>>  fs/btrfs/inode.c| 10 --
>>>  fs/btrfs/ordered-data.c | 25 +
>>>  fs/btrfs/ordered-data.h | 10 ++
>>>  4 files changed, 43 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>>> index 4ac383a3a649..a14d1b0840c5 100644
>>> --- a/fs/btrfs/extent_io.c
>>> +++ b/fs/btrfs/extent_io.c
>>> @@ -3258,7 +3258,6 @@ static noinline_for_stack int
>>> writepage_delalloc(struct inode *inode,
>>>delalloc_end,
>>>_started,
>>>nr_written);
>>> -   /* File system has been set read-only */
>>> if (ret) {
>>> SetPageError(page);
>>> /* fill_delalloc should be return < 0 for error
>>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>>> index 1e861a063721..3c3ade58afd7 100644
>>> --- a/fs/btrfs/inode.c
>>> +++ b/fs/btrfs/inode.c
>>> @@ -1052,8 +1052,11 @@ static noinline int cow_file_range(struct inode
>>> *inode,
>>> BTRFS_DATA_RELOC_TREE_OBJECTID) {
>>> ret = btrfs_reloc_clone_csums(inode, start,
>>>   cur_alloc_size);
>>> -   if (ret)
>>> +   if (ret) {
>>> +   btrfs_clean_ordered_extent(inode, start,
>>> +  ram_size);
>>> goto out_drop_extent_cache;
>>> +   }
>>> }
>>>
>>> btrfs_dec_block_group_reservations(fs_info,
>>> ins.objectid);
>>> @@ -1538,7 +1541,7 @@ static noinline int run_delalloc_nocow(struct inode
>>> *inode,
>>> if (!ret)
>>> ret = err;
>>>
>>> -   if (ret && cur_offset < end)
>>> +   if (ret && cur_offset < end) {
>>> extent_clear_unlock_delalloc(inode, cur_offset, end, end,
>>>  locked_page, EXTENT_LOCKED |
>>>  EXTENT_DELALLOC |
>>> EXTENT_DEFRAG |
>>> @@ -1546,6 +1549,9 @@ static noinline int run_delalloc_nocow(struct inode
>>> *inode,
>>>  PAGE_CLEAR_DIRTY |
>>>  PAGE_SET_WRITEBACK |
>>>  PAGE_END_WRITEBACK);
>>> +   btrfs_clean_ordered_extent(inode, 

4.10/4.11 Experiences

2017-02-16 Thread Imran Geriskovan



[GIT PULL] Btrfs fixes and improvements for 4.11

2017-02-16 Thread fdmanana
From: Filipe Manana 

Hi Chris.

Please consider the following changes for the 4.11 merge window.
This time there is nothing particularly outstanding when compared to the
usual set of bug fixes. These are mostly fixes for send and the no-holes
feature introduced in 3.14. Test cases for fstests were sent for half of
these changes, with some already merged and two not yet merged.

Thanks.

The following changes since commit 6e78b3f7a193546b1c00a6d084596e774f147169:

  Btrfs: fix btrfs_decompress_buf2page() (2017-02-10 19:11:03 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git 
for-chris-4.11

for you to fetch changes up to 0a511ba79d7152b5b3b6b54573511fbdeb014abb:

  Btrfs: fix data loss after truncate when using the no-holes feature
(2017-02-15 20:16:19 +0000)


Filipe Manana (7):
  Btrfs: incremental send, do not delay rename when parent inode is new
  Btrfs: bulk delete checksum items in the same leaf
  Btrfs: do not create explicit holes when replaying log tree if NO_HOLES 
enabled
  Btrfs: fix assertion failure when freeing block groups at close_ctree()
  Btrfs: fix use-after-free due to wrong order of destroying work queues
  Btrfs: incremental send, fix unnecessary hole writes for sparse files
  Btrfs: fix data loss after truncate when using the no-holes feature

Robbie Ko (3):
  Btrfs: send, fix failure to rename top level inode due to name collision
  Btrfs: incremental send, do not issue invalid rmdir operations
  Btrfs: fix leak of subvolume writers counter

 fs/btrfs/disk-io.c |  15 ++-
 fs/btrfs/extent-tree.c |   9 ++---
 fs/btrfs/file-item.c   |  28 +++-
 fs/btrfs/inode.c   |  29 ++---
 fs/btrfs/send.c| 125 
+++--
 fs/btrfs/tree-log.c|   5 +
 6 files changed, 181 insertions(+), 30 deletions(-)

-- 
2.7.0.rc3



Re: 4.10/4.11 Experiences

2017-02-16 Thread Imran Geriskovan
What are your experiences for btrfs regarding 4.10 and 4.11 kernels?
I'm still on 4.8.x. I'd be happy to hear from anyone using 4.1x for
a very typical single disk setup. Are they reasonably stable/good
enough for this case?


Re: [PATCH v2] btrfs: qgroup: Move half of the qgroup accounting time out of commit trans

2017-02-16 Thread Nikolay Borisov


On 16.02.2017 10:50, Qu Wenruo wrote:
> 
> 
> At 02/16/2017 04:45 PM, Nikolay Borisov wrote:
>> Just a minor nit.
>>
>> On 15.02.2017 04:43, Qu Wenruo wrote:
>>> Just as Filipe pointed out, the most time consuming parts of qgroup are
>>> btrfs_qgroup_account_extents() and
>>> btrfs_qgroup_prepare_account_extents().
>>> Which both call btrfs_find_all_roots() to get old_roots and new_roots
>>> ulist.
>>>
>>> What makes things worse is, we're calling that expensive
>>> btrfs_find_all_roots() at transaction committing time with
>>> TRANS_STATE_COMMIT_DOING, which will blocks all incoming transaction.
>>>
>>> Such behavior is necessary for @new_roots search as current
>>> btrfs_find_all_roots() can't do it correctly so we do call it just
>>> before switch commit roots.
>>>
>>> However for @old_roots search, it's not necessary as such search is
>>> based on commit_root, so it will always be correct and we can move it
>>> out of transaction committing.
>>>
>>> This patch moves the @old_roots search part out of
>>> commit_transaction(), so in theory we can half the time qgroup time
>>> consumption at commit_transaction().
>>>
>>> But please note that, this won't speedup qgroup overall, the total time
>>> consumption is still the same, just reduce the performance stall.
>>>
>>> Cc: Filipe Manana 
>>> Signed-off-by: Qu Wenruo 
>>> ---
>>> v2:
>>>   Update commit message to make it more clear.
>>>   Don't call btrfs_find_all_roots() before insert qgroup extent record,
>>>   so we can avoid wasting CPU for already inserted qgroup extent record.
>>>
>>> PS:
>>> If and only if we have fixed and proved btrfs_find_all_roots() can
>>> get correct result with current root, then we can move all
>>> expensive btrfs_find_all_roots() out of transaction committing.
>>>
>>> However I strongly doubt if it's possible with current delayed ref,
>>> and without such prove, move btrfs_find_all_roots() for @new_roots
>>> out of transaction committing will just screw up qgroup accounting.
>>>
>>> ---
>>>  fs/btrfs/delayed-ref.c | 20 
>>>  fs/btrfs/qgroup.c  | 33 +
>>>  fs/btrfs/qgroup.h  | 33 ++---
>>>  3 files changed, 75 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
>>> index ef724a5fc30e..0ee927ef5a71 100644
>>> --- a/fs/btrfs/delayed-ref.c
>>> +++ b/fs/btrfs/delayed-ref.c
>>> @@ -550,13 +550,14 @@ add_delayed_ref_head(struct btrfs_fs_info
>>> *fs_info,
>>>   struct btrfs_delayed_ref_node *ref,
>>>   struct btrfs_qgroup_extent_record *qrecord,
>>>   u64 bytenr, u64 num_bytes, u64 ref_root, u64 reserved,
>>> - int action, int is_data)
>>> + int action, int is_data, int *qrecord_inserted_ret)
>>>  {
>>>  struct btrfs_delayed_ref_head *existing;
>>>  struct btrfs_delayed_ref_head *head_ref = NULL;
>>>  struct btrfs_delayed_ref_root *delayed_refs;
>>>  int count_mod = 1;
>>>  int must_insert_reserved = 0;
>>> +int qrecord_inserted = 0;
>>
>> Why use an integer when you only require a boolean value? I doubt it
>> will make much difference at asm level if you switch to bool but at
>> least it will make it more apparent it's being used as a true|false
>> variable. The same comment applies to other functions where you've used
>> similar variable.
> 
> In fact I didn't see the point to use bool in most case.
> And we are using int as bool almost everywhere, just like
> must_insert_reserved and is_data.
> 
> Or is there some coding style documentation prohibiting us to use int as
> bool?

There most certainly isn't. It's more of a personal preference, which is
why I don't have a very strong opinion on this. In this case it's
better to keep int for the sake of consistency.

> 
> Thanks,
> Qu
> 
> 


Re: [PATCH] btrfs-progs: RAID5:Inject data stripe corruption and verify scrub fixes it.

2017-02-16 Thread Lakshmipathi.G
On Thu, Feb 16, 2017 at 09:12:31AM +0800, Qu Wenruo wrote:
> 
> 
> At 02/16/2017 04:56 AM, Lakshmipathi.G wrote:
> >On Wed, Feb 15, 2017 at 05:29:33PM +0800, Qu Wenruo wrote:
> >>
> >>
> >>At 02/15/2017 05:03 PM, Lakshmipathi.G wrote:
> >>>Signed-off-by: Lakshmipathi.G 
> >>>---
> >>>.../020-raid5-datastripe-corruption/test.sh| 224 
> >>>+
> >>>1 file changed, 224 insertions(+)
> >>>create mode 100755 tests/misc-tests/020-raid5-datastripe-corruption/test.sh
> >>>
> >>>diff --git a/tests/misc-tests/020-raid5-datastripe-corruption/test.sh 
> >>>b/tests/misc-tests/020-raid5-datastripe-corruption/test.sh
> >>>new file mode 100755
> >>>index 000..d04c430
> >>>--- /dev/null
> >>>+++ b/tests/misc-tests/020-raid5-datastripe-corruption/test.sh
> >>>@@ -0,0 +1,224 @@
> >>>+#!/bin/bash
> >>>+#
> >>>+# Raid5: Inject data stripe corruption and fix them using scrub.
> >>>+#
> >>>+# Script will perform the following:
> >>>+# 1) Create Raid5 using 3 loopback devices.
> >>>+# 2) Ensure file layout is created in a predictable manner.
> >>>+#Each data stripe(64KB) should uniquely start with 'DN',
> >>>+#where N represents the data stripe number.(ex:D0,D1 etc)
> >>
> >>If you want really predictable layout, you could just upload compressed
> >>images for this purpose.
> >>
> >>Which makes things super easy, and unlike fstests, btrfs-progs self-test
> >>accepts such images.
> >>
> >>>+# 3) Once file is created with specific layout, gather data stripe details
> >>>+#like devicename, position and actual on-disk data.
> >>>+# 4) Now use 'dd' to verify the data-stripe against its expected value
> >>>+#and inject corruption by zero'ing out contents.
> >>>+# 5) After injecting corruption, running online-scrub is expected to fix
> >>>+#the corrupted data stripe with the help of parity block and
> >>>+#corresponding data stripe.
> >>
> >>You should also verify parity stripe is not corrupted.
> >>It's already known that RAID5/6 will corrupted parity while recovering data
> >>stripe.
> >>
> >>Kernel patch for this, with detailed bug info.
> >>https://patchwork.kernel.org/patch/9553581/
> >>
> >>>+# 6) Finally, validate the data stripe has original un-corrupted value.
> >>>+#
> >>>+#  Note: This script doesn't handle parity block corruption.
> >>
> >>Normally such a test case should belong to xfstests (renamed to fstests
> >>recently) as we're verifying kernel behavior, not btrfs-progs behavior.
> >>
> >>But since fstests test cases should be as generic as possible, and we don't
> >>have a good enough tool to corrupt a given data/parity stripe, my previously
> >>submitted test case was rejected.
> >>
> >>Personally speaking, this seems to be a dilemma for me.
> >>
> >>We really need a test case for this, as bugs have been spotted where RAID5/6
> >>scrub will corrupt P/Q while recovering a data stripe.
> >>But we need to get btrfs-corrupt-block into better shape to make fstests
> >>accept it, and that won't take a short time.
> >>
> >>So I really have no idea what we should do for such a test.
> >>
> >>Thanks,
> >>Qu
> >
> >Will check compressed images for parity stripe testing. I assume at the
> >moment we only support a single static compressed image. Is adding more
> >than one static compressed image, like disk1.img disk2.img disk3.img for
> >RAID, supported in the existing test framework?
> 
> Not yet, but since you can use test.sh instead of running check_image() from
> the test framework, it's never a big problem.
> 
ok, will check it out.
> >
> >Using compressed images for checking parity seems a little easier than
> >computing it via scripting.
> >
> >Looked into patch description:
> >
> >After scrubbing dev3 only:
> >0xcdcd (Good)  |  0xcdcd  |  0xcdcd (Bad)
> >  (D1)         |   (D2)   |    (P)
> >
> >So does the parity stripe (P) always get replaced by the exact content of
> >D1/D2 (data stripe), or by random data?
> 
> Neither. It's just the XOR result of D2 (never changed, 0xcdcd) and the old
> D1 (wrong, 0x).
> 0xcdcd XOR 0x = 0xcdcd
> 
> So you got 0xcdcd, a bad result.
> 
> If you corrupt D1 with random data, then the parity will be random too.
> 
> >If it always gets replaced by the exact value from either D1 or D2, I think
> >the current script can be modified to detect that bug. If the parity gets
> >replaced by a random value, then it will make the task more difficult.
> 
> Not hard to detect.
> As the content is completely under your control, you know the correct parity
> value, and you can verify it very easily.
> 
The script corrupts a data stripe (D1 or D2) in a random manner. So let's
assume the wrong parity will be random as well.

I tried one-liners for computing the XOR of two strings:
str1 = "D0x"
str2 = "D1x"

but failed to figure it out. I think the parity will be
"0001"
for the above case. For a higher-numbered data stripe (D15), the parity will
be slightly different, like "1000".
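For reference, a byte-wise XOR of two equal-length ASCII strings can be done in plain bash, the same language as the test script. This is only an illustrative sketch; `xor_strings` is a made-up helper name, not anything from btrfs-progs:

```shell
#!/bin/bash
# Sketch: XOR two equal-length ASCII strings byte by byte and print the
# result as hex -- the same arithmetic RAID5 uses to compute its parity.
xor_strings() {
	local a="$1" b="$2" out="" i
	for ((i = 0; i < ${#a}; i++)); do
		# printf '%d' "'c" yields the ASCII code of character c
		out+=$(printf '%02x' $(( $(printf '%d' "'${a:i:1}") ^ $(printf '%d' "'${b:i:1}") )))
	done
	echo "$out"
}

xor_strings "D0" "D1"	# 'D'^'D' = 0x00 and '0'^'1' = 0x01, so this prints 0001
```

So the guess above holds for the first two bytes: as long as only data stripes are corrupted with known content, the expected parity stays fully computable in the script.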


Re: How to dump/find parity of RAID-5 file?

2017-02-16 Thread Lakshmipathi.G
On Wed, Feb 15, 2017 at 07:24:55PM +0100, Goffredo Baroncelli wrote:
> The chunk-tree maps the logical address [145096704...145096704+134217728) 
> [size=128MB] to the physical ones 
>   devid3 : [6368..6368+67108864) [size=64MB]
>   devid1 : [6368..6368+67108864) [size=64MB]
>   devid2 : [83034112..83034112+67108864) [size=64MB]
> 
> So because the logical address is divided in pieces of 64k, interleaved by 
> the parity, we know that:
> * first 128kb
> logical address [145096704  ..145096704+64k)   -> devid1, [6368..6368+64k)
> logical address [145096704+64k  ..145096704+2x64k) -> devid2, [83034112..83034112+64k)
> parity: -> devid3, [6368..6368+64k)
> * second 128kb
> logical address [145096704+2x64k..145096704+3x64k) -> devid2, [83034112+64k..83034112+2x64k)
> logical address [145096704+3x64k..145096704+4x64k) -> devid3, [6368+64k..6368+2x64k)
> parity: -> devid1, [6368+64k..6368+2x64k)
> And so on...
> 
> (NB: 145096704+2x64k == 145227776)
> 
> The fs-tree, maps the file content [0..131072) [size=128k] to the logical 
> address [145227776..145227776+131072) [size=128k]
> 
> So the file content is stored starting from the disk devid2, at 
> 83034112+64k=83099648 (first 64k). The second 64k is placed in disk devid3 at 
> 6368+64k=63176704; the parity is stored at disk1, 6368+64k = 63176704
> 
> 
> BR
> G.Baroncelli
> 

Thanks for the detailed example with exact numbers. Now I understand the
address mapping better. With this as a reference, I think it should be
possible to access the parity/data stripes in a more sensible manner
instead of using the expensive "cat /device/ | hexdump | grep"
combination. Thanks.
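The rotation arithmetic in Goffredo's example can indeed be scripted. The sketch below assumes 3 devices and a 64K stripe length, with each row's data filling device slots left to right starting at index (row % 3) and parity one slot before that, which matches the layout quoted above; `map_logical` is a hypothetical helper, and since the on-disk byte offsets in the mail are partially elided, it only prints device indexes and rows:

```shell
#!/bin/bash
# Sketch of the RAID5 address arithmetic from the example above.
STRIPE=65536
NDEV=3

map_logical() {
	local off=$1                        # byte offset within the data chunk
	local dstripe=$(( off / STRIPE ))   # data stripe number (D0, D1, ...)
	local row=$(( dstripe / (NDEV - 1) ))
	local col=$(( dstripe % (NDEV - 1) ))
	local dev=$(( (row + col) % NDEV ))
	local par=$(( ((row - 1) % NDEV + NDEV) % NDEV ))
	echo "D${dstripe}: device index ${dev}, row ${row}, parity on index ${par}"
}

map_logical 0        # D0: device index 0 (devid1), parity on index 2 (devid3)
map_logical 65536    # D1: device index 1 (devid2), parity on index 2
map_logical 131072   # D2: device index 1 (devid2), parity on index 0 (devid1)
```

With the device index and row in hand, a single dd with the right seek/skip can read or corrupt exactly one stripe, instead of grepping hexdumps.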

Cheers.
Lakshmipathi.G
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Device ID issue on btrfs

2017-02-16 Thread Goffredo Baroncelli
On 2017-02-16 14:00, Ilan Schwarts wrote:
> Hi,
> Found the solution at: https://patchwork.kernel.org/patch/2825842/

The patches below, provided by SUSE, seem more general:


http://kernel.opensuse.org/cgit/kernel-source/tree/patches.suse/vfs-add-super_operations-get_inode_dev

http://kernel.opensuse.org/cgit/kernel-source/tree/patches.suse/btrfs-provide-super_operations-get_inode_dev

Unfortunately it is quite invasive.

> 
> On Thu, Feb 16, 2017 at 12:05 PM, Ilan Schwarts wrote:
> 
> Hi,
> 
> I hope it's OK I'm reaching you via mail.
> I read your message at:
> https://marc.info/?l=linux-btrfs=148701370108477=2
> 
> I have the same issue in my kernel driver. Did you find an answer?
> 
> Thanks
> -
> Ilan Schwarts


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


man filesystems(5) doesn't contain Btrfs

2017-02-16 Thread Chris Murphy
Hi,

This man page contains a list for pretty much every other file system,
with a one-liner description: ext4 and XFS are in there, and even NTFS, but
not Btrfs.

Also, /etc/filesystems doesn't contain Btrfs. Anyone know if either,
or both, ought to contain an entry for Btrfs?

-- 
Chris Murphy


Re: Way to force allocation of more metadata?

2017-02-16 Thread Austin S. Hemmelgarn

On 2017-02-16 15:13, E V wrote:

It would be nice if there was an easy way to tell btrfs to allocate
another metadata chunk. For example, the below fs is full due to
exhausted metadata:

Device size:                1013.28GiB
Device allocated:           1013.28GiB
Device unallocated:            2.00MiB
Device missing:                  0.00B
Used:                        981.94GiB
Free (estimated):             15.16GiB  (min: 15.16GiB)
Data ratio:                       2.00
Metadata ratio:                   2.00
Global reserve:              510.31MiB  (used: 0.00B)

 Data  Metadata  System
Id Path  RAID1 RAID1 RAID1Unallocated
-- - - -  ---
 1 /dev/sdv1 505.63GiB   1.00GiB  8.00MiB 1.00MiB
 2 /dev/sdw1 505.63GiB   1.00GiB  8.00MiB 1.00MiB
-- - - -  ---
   Total 505.63GiB   1.00GiB  8.00MiB 2.00MiB
   Used  490.47GiB 510.88MiB 96.00KiB

I can delete a multi GB file and get several GB of unallocated space,
however if I try and copy big files to it again the same exact thing
happens. However, if I play with balance and deleting files and such
and manage to get it to allocate another metadata chunk while there is
unallocated space then the filesystem will happily fill up all of the
data chunks. Failing an automatic allocation out of the global reserve, or
setting aside metadata as soon as unallocated space is available, it would
be nice if I could just delete a file and then tell btrfs to allocate
more metadata immediately. Makes sense? No idea how easy this would be
to do, but it seems like a simple thing the btrfs tool could do.
The potentially tricky bit about this is that BTRFS (at least, with 
recent kernels) will deallocate completely empty chunks.  The horribly 
ironic part of this is that that behavior got added to help avoid 
situations like this.  There is technically logic to allocate extra 
metadata chunks for every few data chunks that get allocated (and I 
think the reverse too), but because of the auto-deallocation behavior, 
this ends up just spinning and wasting cycles (and bandwidth).  I would 
personally love to see the following happen in regard to this all:
1. Add a switch to disable the auto-deallocation (ideally a mount 
option).  This bit _should_ be pretty easy, but I'm not certain.
2. Disable the auto-allocation of chunks of type X every N chunks of 
type Y that get allocated when the auto-deallocation is enabled.  This 
one should in theory be pretty easy too.
3. Add a tool (not sure where exactly makes the most sense) to force 
allocation of a specific chunk type.  This would also be insanely useful 
for debugging, but is probably the hardest part (would likely need a new 
ioctl).
4. Possibly add an option to reserve some percentage of the space to 
only be used for System and Metadata chunks.  This would help prevent 
this kind of thing from happening.  I'm not entirely sold on this right 
now, but it _seems_ like a generically good idea.


Sadly, I don't currently have the time to work on any of that myself, 
and there are other things that are higher priorities, so I don't know 
when (if ever) this may actually happen.  I might have some time to look 
into the first two things I listed some time soon, but I don't know how 
soon that might be.
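Until something like item 3 exists, the closest approximation today is a filtered balance. The sketch below is deliberately a dry run (it only echoes the commands rather than running them) because, as discussed above, the auto-deallocation can undo the effect; `force_metadata_chunk` is a made-up name, though the -dusage and -dlimit balance filters are real:

```shell
#!/bin/bash
# Dry-run sketch: print the commands one would try today to coax the
# allocator into creating a metadata chunk.  Whether this wins the race
# against chunk auto-deallocation is exactly the problem described above.
force_metadata_chunk() {
	local mnt="$1"
	# rewrite up to two mostly-empty data chunks to free unallocated space
	echo btrfs balance start -dusage=10 -dlimit=2 "$mnt"
	# then flush pending metadata writes, during which a new metadata
	# chunk may be allocated from the freed space
	echo btrfs filesystem sync "$mnt"
}

force_metadata_chunk /mnt/example
```

Drop the echoes to actually run it; a dedicated ioctl as proposed in item 3 would make this deterministic instead of best-effort.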



Re: man filesystems(5) doesn't contain Btrfs

2017-02-16 Thread Austin S. Hemmelgarn

On 2017-02-16 15:36, Chris Murphy wrote:

Hi,

This man page contains a list for pretty much every other file system,
with a one-liner description: ext4 and XFS are in there, and even NTFS, but
not Btrfs.

Also, /etc/filesystems doesn't contain Btrfs. Anyone know if either,
or both, ought to contain an entry for Btrfs?

The man-page absolutely should.  Ideally, that should be kept in sync 
with the mount manpages, but I think they're in separate projects 
(mount is util-linux IIRC, while filesystems(5) is part of the Linux 
man-pages project).


As far as /etc/filesystems, that should be irrelevant.  Of the big 
distros, only SUSE, Fedora/CentOS/OEL, and Gentoo use it (I checked 
Arch, Gentoo, openSUSE, Fedora, Debian, and Ubuntu, which should cover 
at least 80% of users since this is low enough level derivatives won't 
be likely to change it), and all it does is dictate what order 'mount -t 
auto' will try filesystem types in.  All the distros that have it have a 
'*' on the last line, which means to check /proc/filesystems, and 
therefore will automatically try BTRFS if there's a kernel module for it 
(unless the module is blacklisted or prevented from being loaded on boot 
automatically).
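For reference, a typical /etc/filesystems on one of those distros looks something like the sample below (illustrative only; exact contents vary by distro):

```
ext4
ext3
nodev proc
nodev devpts
iso9660
vfat
*
```

Because of the trailing '*', 'mount -t auto' falls through to /proc/filesystems after trying the listed types, which is why no explicit btrfs entry is needed there.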



Way to force allocation of more metadata?

2017-02-16 Thread E V
It would be nice if there was an easy way to tell btrfs to allocate
another metadata chunk. For example, the below fs is full due to
exhausted metadata:

Device size:                1013.28GiB
Device allocated:           1013.28GiB
Device unallocated:            2.00MiB
Device missing:                  0.00B
Used:                        981.94GiB
Free (estimated):             15.16GiB  (min: 15.16GiB)
Data ratio:                       2.00
Metadata ratio:                   2.00
Global reserve:              510.31MiB  (used: 0.00B)

 Data  Metadata  System
Id Path  RAID1 RAID1 RAID1Unallocated
-- - - -  ---
 1 /dev/sdv1 505.63GiB   1.00GiB  8.00MiB 1.00MiB
 2 /dev/sdw1 505.63GiB   1.00GiB  8.00MiB 1.00MiB
-- - - -  ---
   Total 505.63GiB   1.00GiB  8.00MiB 2.00MiB
   Used  490.47GiB 510.88MiB 96.00KiB

I can delete a multi GB file and get several GB of unallocated space,
however if I try and copy big files to it again the same exact thing
happens. However, if I play with balance and deleting files and such
and manage to get it to allocate another metadata chunk while there is
unallocated space then the filesystem will happily fill up all of the
data chunks. Failing an automatic allocation out of the global reserve, or
setting aside metadata as soon as unallocated space is available, it would
be nice if I could just delete a file and then tell btrfs to allocate
more metadata immediately. Makes sense? No idea how easy this would be
to do, but it seems like a simple thing the btrfs tool could do.


Re: [PATCH 0/8] btrfs: cleanup patches

2017-02-16 Thread David Sterba
On Wed, Feb 15, 2017 at 04:28:26PM -0500, je...@suse.com wrote:
> From: Jeff Mahoney 
> 
> Hi all -
> 
> Here's another round of cleanup patches.  The first 7 clean up API
> blemishes with unused arguments and/or root -> fs_info conversion.  The
> last converts the pr_debug in btrfs_abort_transaction to use btrfs_debug.
> 
> -Jeff
> 
> Jeff Mahoney (8):
>   btrfs: pass fs_info to (more) routines that are only called with
> extent_root
>   btrfs: flush_space always takes fs_info->fs_root
>   btrfs: convert btrfs_inc_block_group_ro to accept fs_info
>   btrfs: free-space-cache, clean up unnecessary root arguments
>   btrfs: btrfs_wait_cache_io_root doesn't use path argument
>   btrfs: btrfs_truncate_free_space_cache always allocates path
>   btrfs: drop unused extent_op arg from btrfs_add_delayed_data_ref
>   btrfs: use btrfs_debug instead of pr_debug in transaction abort

Thanks, added to 4.11 queue, with some minor fixups.


Re: Device ID issue on btrfs

2017-02-16 Thread Jeff Mahoney
On 2/16/17 1:23 PM, Goffredo Baroncelli wrote:
> On 2017-02-16 14:00, Ilan Schwarts wrote:
>> Hi,
>> Found the solution at: https://patchwork.kernel.org/patch/2825842/
> 
> The patches below, provided by SUSE, seem more general:
> 
>   
> http://kernel.opensuse.org/cgit/kernel-source/tree/patches.suse/vfs-add-super_operations-get_inode_dev
>   
> http://kernel.opensuse.org/cgit/kernel-source/tree/patches.suse/btrfs-provide-super_operations-get_inode_dev
> 
> Unfortunately it is quite invasive.

They're also incomplete.  The openSUSE kernel also needs:
http://kernel.suse.com/cgit/kernel-source/tree/patches.suse/btrfs-fs-super.c-add-new-super-block-devices-super_block_d.patch?h=SLE12-SP2
http://kernel.suse.com/cgit/kernel-source/tree/patches.suse/btrfs-btrfs-use-the-new-VFS-super_block_dev.patch?h=SLE12-SP2

... for ustat() to work properly.  I've just forward ported those to the
master kernel and am testing now.

The /proc/mountinfo thing isn't fixed with these patches either since
subvolumes report different device numbers and we don't represent
subvolumes as mounts.  That's what the patches that I had started
writing up a while ago do, but it's still very early for those.

-Jeff

>>
>> On Thu, Feb 16, 2017 at 12:05 PM, Ilan Schwarts wrote:
>>
>> Hi,
>>
>> I hope it's OK I'm reaching you via mail.
>> I read your message at:
>> https://marc.info/?l=linux-btrfs=148701370108477=2
>>
>> I have the same issue in my kernel driver. Did you find an answer?
>>
>> Thanks
>> -
>> Ilan Schwarts
> 
> 


-- 
Jeff Mahoney
SUSE Labs





[PATCH v2 2/2] btrfs: test decompression in the middle of large extents

2017-02-16 Thread Omar Sandoval
From: Omar Sandoval 

This is a regression test for "Btrfs: fix btrfs_decompress_buf2page()".
It fails for zlib on v4.10-rc[1-7].

Signed-off-by: Omar Sandoval 
---
Changed from v1:

- Remove from quick group
- Handle compress features in sysfs generically
- Require btrfs property command

 common/btrfs|  8 +
 tests/btrfs/137 | 97 +
 tests/btrfs/137.out |  2 ++
 tests/btrfs/group   |  1 +
 4 files changed, 108 insertions(+)
 create mode 100755 tests/btrfs/137
 create mode 100644 tests/btrfs/137.out

diff --git a/common/btrfs b/common/btrfs
index 96c3635b..b342644c 100644
--- a/common/btrfs
+++ b/common/btrfs
@@ -364,3 +364,11 @@ _reload_btrfs_ko()
modprobe -r btrfs || _fail "btrfs unload failed"
modprobe btrfs || _fail "btrfs load failed"
 }
+
+_btrfs_compression_algos()
+{
+   echo zlib
+   for feature in /sys/fs/btrfs/features/compress_*; do
+   echo "${feature#/sys/fs/btrfs/features/compress_}"
+   done
+}
diff --git a/tests/btrfs/137 b/tests/btrfs/137
new file mode 100755
index ..d5e4774b
--- /dev/null
+++ b/tests/btrfs/137
@@ -0,0 +1,97 @@
+#! /bin/bash
+# FS QA Test 137
+#
+# Test decompression in the middle of large extents. Regression test for Linux
+# kernel commit 6e78b3f7a193 ("Btrfs: fix btrfs_decompress_buf2page()").
+#
+#---
+# Copyright (c) 2017 Facebook.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_btrfs_command property
+
+algos=($(_btrfs_compression_algos))
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+# Need 1GB for the uncompressed file plus <1GB for each compressed file.
+_require_fs_space $SCRATCH_MNT $((1024 * 1024 * (1 + ${#algos[@]})))
+
+# Checksum a piece in the middle of the file. This hits the unaligned case that
+# caused the original bug.
+do_csum()
+{
+   dd if="$1" bs=4K skip=555 count=100 2>>$seqres.full | md5sum | cut -d ' ' -f 1
+}
+
+# Create a large, uncompressed (but compressible) file.
+touch "${SCRATCH_MNT}/uncompressed"
+$BTRFS_UTIL_PROG property set "${SCRATCH_MNT}/uncompressed" compression ""
+_ddt of="${SCRATCH_MNT}/uncompressed" bs=1M count=1K 2>&1 | _filter_dd
+
+csum="$(do_csum "${SCRATCH_MNT}/uncompressed")"
+
+for algo in ${algos[@]}; do
+   echo "Testing ${algo}" >> $seqres.full
+
+   # Copy the same data, but with compression enabled.
+   touch "${SCRATCH_MNT}/${algo}"
+   $BTRFS_UTIL_PROG property set "${SCRATCH_MNT}/${algo}" compression "${algo}"
+   dd if="${SCRATCH_MNT}/uncompressed" of="${SCRATCH_MNT}/${algo}" bs=1M 2>&1 | _filter_dd
+
+   # The correct data is likely still cached. Cycle the mount to drop the
+   # cache and start fresh.
+   _scratch_cycle_mount
+
+   # Check the checksum.
+   compressed_csum="$(do_csum "${SCRATCH_MNT}/${algo}")"
+   if [ "${compressed_csum}" != "${csum}" ]; then
+   echo "Checksum mismatch for ${algo} (expected ${csum}, got ${compressed_csum})"
+   fi
+done
+
+echo "Silence is golden"
+
+status=0
+exit
diff --git a/tests/btrfs/137.out b/tests/btrfs/137.out
new file mode 100644
index ..eb48dd21
--- /dev/null
+++ b/tests/btrfs/137.out
@@ -0,0 +1,2 @@
+QA output created by 137
+Silence is golden
diff --git a/tests/btrfs/group b/tests/btrfs/group
index ea88ba4e..a8d1851d 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -139,3 +139,4 @@
 134 auto quick send
 135 auto quick send
 136 auto convert
+137 auto compress
-- 
2.11.1



[PATCH v2 1/2] common/rc: remove unnecessary cat in _ddt

2017-02-16 Thread Omar Sandoval
From: Omar Sandoval 

Signed-off-by: Omar Sandoval 
---
Identical to v1

 common/rc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/common/rc b/common/rc
index 76515e2b..6304d690 100644
--- a/common/rc
+++ b/common/rc
@@ -2417,7 +2417,7 @@ _die()
 # convert urandom incompressible data to compressible text data
 _ddt()
 {
-   cat /dev/urandom | od | dd iflag=fullblock ${*}
+   od /dev/urandom | dd iflag=fullblock ${*}
 }
 
 #takes files, randomdata
-- 
2.11.1
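A quick hedged check of why the od trick works (the file name below is arbitrary): od renders random bytes as repetitive octal text, which gzip compresses well, while raw urandom output does not compress at all.

```shell
#!/bin/bash
# Illustrative check only -- thresholds are rough, not exact figures.
od /dev/urandom | dd iflag=fullblock bs=64k count=1 of=/tmp/ddt.sample 2>/dev/null
text_gz=$(gzip -c /tmp/ddt.sample | wc -c)
rand_gz=$(head -c 65536 /dev/urandom | gzip -c | wc -c)
echo "od text:     65536 -> ${text_gz} bytes compressed"
echo "raw urandom: 65536 -> ${rand_gz} bytes compressed"
rm -f /tmp/ddt.sample
```

So dropping the leading cat changes nothing about the data's compressibility; it just removes a useless process from the pipeline.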
