Re: Wiki suggestions

2015-07-14 Thread David Sterba
On Mon, Jul 13, 2015 at 07:33:13PM +0200, Marc Joliet wrote:
 Am Mon, 13 Jul 2015 19:21:54 +0200
 schrieb Marc Joliet mar...@gmx.de:
 
  OK, I'll make the changes then (sans kernel log).
 
 Just a heads up: I accepted the terms of service, but the link goes to a
 non-existent wiki page.

I have reported that to kernel.org admins some time ago (02/2014), but the
ticket is still open. I'll ping it.


Recreate Snapshots and their Parent Relationships on a second Server.

2015-07-14 Thread Robert Krig
Hi.

I have an old server with a bunch of btrfs snapshots.
I'm setting up a new server and I would like to transfer those snapshots
as efficiently as possible, while still maintaining their parent-child
relationships for space-efficient storage.

Apart from manually using btrfs send and btrfs send -p where
applicable, is there an easy way to transfer everything in one go?

Can I identify snapshot relationships via their ID or some other data
that I can display using the btrfs tools?


BTRFS raid6 unmountable after a couple of days of usage.

2015-07-14 Thread Austin S Hemmelgarn
So, after experiencing this same issue multiple times (on almost a dozen 
different kernel versions since 4.0) and ruling out the possibility of it being 
caused by my hardware (or at least, the RAM, SATA controller and disk drives 
themselves), I've decided to report it here.

The general symptom is that raid6 profile filesystems that I have are working 
fine for multiple weeks, until I either reboot or otherwise try to remount 
them, at which point the system refuses to mount them.

I'm currently using btrfs-progs v4.1 with kernel 4.1.2, although I've been 
seeing this with versions of both since 4.0.

Output of 'btrfs fi show' for the most recent fs that I had this issue with:
Label: 'altroot'  uuid: 86eef6b9-febe-4350-a316-4cb00c40bbc5
Total devices 4 FS bytes used 9.70GiB
devid1 size 24.00GiB used 6.03GiB path /dev/mapper/vg-altroot.0
devid2 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.1
devid3 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.2
devid4 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.3

btrfs-progs v4.1

Each of the individual LVs in the FS is just a flat chunk of space on 
a separate disk from the others.

The FS itself passes btrfs check just fine (no reported errors, exit value of 
0), but the kernel refuses to mount it with the message 'open_ctree failed'.

I've run btrfs chunk recover and attached the output from that.

Here's a link to an image from 'btrfs image -c9 -w': 
https://www.dropbox.com/s/pl7gs305ej65u9q/altroot.btrfs.img?dl=0
(That link will expire in 30 days, let me know if you need access to it beyond 
that).

The filesystems in question all see relatively light but consistent usage as 
targets for receiving daily incremental snapshots for on-system backups (and 
because I know someone will mention it, yes, I do have other backups of the 
data, these are just my online backups).
All Devices:
Device: id = 4, name = /dev/mapper/vg-altroot.3
Device: id = 3, name = /dev/mapper/vg-altroot.2
Device: id = 2, name = /dev/mapper/vg-altroot.1
Device: id = 1, name = /dev/vg/altroot.0

DEVICE SCAN RESULT:
Filesystem Information:
sectorsize: 4096
leafsize: 16384
tree root generation: 26
chunk root generation: 11

All Devices:
Device: id = 4, name = /dev/mapper/vg-altroot.3
Device: id = 3, name = /dev/mapper/vg-altroot.2
Device: id = 2, name = /dev/mapper/vg-altroot.1
Device: id = 1, name = /dev/vg/altroot.0

All Block Groups:
Block Group: start = 0, len = 4194304, flag = 2
Block Group: start = 4194304, len = 8388608, flag = 4
Block Group: start = 12582912, len = 8388608, flag = 1
Block Group: start = 20971520, len = 16777216, flag = 102
Block Group: start = 37748736, len = 2147483648, flag = 104
Block Group: start = 2185232384, len = 2147483648, flag = 101
Block Group: start = 4332716032, len = 2147483648, flag = 101
Block Group: start = 6480199680, len = 2147483648, flag = 101
Block Group: start = 8627683328, len = 2147483648, flag = 101
Block Group: start = 10775166976, len = 2147483648, flag = 101

All Chunks:
Chunk: start = 0, len = 4194304, type = 2, num_stripes = 1
Stripes list:
[ 0] Stripe: devid = 1, offset = 0
Chunk: start = 4194304, len = 8388608, type = 4, num_stripes = 1
Stripes list:
[ 0] Stripe: devid = 1, offset = 4194304
Chunk: start = 12582912, len = 8388608, type = 1, num_stripes = 1
Stripes list:
[ 0] Stripe: devid = 1, offset = 12582912
Chunk: start = 20971520, len = 16777216, type = 102, num_stripes = 4
Stripes list:
[ 0] Stripe: devid = 4, offset = 1048576
[ 1] Stripe: devid = 3, offset = 1048576
[ 2] Stripe: devid = 2, offset = 1048576
[ 3] Stripe: devid = 1, offset = 20971520
Chunk: start = 37748736, len = 2147483648, type = 104, num_stripes = 4
Stripes list:
[ 0] Stripe: devid = 4, offset = 9437184
[ 1] Stripe: devid = 3, offset = 9437184
[ 2] Stripe: devid = 2, offset = 9437184
[ 3] Stripe: devid = 1, offset = 29360128
Chunk: start = 2185232384, len = 2147483648, type = 101, num_stripes = 4
Stripes list:
[ 0] Stripe: devid = 4, offset = 1083179008
[ 1] Stripe: devid = 3, offset = 1083179008
[ 2] Stripe: devid = 2, offset = 1083179008
[ 3] Stripe: devid = 1, offset = 1103101952
Chunk: start = 4332716032, len = 2147483648, type = 101, num_stripes = 4
Stripes list:
[ 0] Stripe: devid = 2, offset = 2156920832
[ 1] Stripe: devid = 3, offset = 2156920832
[ 2] Stripe: devid = 4, offset = 2156920832
[ 3] Stripe: devid = 1, 

Btrfs filesystem-fail observations and hints

2015-07-14 Thread M G Berberich
Hello,

at the weekend we had a disk failure in a 5-disk BtrFS-RAID1
setup. Ideally one failing disk in a RAID1 setup should (at least
temporarily) degrade the filesystem and inform root about the
situation, but should leave the rest of the system unaffected. That’s
not what happened. Processes accessing the filesystem hung waiting on
the device, and the filesystem itself “hung” too, producing lots of

  BTRFS: lost page write due to I/O error on /dev/sdd
  BTRFS: bdev /dev/sdd errs: wr …, rd …, flush 0, corrupt 0, gen 0

messages. Attempts to reboot the system repeatedly failed. Only after
physically removing the failed (hotpluggable) disk from the system was it
possible to reboot the system somewhat normally.

Afterwards, trying to get the system running again, the following
observations were made:

· “btrfs device delete missing”

  There seems to be no straightforward way to monitor the progress of
  the “rebalancing” of the filesystem. It took about 6 hours, and while
  it was possible to estimate the finish time by watching “btrfs fi
  show” and extrapolating the device usage, a way to monitor the
  progress like “btrfs balance status” would be nice. (“btrfs balance
  status” just says “No balance found on …”.) A rough workaround is
  sketched after this list.

· “btrfs fi df”

  During the “btrfs device delete missing” rebalance, “btrfs fi df” does
  not reflect the current state of the filesystem. It says e.g.

Data, RAID1: total=1.46TiB, used=1.46TiB
Data, single: total=8.00MiB, used=0.00B

  while actually, depending on the progress of the rebalance, about 0
  to 300 GByte have only one copy on the devices. So e.g.

Data, RAID1: total=1.1TiB, used=1.1TiB
Data, single: total=290GiB, used=290GiB

  would better reflect the state of the system.
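
A minimal sketch of the ad-hoc monitoring described above, purely as an
illustration (there is no official progress interface for device delete,
and the mount point below is a placeholder):

  # Periodically dump per-device usage and the RAID1/single split while
  # "btrfs device delete missing" runs, and extrapolate the finish time
  # from how fast the numbers move.
  watch -n 300 'btrfs filesystem show /mnt/data; btrfs filesystem df /mnt/data'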

Regards,
bmg

-- 
„Des is völlig wurscht, was heut beschlos- | M G Berberich
 sen wird: I bin sowieso dagegn!“  | berbe...@fmi.uni-passau.de
(SPD-Stadtrat Kurt Schindler; Regensburg)  | www.fmi.uni-passau.de/~berberic


Btrfs progs release 4.1.2 (urgent fix, do not use 4.1.1)

2015-07-14 Thread David Sterba
Hi,

due to a buggy bugfix to mkfs, filesystems created with version 4.1.1 are not
entirely correct.

To check if the filesystem is affected run 'btrfs check' and look for

Chunk[256, 228, 0]: length(4194304), offset(0), type(2) mismatch with block group[0, 192, 4194304]: offset(4194304), objectid(0), flags(34)
Chunk[256, 228, 4194304]: length(8388608), offset(4194304), type(4) mismatch with block group[4194304, 192, 8388608]: offset(8388608), objectid(4194304), flags(36)

at the beginning of the output. Such filesystems must not be used and
must be recreated. Read-only mount should be safe to retrieve the data,
but at the moment this hasn't been verified.
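
For reference, a minimal sketch of that check (the device path is a
placeholder; run it against the unmounted filesystem):

  # Look for the chunk / block group mismatch lines quoted above in the
  # 'btrfs check' output; /dev/sdX is a placeholder device.
  btrfs check /dev/sdX 2>&1 | grep 'mismatch with block group'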

Thanks to Qu Wenruo for identifying the problem, I'm sorry for the trouble.

Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/
Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git


Re: [PATCH] Revert btrfs-progs: mkfs: create only desired block groups for single device

2015-07-14 Thread David Sterba
On Tue, Jul 14, 2015 at 10:13:01AM +0800, Qu Wenruo wrote:
 This reverts commit 5f8232e5c8f0b0de0ef426274911385b0e877392.

Thanks.  The revert is justified given the severity of the problem; I'll
release 4.1.2 asap.

 This commit causes a regression:
 ---

BTW, do not use --- in the changelog as 'git am' will ignore the text
between that and the diff.


Re: counting fragments takes more time than defragmenting

2015-07-14 Thread Patrik Lundquist
On 24 June 2015 at 12:46, Duncan 1i5t5.dun...@cox.net wrote:

 Regardless of whether 1 or huge -t means maximum defrag, however, the
 nominal data chunk size of 1 GiB means that 30 GiB file you mentioned
 should be considered ideally defragged at 31 extents.  This is a
 departure from ext4, which AFAIK in theory has no extent upper limit, so
 should be able to do that 30 GiB file in a single extent.

 But btrfs or ext4, 31 extents ideal or a single extent ideal, 150 extents
 still indicates at least some remaining fragmentation.

So I converted the VMware VMDK file to a VirtualBox VDI file:

-rw--- 1 plu plu 28845539328 jul 13 13:36 Windows7-disk1.vmdk
-rw--- 1 plu plu 28993126400 jul 13 14:04 Windows7.vdi

$ filefrag Windows7.vdi
Windows7.vdi: 15 extents found

$ btrfs filesystem defragment -t 3g Windows7.vdi
$ filefrag Windows7.vdi
Windows7.vdi: 24 extents found

How can it be less than 28 extents with a chunk size of 1 GiB?

E2fsprogs version 1.42.12


Re: BTRFS raid6 unmountable after a couple of days of usage.

2015-07-14 Thread Austin S Hemmelgarn

On 2015-07-14 07:49, Austin S Hemmelgarn wrote:

So, after experiencing this same issue multiple times (on almost a dozen 
different kernel versions since 4.0) and ruling out the possibility of it being 
caused by my hardware (or at least, the RAM, SATA controller and disk drives 
themselves), I've decided to report it here.

The general symptom is that raid6 profile filesystems that I have are working 
fine for multiple weeks, until I either reboot or otherwise try to remount 
them, at which point the system refuses to mount them.

I'm currently using btrfs-progs v4.1 with kernel 4.1.2, although I've been 
seeing this with versions of both since 4.0.

Output of 'btrfs fi show' for the most recent fs that I had this issue with:
 Label: 'altroot'  uuid: 86eef6b9-febe-4350-a316-4cb00c40bbc5
Total devices 4 FS bytes used 9.70GiB
devid1 size 24.00GiB used 6.03GiB path /dev/mapper/vg-altroot.0
devid2 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.1
devid3 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.2
devid4 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.3

 btrfs-progs v4.1

Each of the individual LVs in the FS is just a flat chunk of space on 
a separate disk from the others.

The FS itself passes btrfs check just fine (no reported errors, exit value of 
0), but the kernel refuses to mount it with the message 'open_ctree failed'.

I've run btrfs chunk recover and attached the output from that.

Here's a link to an image from 'btrfs image -c9 -w': 
https://www.dropbox.com/s/pl7gs305ej65u9q/altroot.btrfs.img?dl=0
(That link will expire in 30 days, let me know if you need access to it beyond 
that).

The filesystems in question all see relatively light but consistent usage as 
targets for receiving daily incremental snapshots for on-system backups (and 
because I know someone will mention it, yes, I do have other backups of the 
data, these are just my online backups).

A further update: I just tried mounting the filesystem from the image 
above again, this time passing device= options for each device in the 
FS, and it seems to be working fine now.  I've tried this with the other 
filesystems, however, and they still won't mount.
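
For reference, a hedged sketch of that mount invocation (device names are
taken from the 'fi show' output above; the mount point is a placeholder):

  # Pass every member device explicitly instead of relying on a prior
  # 'btrfs device scan'; /mnt/altroot is a placeholder mount point.
  mount -t btrfs \
    -o device=/dev/mapper/vg-altroot.0,device=/dev/mapper/vg-altroot.1,device=/dev/mapper/vg-altroot.2,device=/dev/mapper/vg-altroot.3 \
    /dev/mapper/vg-altroot.0 /mnt/altroot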






[PATCH] fstests: regression test for the btrfs clone ioctl

2015-07-14 Thread fdmanana
From: Filipe Manana fdman...@suse.com

This tests that we cannot clone an inline extent into a non-zero file
offset. Inline extents at non-zero offsets are something btrfs is not
prepared for, and they result in all sorts of corruption and crashes on
future IO operations, such as the following BUG_ON() triggered by the
last write operation done by this test:

  [152154.035903] [ cut here ]
  [152154.036424] kernel BUG at mm/page-writeback.c:2286!
  [152154.036424] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
  (...)
  [152154.036424] RIP: 0010:[8111a9d5]  [8111a9d5] 
clear_page_dirty_for_io+0x1e/0x90
  (...)
  [152154.036424] Call Trace:
  [152154.036424]  [a04e97c1] 
lock_and_cleanup_extent_if_need+0x147/0x18d [btrfs]
  [152154.036424]  [a04ea82c] __btrfs_buffered_write+0x245/0x4c8 
[btrfs]
  [152154.036424]  [a04ed14b] ? btrfs_file_write_iter+0x150/0x3e0 
[btrfs]
  [152154.036424]  [a04ed15a] ? btrfs_file_write_iter+0x15f/0x3e0 
[btrfs]
  [152154.036424]  [a04ed2c7] btrfs_file_write_iter+0x2cc/0x3e0 
[btrfs]
  [152154.036424]  [81165a4a] __vfs_write+0x7c/0xa5
  [152154.036424]  [81165f89] vfs_write+0xa0/0xe4
  [152154.036424]  [81166855] SyS_pwrite64+0x64/0x82
  [152154.036424]  [81465197] system_call_fastpath+0x12/0x6f
  (...)
  [152154.242621] ---[ end trace e3d3376b23a57041 ]---

This issue is addressed by the following linux kernel patch for btrfs:
Btrfs: fix file corruption after cloning inline extents.

Signed-off-by: Filipe Manana fdman...@suse.com
---
 tests/btrfs/096 | 80 +
 tests/btrfs/096.out | 12 
 tests/btrfs/group   |  1 +
 3 files changed, 93 insertions(+)
 create mode 100755 tests/btrfs/096
 create mode 100644 tests/btrfs/096.out

diff --git a/tests/btrfs/096 b/tests/btrfs/096
new file mode 100755
index 000..f5b3a7f
--- /dev/null
+++ b/tests/btrfs/096
@@ -0,0 +1,80 @@
+#! /bin/bash
+# FSQA Test No. 096
+#
+# Test that we can not clone an inline extent into a non-zero file offset.
+#
+#---
+#
+# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana fdman...@suse.com
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_cloner
+
+rm -f $seqres.full
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+# Create our test files. File foo has the same 2K of data at offset 4K as file
+# bar has at its offset 0.
+$XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
+   -c "pwrite -S 0xbb 4k 2K" \
+   -c "pwrite -S 0xcc 8K 4K" \
+   $SCRATCH_MNT/foo | _filter_xfs_io
+
+# File bar consists of a single inline extent (2K size).
+$XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
+   $SCRATCH_MNT/bar | _filter_xfs_io
+
+# Now call the clone ioctl to clone the extent of file bar into file foo at its
+# offset 4K. This made file foo have an inline extent at offset 4K, something
+# which the btrfs code can not deal with in future IO operations because all
+# inline extents are supposed to start at an offset of 0, resulting in all
+# sorts of chaos.
+# So here we validate that the clone ioctl returns an EOPNOTSUPP, which is what
+# it returns for other cases dealing with inlined extents.
+$CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
+   $SCRATCH_MNT/bar $SCRATCH_MNT/foo
+
+# Because of the inline extent at offset 4K, the following write made the kernel
+# crash with a BUG_ON().
+$XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
+
+status=0
+exit
diff --git a/tests/btrfs/096.out b/tests/btrfs/096.out
new file mode 100644
index 000..235198d
--- /dev/null
+++ b/tests/btrfs/096.out
@@ -0,0 +1,12 @@
+QA output created by 096
+wrote 4096/4096 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec 

[PATCH] Btrfs: fix file corruption after cloning inline extents

2015-07-14 Thread fdmanana
From: Filipe Manana fdman...@suse.com

Using the clone ioctl (or the extent_same ioctl, which calls the same extent
cloning function as well) we end up allowing an inline extent to be copied from
the source file into a non-zero offset of the destination file. This is
something not expected and that the btrfs code is not prepared to deal
with - all inline extents must be at a file offset equal to 0.

For example, the following excerpt of a test case for fstests triggers
a crash/BUG_ON() on a write operation after an inline extent is cloned
into a non-zero offset:

  _scratch_mkfs >>$seqres.full 2>&1
  _scratch_mount

  # Create our test files. File foo has the same 2K of data at offset 4K
  # as file bar has at its offset 0.
  $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
  -c "pwrite -S 0xbb 4k 2K" \
  -c "pwrite -S 0xcc 8K 4K" \
  $SCRATCH_MNT/foo | _filter_xfs_io

  # File bar consists of a single inline extent (2K size).
  $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
 $SCRATCH_MNT/bar | _filter_xfs_io

  # Now call the clone ioctl to clone the extent of file bar into file
  # foo at its offset 4K. This made file foo have an inline extent at
  # offset 4K, something which the btrfs code can not deal with in future
  # IO operations because all inline extents are supposed to start at an
  # offset of 0, resulting in all sorts of chaos.
  # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
  # what it returns for other cases dealing with inlined extents.
  $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
  $SCRATCH_MNT/bar $SCRATCH_MNT/foo

  # Because of the inline extent at offset 4K, the following write made
  # the kernel crash with a BUG_ON().
  $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io

  status=0
  exit

The stack trace of the BUG_ON() triggered by the last write is:

  [152154.035903] [ cut here ]
  [152154.036424] kernel BUG at mm/page-writeback.c:2286!
  [152154.036424] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
  [152154.036424] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor 
raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc 
loop fuse parport_pc acpi_cpu$
  [152154.036424] CPU: 2 PID: 17873 Comm: xfs_io Tainted: GW   
4.1.0-rc6-btrfs-next-11+ #2
  [152154.036424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
  [152154.036424] task: 880429f70990 ti: 880429efc000 task.ti: 
880429efc000
  [152154.036424] RIP: 0010:[8111a9d5]  [8111a9d5] 
clear_page_dirty_for_io+0x1e/0x90
  [152154.036424] RSP: 0018:880429effc68  EFLAGS: 00010246
  [152154.036424] RAX: 02000806 RBX: ea0006a6d8f0 RCX: 
0001
  [152154.036424] RDX:  RSI: 81155d1b RDI: 
ea0006a6d8f0
  [152154.036424] RBP: 880429effc78 R08: 8801ce389fe0 R09: 
0001
  [152154.036424] R10: 2000 R11:  R12: 
8800200dce68
  [152154.036424] R13:  R14: 8800200dcc88 R15: 
8803d5736d80
  [152154.036424] FS:  7fbf119f6700() GS:88043d28() 
knlGS:
  [152154.036424] CS:  0010 DS:  ES:  CR0: 80050033
  [152154.036424] CR2: 01bdc000 CR3: 0003aa555000 CR4: 
06e0
  [152154.036424] Stack:
  [152154.036424]  8803d5736d80 0001 880429effcd8 
a04e97c1
  [152154.036424]  880429effd68 880429effd60 0001 
8800200dc9c8
  [152154.036424]  0001 8800200dcc88  
1000
  [152154.036424] Call Trace:
  [152154.036424]  [a04e97c1] 
lock_and_cleanup_extent_if_need+0x147/0x18d [btrfs]
  [152154.036424]  [a04ea82c] __btrfs_buffered_write+0x245/0x4c8 
[btrfs]
  [152154.036424]  [a04ed14b] ? btrfs_file_write_iter+0x150/0x3e0 
[btrfs]
  [152154.036424]  [a04ed15a] ? btrfs_file_write_iter+0x15f/0x3e0 
[btrfs]
  [152154.036424]  [a04ed2c7] btrfs_file_write_iter+0x2cc/0x3e0 
[btrfs]
  [152154.036424]  [81165a4a] __vfs_write+0x7c/0xa5
  [152154.036424]  [81165f89] vfs_write+0xa0/0xe4
  [152154.036424]  [81166855] SyS_pwrite64+0x64/0x82
  [152154.036424]  [81465197] system_call_fastpath+0x12/0x6f
  [152154.036424] Code: 48 89 c7 e8 0f ff ff ff 5b 41 5c 5d c3 0f 1f 44 00 00 
55 48 89 e5 41 54 53 48 89 fb e8 ae ef 00 00 49 89 c4 48 8b 03 a8 01 75 02 0f 
0b 4d 85 e4 74 59 49 8b 3c 2$
  [152154.036424] RIP  [8111a9d5] clear_page_dirty_for_io+0x1e/0x90
  [152154.036424]  RSP 880429effc68
  [152154.242621] ---[ end trace e3d3376b23a57041 ]---

Fix this by returning the error EOPNOTSUPP if an attempt to copy an
inline extent into a non-zero offset happens, just like what is done for
other scenarios that would require copying/splitting inline extents,
which were introduced by 

Re: [PATCH] Documentation: update btrfs-replace manual to support RAID5/6

2015-07-14 Thread David Sterba
On Tue, Jul 07, 2015 at 10:12:16AM +0800, Wang Yanfeng wrote:
 The man page needs to be updated since RAID5/6 is now supported
 by btrfs-replace.
 
 Signed-off-by: Wang Yanfeng wangyf-f...@cn.fujitsu.com

Applied, thanks. Please do not forget to add 'btrfs-progs' to the subject;
otherwise I might miss progs patches while skimming the mailing list and
delay merging unnecessarily.


Re: Disk failed while doing scrub

2015-07-14 Thread Duncan
Dāvis Mosāns posted on Tue, 14 Jul 2015 04:54:27 +0300 as excerpted:

 2015-07-13 11:12 GMT+03:00 Duncan 1i5t5.dun...@cox.net:
 You say five disks, but nowhere in your post do you mention what raid
 mode you were using, nor do you post btrfs filesystem show and
 btrfs filesystem df, as suggested on the wiki, which list that
 information.
 
 Sorry, I forgot. I'm running Arch Linux 4.0.7, with btrfs-progs v4.1
 Using RAID1 for metadata and single for data, with features
 big_metadata, extended_iref, mixed_backref, no_holes, skinny_metadata
 and mounted with noatime,compress=zlib,space_cache,autodefrag

Thanks.  FWIW, pretty similar here, but running gentoo, now with btrfs-
progs v4.1.1 and the mainline 4.2-rc1+ kernel.

BTW, note that space_cache has been the default for quite some time, 
now.  I've never actually manually mounted with space_cache on any of my 
filesystems over several years, now, yet they all report it when I check 
/proc/mounts, etc.  So if you're adding that manually, you can kill that 
option and save the commandline/fstab space. =:^)

 Label: 'Data'  uuid: 1ec5b839-acc6-4f70-be9d-6f9e6118c71c
Total devices 5 FS bytes used 7.16TiB
devid1 size 2.73TiB used 2.35TiB path /dev/sdc
devid2 size 1.82TiB used 1.44TiB path /dev/sdd
devid3 size 1.82TiB used 1.44TiB path /dev/sde
devid4 size 1.82TiB used 1.44TiB path /dev/sdg
devid5 size 931.51GiB used 539.01GiB path /dev/sdh
 
 Data, single: total=7.15TiB, used=7.15TiB
 System, RAID1: total=8.00MiB, used=784.00KiB
 System, single: total=4.00MiB, used=0.00B
 Metadata, RAID1: total=16.00GiB, used=14.37GiB
 Metadata, single: total=8.00MiB, used=0.00B
 GlobalReserve, single: total=512.00MiB, used=0.00B

And note that you can easily and quickly remove those empty single-mode 
system and metadata chunks, which are an artifact of the way mkfs.btrfs 
works, using balance filters.

btrfs balance start -mprofiles=single /mountpoint

... should do it.  They're actually working on mkfs.btrfs patches right now 
to fix it not to do that.  There are active patch and testing threads 
discussing it.  Hopefully for btrfs-progs v4.2.  (4.1.1 has the patches 
for single-device and prep work for multi-device, according to the 
changelog.)
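
A hedged sketch of that cleanup, assuming a mounted filesystem at a
placeholder path (the -s filter needs -f because balancing system chunks
is treated as dangerous):

  # Rewrite only the single-profile metadata and system chunks, leaving the
  # real RAID1 chunks alone; /mnt/data is a placeholder mount point.
  btrfs balance start -mprofiles=single -sprofiles=single -f /mnt/data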

 Because filesystem still mounts, I assume I should do btrfs device
 delete /dev/sdd /mntpoint and then restore damaged files from backup.

 You can try a replace, but with a failing drive still connected, people
 report mixed results.  It's likely to fail as it can't read certain
 blocks to transfer them to the new device.
 
 As I understand it, device delete will copy data from that disk and
 distribute it across the rest of the disks, while btrfs replace will copy to a
 new disk, which must be at least the size of the disk I'm replacing.

Sorry.  You wrote delete, I read replace.  How'd I do that? =:^(

You are absolutely correct.  Delete would be better here.

I guess I had just been reading a thread discussing the problems I 
mentioned with replace, and saw what I expected to see, not what you 
actually wrote.

 There's no such partial-file with null-fill tools shipped just yet.

 From the journal I have only 14 files mentioned where errors occurred. Now
 13 of those files don't throw any errors and their SHAs match my
 backups, so they're fine.

Good.  I was going on the assumption that the questionable device was in 
much worse shape than that.

 And actually btrfs does allow copying/reading that one damaged file, only I
 get an I/O error when trying to read data from those broken sectors

Good, and good to know.  Thanks. =:^)

 best and correct way to recover a file is using ddrescue

I was just going to mention ddrescue. =:^)
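
For anyone following along, a minimal ddrescue sketch (the paths are
placeholders, not the ones from this thread):

  # Copy whatever is readable of the damaged file, keeping a map file so the
  # copy can be resumed or refined later; unreadable areas read back as zeros.
  ddrescue -v /mnt/data/damaged_file /tmp/damaged_file /tmp/damaged_file.map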

 $ du -m /tmp/damaged_file
 6251   /tmp/damaged_file

 so basically only about 8K bytes are unrecoverable from this file.
 Probably a tool could be created which, knowing about btrfs, could recover
 even more data.
 
 There /is/, however, a command that can be used to either regenerate or
 zero-out the checksum tree.  See btrfs check --init-csum-tree.

 Seems you can't specify a path/file for it, and it's quite a destructive
 action if you want to get data only about one specific file.

Yes.  It's whole-filesystem-all-or-nothing, unfortunately. =:^(

 I did a scrub a second time and this time there aren't that many
 uncorrectable errors, and there are also no csum_errors, so --init-csum-tree
 is useless here I think.

Agreed.

 Most likely the scrub previously got that many errors because it still
 continued for a bit even though the disk didn't respond.

Yes.

 scrub status [...]
read_errors: 2
csum_errors: 0
verify_errors: 0
no_csum: 89600
csum_discards: 656214
super_errors: 0
malloc_errors: 0
uncorrectable_errors: 2
unverified_errors: 0
corrected_errors: 0
last_physical: 2590041112576

OK, that matches up with 8 KiB bad, since blocks are 4 KiB and there are 
two uncorrectable errors.  With the scrub 

'btrfs subvolume list /' executed as non-root produces a not-so-nice error message

2015-07-14 Thread Johannes Ernst
$ btrfs subvolume list /
ERROR: can't perform the search - Operation not permitted
ERROR: can't get rootid for '/'

I don't know what a 'rootid' is as a user, and I don't really want to ponder 
whether I need to find out.

What about a simple
ERROR: Permission denied.
instead?

Cheers,



Johannes.



Re: counting fragments takes more time than defragmenting

2015-07-14 Thread Patrik Lundquist
On 14 July 2015 at 20:41, Hugo Mills h...@carfax.org.uk wrote:
 On Tue, Jul 14, 2015 at 01:57:07PM +0200, Patrik Lundquist wrote:
 On 24 June 2015 at 12:46, Duncan 1i5t5.dun...@cox.net wrote:
 
  Regardless of whether 1 or huge -t means maximum defrag, however, the
  nominal data chunk size of 1 GiB means that 30 GiB file you mentioned
  should be considered ideally defragged at 31 extents.  This is a
  departure from ext4, which AFAIK in theory has no extent upper limit, so
  should be able to do that 30 GiB file in a single extent.
 
  But btrfs or ext4, 31 extents ideal or a single extent ideal, 150 extents
  still indicates at least some remaining fragmentation.

 So I converted the VMware VMDK file to a VirtualBox VDI file:

 -rw--- 1 plu plu 28845539328 jul 13 13:36 Windows7-disk1.vmdk
 -rw--- 1 plu plu 28993126400 jul 13 14:04 Windows7.vdi

 $ filefrag Windows7.vdi
 Windows7.vdi: 15 extents found

 $ btrfs filesystem defragment -t 3g Windows7.vdi
 $ filefrag Windows7.vdi
 Windows7.vdi: 24 extents found

 How can it be less than 28 extents with a chunk size of 1 GiB?

I _think_ the fragment size will be limited by the block group
 size. This is not the same as the chunk size for some RAID levels --
 for example, RAID-0, a block group can be anything from 2 to n chunks
 (across the same number of devices), where each chunk is 1 GiB, so
 potentially you could have arbitrary-sized block groups. The same
 would apply to RAID-10, -5 and -6.

(Note, I haven't verified this, but it makes sense based on what I
 know of the internal data structures).

It's a raid1 filesystem, so the block group ought to be the same size
as the chunk, right?

A 2GiB block group would suffice to explain it though.


Re: counting fragments takes more time than defragmenting

2015-07-14 Thread Hugo Mills
On Tue, Jul 14, 2015 at 09:09:00PM +0200, Patrik Lundquist wrote:
 On 14 July 2015 at 20:41, Hugo Mills h...@carfax.org.uk wrote:
  On Tue, Jul 14, 2015 at 01:57:07PM +0200, Patrik Lundquist wrote:
  On 24 June 2015 at 12:46, Duncan 1i5t5.dun...@cox.net wrote:
  
   Regardless of whether 1 or huge -t means maximum defrag, however, the
   nominal data chunk size of 1 GiB means that 30 GiB file you mentioned
   should be considered ideally defragged at 31 extents.  This is a
   departure from ext4, which AFAIK in theory has no extent upper limit, so
   should be able to do that 30 GiB file in a single extent.
  
   But btrfs or ext4, 31 extents ideal or a single extent ideal, 150 extents
   still indicates at least some remaining fragmentation.
 
  So I converted the VMware VMDK file to a VirtualBox VDI file:
 
  -rw--- 1 plu plu 28845539328 jul 13 13:36 Windows7-disk1.vmdk
  -rw--- 1 plu plu 28993126400 jul 13 14:04 Windows7.vdi
 
  $ filefrag Windows7.vdi
  Windows7.vdi: 15 extents found
 
  $ btrfs filesystem defragment -t 3g Windows7.vdi
  $ filefrag Windows7.vdi
  Windows7.vdi: 24 extents found
 
  How can it be less than 28 extents with a chunk size of 1 GiB?
 
 I _think_ the fragment size will be limited by the block group
  size. This is not the same as the chunk size for some RAID levels --
  for example, RAID-0, a block group can be anything from 2 to n chunks
  (across the same number of devices), where each chunk is 1 GiB, so
  potentially you could have arbitrary-sized block groups. The same
  would apply to RAID-10, -5 and -6.
 
 (Note, I haven't verified this, but it makes sense based on what I
  know of the internal data structures).
 
 It's a raid1 filesystem, so the block group ought to be the same size
 as the chunk, right?

   Yes.

 A 2GiB block group would suffice to explain it though.

   Not with RAID-1 -- I'd expect the block group size to be 1 GiB.

   Hugo.

-- 
Hugo Mills | There isn't a noun that can't be verbed.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: Anyone tried out btrbk yet?

2015-07-14 Thread Marc MERLIN
On Wed, Jul 15, 2015 at 10:03:16AM +1000, Paul Harvey wrote:
 The way it works in snazzer (and btrbk and I think also btrfs-sxbackup
 as well), local snapshots continue to happen as normal (Eg. daily or
 hourly) and so when your backup media or backup server is finally
 available again, the size of each individual incremental is still the
 same as usual, it just has to perform more of them.
 
Good point. My system is not as smart. Every night, it'll make a new
backup and only send one incremental and hope it gets there. It doesn't
make a bunch of incrementals and send multiple.

The other options do a better job here.

Thanks,
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: BTRFS raid6 unmountable after a couple of days of usage.

2015-07-14 Thread Chris Murphy
On Tue, Jul 14, 2015 at 7:25 AM, Austin S Hemmelgarn
ahferro...@gmail.com wrote:
 On 2015-07-14 07:49, Austin S Hemmelgarn wrote:

 So, after experiencing this same issue multiple times (on almost a dozen
 different kernel versions since 4.0) and ruling out the possibility of it
 being caused by my hardware (or at least, the RAM, SATA controller and disk
 drives themselves), I've decided to report it here.

 The general symptom is that raid6 profile filesystems that I have are
 working fine for multiple weeks, until I either reboot or otherwise try to
 remount them, at which point the system refuses to mount them.

 I'm currently using btrfs-progs v4.1 with kernel 4.1.2, although I've been
 seeing this with versions of both since 4.0.

 Output of 'btrfs fi show' for the most recent fs that I had this issue
 with:
  Label: 'altroot'  uuid: 86eef6b9-febe-4350-a316-4cb00c40bbc5
 Total devices 4 FS bytes used 9.70GiB
 devid1 size 24.00GiB used 6.03GiB path
 /dev/mapper/vg-altroot.0
 devid2 size 24.00GiB used 6.01GiB path
 /dev/mapper/vg-altroot.1
 devid3 size 24.00GiB used 6.01GiB path
 /dev/mapper/vg-altroot.2
 devid4 size 24.00GiB used 6.01GiB path
 /dev/mapper/vg-altroot.3

  btrfs-progs v4.1

 Each of the individual LVs in the FS is just a flat chunk of
 space on a separate disk from the others.

 The FS itself passes btrfs check just fine (no reported errors, exit value
 of 0), but the kernel refuses to mount it with the message 'open_ctree
 failed'.

 I've run btrfs chunk recover and attached the output from that.

 Here's a link to an image from 'btrfs image -c9 -w':
 https://www.dropbox.com/s/pl7gs305ej65u9q/altroot.btrfs.img?dl=0
 (That link will expire in 30 days, let me know if you need access to it
 beyond that).

 The filesystems in question all see relatively light but consistent usage
 as targets for receiving daily incremental snapshots for on-system backups
 (and because I know someone will mention it, yes, I do have other backups of
 the data, these are just my online backups).

 Further updates, I just tried mounting the filesystem from the image above
 again, this time passing device= options for each device in the FS, and it
 seems to be working fine now.  I've tried this with the other filesystems
 however, and they still won't mount.


And is it the same message with the usual suspects: 'recovery',
'ro,recovery'? How about 'degraded', even though it's not degraded? And
what about 'btrfs rescue zero-log'?
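
A hedged sketch of those attempts, roughly in increasing order of
invasiveness (device and mount point are placeholders; zero-log discards
the log tree, so it's a last resort):

  # Try a read-only recovery mount first, then read-write recovery and
  # degraded; only if all of those fail, consider clearing the log tree.
  mount -o ro,recovery /dev/mapper/vg-altroot.0 /mnt/altroot
  mount -o recovery /dev/mapper/vg-altroot.0 /mnt/altroot
  mount -o degraded /dev/mapper/vg-altroot.0 /mnt/altroot
  btrfs rescue zero-log /dev/mapper/vg-altroot.0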

Of course it's weird that btrfs check doesn't complain, but mount
does. I don't understand that, so it's good you've got an image. If
either recovery or zero-log fix the problem, my understanding is this
suggests hardware did something Btrfs didn't expect.

What about 'btrfs check --check-data-csum', which should act similar to
a read-only scrub (different output though)? Hmm, nah. The thing is that
the failure to mount is failing on some aspect of metadata, not data.
So the fact that check (on metadata) passes but mount fails is a bug
somewhere...

-- 
Chris Murphy


Re: Anyone tried out btrbk yet?

2015-07-14 Thread Paul Harvey
The way it works in snazzer (and btrbk, and I think also btrfs-sxbackup),
local snapshots continue to happen as normal (e.g. daily or
hourly), and so when your backup media or backup server is finally
available again, the size of each individual incremental is still the
same as usual; it just has to perform more of them.
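
A minimal sketch of that catch-up, assuming snapshot names that sort
chronologically and a placeholder backup host (this illustrates the idea,
it is not snazzer's or btrbk's actual code):

  # Send every snapshot taken since the last one the backup host already
  # has, each one incremental against its predecessor.
  prev=/snapshots/@home.2015-07-10    # newest snapshot already on the backup host
  for snap in /snapshots/@home.2015-07-1[1-4]; do
      btrfs send -p "$prev" "$snap" | ssh backuphost 'btrfs receive /backups/'
      prev=$snap
  done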

Separating snapshotting from transport lends itself to more flexibility IMHO.
E.g. with snazzer I can keep multiple physical backup media in sync
with each other even if I only rotate/attach those disks once a
week/month (maintaining backup filesystems in parallel); the
snazzer-receive script is very dumb - it just receives all the missing
snapshots from the source. However, it does filter them first, cf. btrfs
subvolume list /subvolume | snazzer-prune-candidates --invert, in case
some would just be deleted again shortly afterwards according to the
retention policy.

For the ssh transport, you can do the same things but in series: push
the snapshots up to a local server and then on to remote storage
elsewhere (maintain backup filesystems in series).

Because the snapshotting, transport and pruning operations are
asynchronous the logic for all this is relatively simple.

It's thanks to seeing send/receive struggles such as yours on this
list (which have also happened to me, but only very rarely: it seems I
tend to have reliable connectivity), among other issues, that I wrote
snazzer-measure. It ends up appending reproducible sha512sums and PGP
signatures to a measurements file for each snapshot; measurements
happen more than just once, so they're timestamped with the hostname - the
hope is that I should spot any corruption that happens after the first
measurements are taken.

This is also a separate/async operation (it's the most I/O and CPU
intense operation of all).
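
One way such a measurement could be produced, purely as an illustration of
the idea and not snazzer's actual implementation (assumes GNU tar 1.28+ for
--sort; paths are placeholders):

  # A deterministic tar stream (stable file order, fixed ownership) of a
  # read-only snapshot, hashed and appended to a measurements file.
  snap=/snapshots/@home.2015-07-14
  tar --sort=name --owner=0 --group=0 --numeric-owner -C "$snap" -cf - . \
      | sha512sum >> "$snap.measurements"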


Re: btrfs partition converted from ext4 becomes read-only minutes after booting: WARNING: CPU: 2 PID: 2777 at ../fs/btrfs/super.c:260 __btrfs_abort_transaction+0x4b/0x120

2015-07-14 Thread Chris Murphy
On Thu, Jul 9, 2015 at 10:45 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:


 Chris Murphy wrote on 2015/07/09 18:45 -0600:

 On Thu, Jul 9, 2015 at 6:34 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:

 One of my patches addressed a problem where a converted btrfs can't pass
 btrfsck.

 Not sure if that is the cause, but if you can, try btrfs-progs v3.19.1, the
 one without my btrfs-progs patches and some other newer convert-related
 patches, and see the result?

 I think this would at least provide a base for bisecting btrfs-progs if
 the bug is in btrfs-progs.


 I'm happy to regression test with 3.19.1 but I'm confused. After
 conversion, btrfs check (4.1) finds no problems. After ext2_saved
 snapshot is deleted, btrfsck finds no problems. After defrag, again
 btrfsck finds no problems. After the failed balance, btrfsck finds no
 problems but crashes with Aborted (core dump).

 Even if btrfsck reports no error, some btrfs-convert behavior change may lead
 to kernel misbehavior.

 But we are not sure whether it's btrfs-progs or the kernel itself that has the
 bug. Maybe btrfs convert did something wrong/different triggering the bug, or
 is it just a kernel regression?

 So what I'd like to check is, with 3.19.1 progs (kernel version doesn't
 change), whether the kernel still fails to do the balance.

 If the problem still happens, then we can focus on the kernel part, or at
 least put less effort into btrfs-progs.


 Should I still test 3.19.1?

I'm not able to reproduce this for reasons I don't understand. The
setup is in a qemu-kvm VM, with the ext4 original as a qcow2. I had
been using 'qemu-img create -f qcow2 -o nocow=on -b original.qcow2
converted.qcow2' and then doing the conversions with the
converted.qcow2 file in the VM. I did this half a dozen times and
it always imploded at the balance step (but differently for 4.1 and
4.2). Now the balance completes with no errors. btrfs check doesn't
complain either. Very irritating, as nothing else has changed with
that VM.

There is another user who had this similar problem with the converted
ext4 going read-only. They ran btrfs check with both 3.19.1 and 4.0.
Their results are here, hopefully it's helpful until I can figure out
how to get this reproducing again.
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg44701.html



-- 
Chris Murphy


Re: Recreate Snapshots and their Parent Relationships on a second Server.

2015-07-14 Thread Paul Harvey
'btrfs subvolume list -uq /some/subvol' can help figure out the
existing parent relationships, but in practice, if your snapshots are
simply a linear series over time, then I doubt you'll gain much by
parsing all those UUIDs over simply doing an initial btrfs
send/receive without any parent, followed by send/receive operations
using the previous snapshot as the parent.

I'm assuming of course that your snapshots are easier to sort into
chronological order than by parsing the UUIDs out of the 'btrfs subvol
list -up' output.
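
As a small illustration of the identification side (the path is a
placeholder): 'btrfs subvolume list -uq' prints each subvolume's uuid and
parent uuid, and a snapshot's parent uuid matches the uuid of the subvolume
it was taken from, which is enough to reconstruct the chain if you do want
to parse it.

  # Show uuid and parent uuid for every subvolume below the given path;
  # /srv/snapshots is a placeholder.
  btrfs subvolume list -uq /srv/snapshots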

In my experience I actually end up using slightly less disk space with
the new parent/child relationships on the new filesystem. I assume that's
because the original source filesystem had missing parents that no
longer exist, as they've long since been pruned, but it might also just
be that there's less fragmentation and smaller metadata consumption to
hold the new relationships.

On 14 July 2015 at 20:14, Robert Krig robert.k...@render-wahnsinn.de wrote:
 Hi.

 I have an Old Server with a bunch of btrfs Snapshots.
 I'm setting up a new server and I would like to transfer those Snapshots
 as efficiently as possible, while still maintaining their parent-child
 relationships for space efficient storage.

 Apart from manually using btrfs send and btrfs send -p where
 applicable, is there an easy way to transfer everything in one go?

 Can I identify Snapshot relationships via their ID or some other data
 that I can display using btrfs tools?


[PATCH] btrfs: Fix lockdep warning of btrfs_run_delayed_iputs()

2015-07-14 Thread Zhaolei
From: Zhao Lei zhao...@cn.fujitsu.com

Liu Bo bo.li@oracle.com reported a lockdep warning of
delayed_iput_sem in xfstests generic/241:
  [ 2061.345955] =
  [ 2061.346027] [ INFO: possible recursive locking detected ]
  [ 2061.346027] 4.1.0+ #268 Tainted: GW
  [ 2061.346027] -
  [ 2061.346027] btrfs-cleaner/3045 is trying to acquire lock:
  [ 2061.346027]  (fs_info-delayed_iput_sem){..}, at:
  [814063ab] btrfs_run_delayed_iputs+0x6b/0x100
  [ 2061.346027] but task is already holding lock:
  [ 2061.346027]  (fs_info-delayed_iput_sem){..}, at: 
[814063ab] btrfs_run_delayed_iputs+0x6b/0x100
  [ 2061.346027] other info that might help us debug this:
  [ 2061.346027]  Possible unsafe locking scenario:

  [ 2061.346027]CPU0
  [ 2061.346027]
  [ 2061.346027]   lock(fs_info-delayed_iput_sem);
  [ 2061.346027]   lock(fs_info-delayed_iput_sem);
  [ 2061.346027]
   *** DEADLOCK ***
It happens rarely, about 1/400 runs in my test env.

The reason is recursion of btrfs_run_delayed_iputs():
  cleaner_kthread
  -> btrfs_run_delayed_iputs() *1
  -> get delayed_iput_sem lock *2
  -> iput()
  -> ...
  -> btrfs_commit_transaction()
  -> btrfs_run_delayed_iputs() *1
  -> get delayed_iput_sem lock (deadlock) *2
  *1: recursion of btrfs_run_delayed_iputs()
  *2: warning of lockdep about delayed_iput_sem

When the fs is under high stress, new iputs may be added to the
fs_info->delayed_iputs list while btrfs_run_delayed_iputs() is running, which
causes a second btrfs_run_delayed_iputs() to run into
down_read(&fs_info->delayed_iput_sem) again, and causes the above lockdep warning.

Actually, it will not cause a real problem because both locks are read locks,
but to avoid the lockdep warning, we can do a fix.

Fix:
  Don't do btrfs_run_delayed_iputs() in btrfs_commit_transaction() for the
  cleaner_kthread thread, to break the above recursion path.
  cleaner_kthread calls btrfs_run_delayed_iputs() explicitly in the code
  and doesn't need to call btrfs_run_delayed_iputs() again in
  btrfs_commit_transaction(); this also gives us the bonus of avoiding stack overflow.

Test:
  No above lockdep warning after patch in 1200 generic/241 tests.

Reported-by: Liu Bo bo.li@oracle.com
Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/transaction.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index c0f18e7..31248ad 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2152,7 +2152,8 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 
kmem_cache_free(btrfs_trans_handle_cachep, trans);
 
-   if (current != root->fs_info->transaction_kthread)
+   if (current != root->fs_info->transaction_kthread &&
+       current != root->fs_info->cleaner_kthread)
btrfs_run_delayed_iputs(root);
 
return ret;
-- 
1.8.5.1



Re: [PATCH] Revert btrfs-progs: mkfs: create only desired block groups for single device

2015-07-14 Thread Qu Wenruo



David Sterba wrote on 2015/07/14 13:45 +0200:

On Tue, Jul 14, 2015 at 10:13:01AM +0800, Qu Wenruo wrote:

This reverts commit 5f8232e5c8f0b0de0ef426274911385b0e877392.


Thanks.  The revert is justified for the severity of the problem, I'll
release 4.1.2 asap.


This commit causes a regression:
---


BTW, do not use --- in the changelog as 'git am' will ignore the text
between that and the diff.


Oh, sorry, I forgot that --- will be ignored by git.
Next time I'll use === to avoid such a careless problem.

BTW, the mkfs test case will be delayed for a while, as the
following bugs are making things quite tricky.

1) fsck ignores chunk errors and returns 0.
The cause is known and easy to fix, but if fixed, most of the fsck tests
won't pass, as the following bug is causing problems.

2) A btrfs-image restore bug, causing missing dev_extents for DUP chunks.
Still investigating; that's the reason a lot of dev extents are missing in
the mkfs test.

And thanks to the previous bug, we can pass the fsck test like a miracle.

So I'm afraid the corresponding regression test case won't be in time
for the 4.1.2 hotfix release.


Thanks,
Qu


Re: counting fragments takes more time than defragmenting

2015-07-14 Thread Hugo Mills
On Tue, Jul 14, 2015 at 01:57:07PM +0200, Patrik Lundquist wrote:
 On 24 June 2015 at 12:46, Duncan 1i5t5.dun...@cox.net wrote:
 
  Regardless of whether 1 or huge -t means maximum defrag, however, the
  nominal data chunk size of 1 GiB means that 30 GiB file you mentioned
  should be considered ideally defragged at 31 extents.  This is a
  departure from ext4, which AFAIK in theory has no extent upper limit, so
  should be able to do that 30 GiB file in a single extent.
 
  But btrfs or ext4, 31 extents ideal or a single extent ideal, 150 extents
  still indicates at least some remaining fragmentation.
 
 So I converted the VMware VMDK file to a VirtualBox VDI file:
 
 -rw--- 1 plu plu 28845539328 jul 13 13:36 Windows7-disk1.vmdk
 -rw--- 1 plu plu 28993126400 jul 13 14:04 Windows7.vdi
 
 $ filefrag Windows7.vdi
 Windows7.vdi: 15 extents found
 
 $ btrfs filesystem defragment -t 3g Windows7.vdi
 $ filefrag Windows7.vdi
 Windows7.vdi: 24 extents found
 
 How can it be less than 28 extents with a chunk size of 1 GiB?

   I _think_ the fragment size will be limited by the block group
size. This is not the same as the chunk size for some RAID levels --
for example, RAID-0, a block group can be anything from 2 to n chunks
(across the same number of devices), where each chunk is 1 GiB, so
potentially you could have arbitrary-sized block groups. The same
would apply to RAID-10, -5 and -6.

   (Note, I haven't verified this, but it makes sense based on what I
know of the internal data structures).

   Hugo.

-- 
Hugo Mills | Go not to the elves for counsel, for they will say
hugo@... carfax.org.uk | both no and yes.
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: counting fragments takes more time than defragmenting

2015-07-14 Thread Duncan
Patrik Lundquist posted on Tue, 14 Jul 2015 13:57:07 +0200 as excerpted:

 On 24 June 2015 at 12:46, Duncan 1i5t5.dun...@cox.net wrote:

 Regardless of whether 1 or huge -t means maximum defrag, however, the
 nominal data chunk size of 1 GiB means that 30 GiB file you mentioned
 should be considered ideally defragged at 31 extents.  This is a
 departure from ext4, which AFAIK in theory has no extent upper limit,
 so should be able to do that 30 GiB file in a single extent.

 But btrfs or ext4, 31 extents ideal or a single extent ideal, 150
 extents still indicates at least some remaining fragmentation.
 
 So I converted the VMware VMDK file to a VirtualBox VDI file:
 
 -rw--- 1 plu plu 28845539328 jul 13 13:36 Windows7-disk1.vmdk
 -rw--- 1 plu plu 28993126400 jul 13 14:04 Windows7.vdi
 
 $ filefrag Windows7.vdi Windows7.vdi: 15 extents found
 
 $ btrfs filesystem defragment -t 3g Windows7.vdi $ filefrag Windows7.vdi
 Windows7.vdi: 24 extents found
 
 How can it be less than 28 extents with a chunk size of 1 GiB?
 
 E2fsprogs version 1.42.12

That's why I said nominal[1] 1 GiB.  I'm just a list and filesystem 
user, not a dev, and I don't know the details, but someone (a dev or at 
least someone that can actually read code, but not a btrfs dev) mentioned 
in reply to a post of mine a few months ago, that under the right 
conditions, btrfs can allocate larger-than 1 GiB data chunks.

I /believe/ data chunk allocation size has something to do with the 
amount of unallocated space on the filesystem; that on large (TiB plus, 
perhaps) btrfs some of the initial allocations will be multiple GiB, 
which of course would allow greater-than 1 GiB extents as well.  But I 
really don't know the conditions under which that can happen and I've not 
seen an actual btrfs dev comment on it, and AFAIK the base data chunk 
size remains 1 GiB under most conditions.  Meanwhile, I tend to partition 
up my storage here, and while I have multiple separate btrfs, the 
partitions are all under 50 GiB, so I'm unlikely to see that sort of >1 
GiB data chunk allocation at all, here.

So rather than go to the complexity of explaining all this detail that 
I'm not sure of anyway, I deliberately blurred out a bit as not necessary 
to the primary point, which was that for files over a GiB, don't expect 
to see or be able to defrag to a single extent, as 1 GiB data chunks and 
thus extents are nominal/normal.

If it does happen, I'd consider it due to those data superchunks and 
wouldn't be entirely surprised, but the point remains that you're 
unlikely to get the number of extents much below the file size number in 
GiB using defrag, even when everything is working perfectly as designed.

---
[1] Nominal: In the sense of normal or standard as-designed value, see 
wiktionary's English adjective sense 6 and 10, as well as the wikipedia 
writeups on real vs. nominal values and nominal size:

https://en.wiktionary.org/wiki/nominal#Adjective
https://en.wikipedia.org/wiki/Real_versus_nominal_value
https://en.wikipedia.org/wiki/Nominal_size

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: [PATCH 7/7] btrfs-progs: mkfs: Cleanup temporary chunk to avoid strange balance behavior.

2015-07-14 Thread David Sterba
On Tue, Jul 07, 2015 at 04:15:28PM +0800, Qu Wenruo wrote:
[...]
 
 Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com

Applied, thanks a lot. I've tested several data/metadata combinations
and the resulting 'fi df' looks ok.