Re: Wiki suggestions
On Mon, Jul 13, 2015 at 07:33:13PM +0200, Marc Joliet wrote: Am Mon, 13 Jul 2015 19:21:54 +0200 schrieb Marc Joliet mar...@gmx.de: OK, I'll make the changes then (sans kernel log). Just a heads up: I accepted the terms of service, but the link goes to a non-existent wiki page. I have reported that to kernel.org admins some time ago (02/2014), but the ticket is still open. I'll ping it. -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Recreate Snapshots and their Parent Relationships on a second Server.
Hi. I have an old server with a bunch of btrfs snapshots. I'm setting up a new server and would like to transfer those snapshots as efficiently as possible, while still maintaining their parent-child relationships for space-efficient storage. Apart from manually using btrfs send and btrfs send -p where applicable, is there an easy way to transfer everything in one go? Can I identify snapshot relationships via their ID or some other data that I can display using btrfs tools?
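For what it's worth, the parent relationship asked about here is visible as the parent_uuid column of 'btrfs subvolume list -qu'. The sketch below is a hypothetical illustration of ordering sends from that column: the listing is canned sample output (made-up UUIDs and paths, format approximating the real command), not taken from a live system, and /mnt is an assumed mount point.

```shell
#!/bin/sh
# Sketch: derive "btrfs send" / "btrfs send -p" commands from snapshot
# parent UUIDs.  list_snapshots fakes the output of
# 'btrfs subvolume list -qu /mnt'; on a real system you would pipe the
# live output in instead.
list_snapshots() {
cat <<'EOF'
ID 257 gen 10 top level 5 parent_uuid - uuid aaaa-01 path snap-2015-07-01
ID 258 gen 20 top level 5 parent_uuid aaaa-01 uuid aaaa-02 path snap-2015-07-02
ID 259 gen 30 top level 5 parent_uuid aaaa-02 uuid aaaa-03 path snap-2015-07-03
EOF
}

# Map each snapshot's parent_uuid back to the parent's path; emit a full
# send for parentless snapshots, an incremental (-p) send otherwise.
cmds=$(list_snapshots | awk '
{
    for (i = 1; i <= NF; i++) {
        if ($i == "parent_uuid") puuid = $(i+1)
        if ($i == "uuid")        uuid  = $(i+1)
        if ($i == "path")        path  = $(i+1)
    }
    path_of[uuid] = path
    if (puuid == "-" || !(puuid in path_of))
        print "btrfs send /mnt/" path
    else
        print "btrfs send -p /mnt/" path_of[puuid] " /mnt/" path
}')
echo "$cmds"
```

Snapshots created with -p (or as read-only snapshots of each other) carry this parent_uuid, so a loop like this is roughly what tools such as btrbk automate.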
BTRFS raid6 unmountable after a couple of days of usage.
So, after experiencing this same issue multiple times (on almost a dozen different kernel versions since 4.0) and ruling out the possibility of it being caused by my hardware (or at least, the RAM, SATA controller and disk drives themselves), I've decided to report it here. The general symptom is that the raid6-profile filesystems I have work fine for multiple weeks, until I either reboot or otherwise try to remount them, at which point the system refuses to mount them. I'm currently using btrfs-progs v4.1 with kernel 4.1.2, although I've been seeing this with versions of both since 4.0. Output of 'btrfs fi show' for the most recent fs that I had this issue with:

Label: 'altroot'  uuid: 86eef6b9-febe-4350-a316-4cb00c40bbc5
	Total devices 4 FS bytes used 9.70GiB
	devid 1 size 24.00GiB used 6.03GiB path /dev/mapper/vg-altroot.0
	devid 2 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.1
	devid 3 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.2
	devid 4 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.3

btrfs-progs v4.1

Each of the individual LVs in the FS is just a flat chunk of space on a separate disk from the others. The FS itself passes btrfs check just fine (no reported errors, exit value of 0), but the kernel refuses to mount it with the message 'open_ctree failed'. I've run btrfs chunk recover and attached the output from that. Here's a link to an image from 'btrfs image -c9 -w': https://www.dropbox.com/s/pl7gs305ej65u9q/altroot.btrfs.img?dl=0 (that link will expire in 30 days; let me know if you need access to it beyond that). The filesystems in question all see relatively light but consistent usage as targets for receiving daily incremental snapshots for on-system backups (and because I know someone will mention it, yes, I do have other backups of the data; these are just my online backups).
All Devices:
	Device: id = 4, name = /dev/mapper/vg-altroot.3
	Device: id = 3, name = /dev/mapper/vg-altroot.2
	Device: id = 2, name = /dev/mapper/vg-altroot.1
	Device: id = 1, name = /dev/vg/altroot.0

DEVICE SCAN RESULT:
Filesystem Information:
	sectorsize: 4096
	leafsize: 16384
	tree root generation: 26
	chunk root generation: 11

All Devices:
	Device: id = 4, name = /dev/mapper/vg-altroot.3
	Device: id = 3, name = /dev/mapper/vg-altroot.2
	Device: id = 2, name = /dev/mapper/vg-altroot.1
	Device: id = 1, name = /dev/vg/altroot.0

All Block Groups:
	Block Group: start = 0, len = 4194304, flag = 2
	Block Group: start = 4194304, len = 8388608, flag = 4
	Block Group: start = 12582912, len = 8388608, flag = 1
	Block Group: start = 20971520, len = 16777216, flag = 102
	Block Group: start = 37748736, len = 2147483648, flag = 104
	Block Group: start = 2185232384, len = 2147483648, flag = 101
	Block Group: start = 4332716032, len = 2147483648, flag = 101
	Block Group: start = 6480199680, len = 2147483648, flag = 101
	Block Group: start = 8627683328, len = 2147483648, flag = 101
	Block Group: start = 10775166976, len = 2147483648, flag = 101

All Chunks:
	Chunk: start = 0, len = 4194304, type = 2, num_stripes = 1
	    Stripes list:
	    [ 0] Stripe: devid = 1, offset = 0
	Chunk: start = 4194304, len = 8388608, type = 4, num_stripes = 1
	    Stripes list:
	    [ 0] Stripe: devid = 1, offset = 4194304
	Chunk: start = 12582912, len = 8388608, type = 1, num_stripes = 1
	    Stripes list:
	    [ 0] Stripe: devid = 1, offset = 12582912
	Chunk: start = 20971520, len = 16777216, type = 102, num_stripes = 4
	    Stripes list:
	    [ 0] Stripe: devid = 4, offset = 1048576
	    [ 1] Stripe: devid = 3, offset = 1048576
	    [ 2] Stripe: devid = 2, offset = 1048576
	    [ 3] Stripe: devid = 1, offset = 20971520
	Chunk: start = 37748736, len = 2147483648, type = 104, num_stripes = 4
	    Stripes list:
	    [ 0] Stripe: devid = 4, offset = 9437184
	    [ 1] Stripe: devid = 3, offset = 9437184
	    [ 2] Stripe: devid = 2, offset = 9437184
	    [ 3] Stripe: devid = 1, offset = 29360128
	Chunk: start = 2185232384, len = 2147483648, type = 101, num_stripes = 4
	    Stripes list:
	    [ 0] Stripe: devid = 4, offset = 1083179008
	    [ 1] Stripe: devid = 3, offset = 1083179008
	    [ 2] Stripe: devid = 2, offset = 1083179008
	    [ 3] Stripe: devid = 1, offset = 1103101952
	Chunk: start = 4332716032, len = 2147483648, type = 101, num_stripes = 4
	    Stripes list:
	    [ 0] Stripe: devid = 2, offset = 2156920832
	    [ 1] Stripe: devid = 3, offset = 2156920832
	    [ 2] Stripe: devid = 4, offset = 2156920832
	    [ 3] Stripe: devid = 1,
Btrfs filesystem-fail observations and hints
Hello, at the weekend we had a disk fail in a 5-disk btrfs RAID1 setup. Ideally, one failing disk in a RAID1 setup should (at least temporarily) degrade the filesystem and inform root about the situation, but should leave the rest of the system unaffected. That's not what happened. Processes accessing the filesystem hung waiting on the device, and the filesystem itself "hung" too, producing lots of

BTRFS: lost page write due to I/O error on /dev/sdd
BTRFS: bdev /dev/sdd errs: wr …, rd …, flush 0, corrupt 0, gen 0

messages. Attempts to reboot the system regularly failed. Only after physically removing the failed (hotpluggable) disk from the system was it possible to reboot the system somewhat normally. Afterwards, while trying to get the system running again, the following observations were made:

· "btrfs device delete missing": There seems to be no straightforward way to monitor the progress of the "rebalancing" of the filesystem. It took about 6 hours, and while it was possible to estimate the time of completion by watching "btrfs fi show" and extrapolating device usage, a way to monitor the progress like "btrfs balance status" would be nice. ("btrfs balance status" says "No balance found on …".)

· "btrfs fi df": During the "btrfs device delete missing" rebalance, "btrfs fi df" does not reflect the current state of the filesystem. It says e.g.

Data, RAID1: total=1.46TiB, used=1.46TiB
Data, single: total=8.00MiB, used=0.00B

while actually, depending on the progress of the rebalance, about 0 to 300 GByte have only one copy on the devices. So e.g.

Data, RAID1: total=1.1TiB, used=1.1TiB
Data, single: total=290GiB, used=290GiB

would better reflect the state of the system.
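The extrapolation described above can be sketched as a few lines of shell. The byte counts here are made-up placeholders; on a live system you would parse them out of 'btrfs fi show' (e.g. in a loop such as `watch -n 60 'btrfs fi show /mnt'`) rather than hard-coding them:

```shell
#!/bin/sh
# Rough progress estimate for 'btrfs device delete missing': compare how
# much data has been re-mirrored onto the surviving devices against the
# amount that was on the missing disk.  Both figures are placeholders.
total_to_move=$((300 * 1024 * 1024))   # KiB that were on the missing disk (placeholder)
moved_so_far=$((120 * 1024 * 1024))    # KiB re-mirrored so far (placeholder)
pct=$((moved_so_far * 100 / total_to_move))
echo "device delete missing: approx ${pct}% complete"
```

This is exactly the kind of bookkeeping a "btrfs balance status"-style command would do internally; until such a command exists for device delete, watching device usage is the only option.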
MfG bmg -- „Des is völlig wurscht, was heut beschlossen wird: I bin sowieso dagegn!" (SPD-Stadtrat Kurt Schindler; Regensburg) | M G Berberich berbe...@fmi.uni-passau.de | www.fmi.uni-passau.de/~berberic
Btrfs progs release 4.1.2 (urgent fix, do not use 4.1.1)
Hi, due to a buggy bugfix to mkfs, filesystems created with version 4.1.1 are not entirely correct. To check whether a filesystem is affected, run 'btrfs check' and look for

Chunk[256, 228, 0]: length(4194304), offset(0), type(2) mismatch with block group[0, 192, 4194304]: offset(4194304), objectid(0), flags(34)
Chunk[256, 228, 4194304]: length(8388608), offset(4194304), type(4) mismatch with block group[4194304, 192, 8388608]: offset(8388608), objectid(4194304), flags(36)

at the beginning of the output. Such filesystems must not be used and must be recreated. A read-only mount should be safe for retrieving the data, but at the moment this hasn't been verified. Thanks to Qu Wenruo for identifying the problem; I'm sorry for the trouble.

Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/
Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git
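The check described above is easy to script. This is a hedged sketch: the check output here is a canned sample taken from the lines quoted in the announcement, so the matching logic is testable without a filesystem; on a real system you would capture `btrfs check /dev/sdX` output instead of the heredoc:

```shell
#!/bin/sh
# Scan 'btrfs check' output for the chunk / block-group mismatch that
# mkfs from btrfs-progs 4.1.1 produces.  check_log is a canned sample;
# really you would run:  btrfs check /dev/sdX > check.log 2>&1
check_log=$(cat <<'EOF'
Chunk[256, 228, 0]: length(4194304), offset(0), type(2) mismatch with block group[0, 192, 4194304]: offset(4194304), objectid(0), flags(34)
EOF
)

# The distinctive part of the message is "mismatch with block group".
if printf '%s\n' "$check_log" | grep -q 'mismatch with block group'; then
    verdict="affected: recreate this filesystem"
else
    verdict="no mkfs 4.1.1 mismatch found"
fi
echo "$verdict"
```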
Re: [PATCH] Revert btrfs-progs: mkfs: create only desired block groups for single device
On Tue, Jul 14, 2015 at 10:13:01AM +0800, Qu Wenruo wrote:
> This reverts commit 5f8232e5c8f0b0de0ef426274911385b0e877392.
> This commit causes a regression:
> ---

Thanks. The revert is justified by the severity of the problem; I'll release 4.1.2 asap.

BTW, do not use --- in the changelog, as 'git am' will ignore the text between that and the diff.
Re: counting fragments takes more time than defragmenting
On 24 June 2015 at 12:46, Duncan 1i5t5.dun...@cox.net wrote: Regardless of whether 1 or huge -t means maximum defrag, however, the nominal data chunk size of 1 GiB means that the 30 GiB file you mentioned should be considered ideally defragged at 31 extents. This is a departure from ext4, which AFAIK in theory has no extent upper limit, so should be able to do that 30 GiB file in a single extent. But btrfs or ext4, 31 extents ideal or a single extent ideal, 150 extents still indicates at least some remaining fragmentation.

So I converted the VMware VMDK file to a VirtualBox VDI file:

-rw------- 1 plu plu 28845539328 jul 13 13:36 Windows7-disk1.vmdk
-rw------- 1 plu plu 28993126400 jul 13 14:04 Windows7.vdi

$ filefrag Windows7.vdi
Windows7.vdi: 15 extents found
$ btrfs filesystem defragment -t 3g Windows7.vdi
$ filefrag Windows7.vdi
Windows7.vdi: 24 extents found

How can it be less than 28 extents with a chunk size of 1 GiB?

E2fsprogs version 1.42.12
Re: BTRFS raid6 unmountable after a couple of days of usage.
On 2015-07-14 07:49, Austin S Hemmelgarn wrote: [original report quoted in full, snipped]
Further update: I just tried mounting the filesystem from the image above again, this time passing device= options for each device in the FS, and it seems to be working fine now. I've tried this with the other filesystems, however, and they still won't mount.
[PATCH] fstests: regression test for the btrfs clone ioctl
From: Filipe Manana fdman...@suse.com

This tests that we can not clone an inline extent into a non-zero file offset. Inline extents at non-zero offsets are something btrfs is not prepared for, and they result in all sorts of corruption and crashes on future IO operations, such as the following BUG_ON() triggered by the last write operation done by this test:

[152154.035903] ------------[ cut here ]------------
[152154.036424] kernel BUG at mm/page-writeback.c:2286!
[152154.036424] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC
(...)
[152154.036424] RIP: 0010:[8111a9d5] [8111a9d5] clear_page_dirty_for_io+0x1e/0x90
(...)
[152154.036424] Call Trace:
[152154.036424] [a04e97c1] lock_and_cleanup_extent_if_need+0x147/0x18d [btrfs]
[152154.036424] [a04ea82c] __btrfs_buffered_write+0x245/0x4c8 [btrfs]
[152154.036424] [a04ed14b] ? btrfs_file_write_iter+0x150/0x3e0 [btrfs]
[152154.036424] [a04ed15a] ? btrfs_file_write_iter+0x15f/0x3e0 [btrfs]
[152154.036424] [a04ed2c7] btrfs_file_write_iter+0x2cc/0x3e0 [btrfs]
[152154.036424] [81165a4a] __vfs_write+0x7c/0xa5
[152154.036424] [81165f89] vfs_write+0xa0/0xe4
[152154.036424] [81166855] SyS_pwrite64+0x64/0x82
[152154.036424] [81465197] system_call_fastpath+0x12/0x6f
(...)
[152154.242621] ---[ end trace e3d3376b23a57041 ]---

This issue is addressed by the following linux kernel patch for btrfs: "Btrfs: fix file corruption after cloning inline extents".

Signed-off-by: Filipe Manana fdman...@suse.com
---
 tests/btrfs/096     | 80 +
 tests/btrfs/096.out | 12
 tests/btrfs/group   |  1 +
 3 files changed, 93 insertions(+)
 create mode 100755 tests/btrfs/096
 create mode 100644 tests/btrfs/096.out

diff --git a/tests/btrfs/096 b/tests/btrfs/096
new file mode 100755
index 000..f5b3a7f
--- /dev/null
+++ b/tests/btrfs/096
@@ -0,0 +1,80 @@
+#! /bin/bash
+# FSQA Test No. 096
+#
+# Test that we can not clone an inline extent into a non-zero file offset.
+#
+#---
+#
+# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana fdman...@suse.com
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_cloner
+
+rm -f $seqres.full
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount
+
+# Create our test files. File foo has the same 2K of data at offset 4K as file
+# bar has at its offset 0.
+$XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
+	-c "pwrite -S 0xbb 4k 2K" \
+	-c "pwrite -S 0xcc 8K 4K" \
+	$SCRATCH_MNT/foo | _filter_xfs_io
+
+# File bar consists of a single inline extent (2K size).
+$XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
+	$SCRATCH_MNT/bar | _filter_xfs_io
+
+# Now call the clone ioctl to clone the extent of file bar into file foo at its
+# offset 4K. This made file foo have an inline extent at offset 4K, something
+# which the btrfs code can not deal with in future IO operations because all
+# inline extents are supposed to start at an offset of 0, resulting in all sorts
+# of chaos.
+# So here we validate that the clone ioctl returns an EOPNOTSUPP, which is what
+# it returns for other cases dealing with inlined extents.
+$CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
+	$SCRATCH_MNT/bar $SCRATCH_MNT/foo
+
+# Because of the inline extent at offset 4K, the following write made the kernel
+# crash with a BUG_ON().
+$XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
+
+status=0
+exit
diff --git a/tests/btrfs/096.out b/tests/btrfs/096.out
new file mode 100644
index 000..235198d
--- /dev/null
+++ b/tests/btrfs/096.out
@@ -0,0 +1,12 @@
+QA output created by 096
+wrote 4096/4096 bytes at offset 0
+XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec
[PATCH] Btrfs: fix file corruption after cloning inline extents
From: Filipe Manana fdman...@suse.com

Using the clone ioctl (or the extent_same ioctl, which calls the same extent cloning function as well) we end up allowing an inline extent to be copied from the source file into a non-zero offset of the destination file. This is something not expected, and something the btrfs code is not prepared to deal with - all inline extents must be at a file offset equal to 0.

For example, the following excerpt of a test case for fstests triggers a crash/BUG_ON() on a write operation after an inline extent is cloned into a non-zero offset:

  _scratch_mkfs >>$seqres.full 2>&1
  _scratch_mount

  # Create our test files. File foo has the same 2K of data at offset 4K
  # as file bar has at its offset 0.
  $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
      -c "pwrite -S 0xbb 4k 2K" \
      -c "pwrite -S 0xcc 8K 4K" \
      $SCRATCH_MNT/foo | _filter_xfs_io

  # File bar consists of a single inline extent (2K size).
  $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
      $SCRATCH_MNT/bar | _filter_xfs_io

  # Now call the clone ioctl to clone the extent of file bar into file
  # foo at its offset 4K. This made file foo have an inline extent at
  # offset 4K, something which the btrfs code can not deal with in future
  # IO operations because all inline extents are supposed to start at an
  # offset of 0, resulting in all sorts of chaos.
  # So here we validate that the clone ioctl returns an EOPNOTSUPP, which
  # is what it returns for other cases dealing with inlined extents.
  $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
      $SCRATCH_MNT/bar $SCRATCH_MNT/foo

  # Because of the inline extent at offset 4K, the following write made
  # the kernel crash with a BUG_ON().
  $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io

  status=0
  exit

The stack trace of the BUG_ON() triggered by the last write is:

[152154.035903] ------------[ cut here ]------------
[152154.036424] kernel BUG at mm/page-writeback.c:2286!
[152154.036424] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC
[152154.036424] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc acpi_cpu$
[152154.036424] CPU: 2 PID: 17873 Comm: xfs_io Tainted: G W 4.1.0-rc6-btrfs-next-11+ #2
[152154.036424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[152154.036424] task: 880429f70990 ti: 880429efc000 task.ti: 880429efc000
[152154.036424] RIP: 0010:[8111a9d5] [8111a9d5] clear_page_dirty_for_io+0x1e/0x90
[152154.036424] RSP: 0018:880429effc68 EFLAGS: 00010246
[152154.036424] RAX: 02000806 RBX: ea0006a6d8f0 RCX: 0001
[152154.036424] RDX: RSI: 81155d1b RDI: ea0006a6d8f0
[152154.036424] RBP: 880429effc78 R08: 8801ce389fe0 R09: 0001
[152154.036424] R10: 2000 R11: R12: 8800200dce68
[152154.036424] R13: R14: 8800200dcc88 R15: 8803d5736d80
[152154.036424] FS: 7fbf119f6700() GS:88043d28() knlGS:
[152154.036424] CS: 0010 DS: ES: CR0: 80050033
[152154.036424] CR2: 01bdc000 CR3: 0003aa555000 CR4: 06e0
[152154.036424] Stack:
[152154.036424] 8803d5736d80 0001 880429effcd8 a04e97c1
[152154.036424] 880429effd68 880429effd60 0001 8800200dc9c8
[152154.036424] 0001 8800200dcc88 1000
[152154.036424] Call Trace:
[152154.036424] [a04e97c1] lock_and_cleanup_extent_if_need+0x147/0x18d [btrfs]
[152154.036424] [a04ea82c] __btrfs_buffered_write+0x245/0x4c8 [btrfs]
[152154.036424] [a04ed14b] ? btrfs_file_write_iter+0x150/0x3e0 [btrfs]
[152154.036424] [a04ed15a] ? btrfs_file_write_iter+0x15f/0x3e0 [btrfs]
[152154.036424] [a04ed2c7] btrfs_file_write_iter+0x2cc/0x3e0 [btrfs]
[152154.036424] [81165a4a] __vfs_write+0x7c/0xa5
[152154.036424] [81165f89] vfs_write+0xa0/0xe4
[152154.036424] [81166855] SyS_pwrite64+0x64/0x82
[152154.036424] [81465197] system_call_fastpath+0x12/0x6f
[152154.036424] Code: 48 89 c7 e8 0f ff ff ff 5b 41 5c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 ae ef 00 00 49 89 c4 48 8b 03 a8 01 75 02 0f 0b 4d 85 e4 74 59 49 8b 3c 2$
[152154.036424] RIP [8111a9d5] clear_page_dirty_for_io+0x1e/0x90
[152154.036424] RSP 880429effc68
[152154.242621] ---[ end trace e3d3376b23a57041 ]---

Fix this by returning the error EOPNOTSUPP if an attempt to copy an inline extent into a non-zero offset happens, just like what is done for other scenarios that would require copying/splitting inline extents, which were introduced by
Re: [PATCH] Documentation: update btrfs-replace manual to support RAID5/6
On Tue, Jul 07, 2015 at 10:12:16AM +0800, Wang Yanfeng wrote: The man page needs to be updated since RAID5/6 is now supported by btrfs-replace. Signed-off-by: Wang Yanfeng wangyf-f...@cn.fujitsu.com

Applied, thanks. Please do not forget to add 'btrfs-progs' to the subject, otherwise I might miss progs patches while skimming the mailing list and delay merging unnecessarily.
Re: Disk failed while doing scrub
Dāvis Mosāns posted on Tue, 14 Jul 2015 04:54:27 +0300 as excerpted: 2015-07-13 11:12 GMT+03:00 Duncan 1i5t5.dun...@cox.net: You say five disks, but nowhere in your post do you mention what raid mode you were using, nor do you post btrfs filesystem show and btrfs filesystem df, as suggested on the wiki, which list that information. Sorry, I forgot. I'm running Arch Linux 4.0.7 with btrfs-progs v4.1, using RAID1 for metadata and single for data, with features big_metadata, extended_iref, mixed_backref, no_holes, skinny_metadata, and mounted with noatime,compress=zlib,space_cache,autodefrag. Thanks. FWIW, pretty similar here, but running Gentoo, now with btrfs-progs v4.1.1 and the mainline 4.2-rc1+ kernel. BTW, note that space_cache has been the default for quite some time now. I've never actually manually mounted with space_cache on any of my filesystems over several years, yet they all report it when I check /proc/mounts, etc. So if you're adding that manually, you can kill that option and save the commandline/fstab space. =:^)

Label: 'Data'  uuid: 1ec5b839-acc6-4f70-be9d-6f9e6118c71c
	Total devices 5 FS bytes used 7.16TiB
	devid 1 size 2.73TiB used 2.35TiB path /dev/sdc
	devid 2 size 1.82TiB used 1.44TiB path /dev/sdd
	devid 3 size 1.82TiB used 1.44TiB path /dev/sde
	devid 4 size 1.82TiB used 1.44TiB path /dev/sdg
	devid 5 size 931.51GiB used 539.01GiB path /dev/sdh

Data, single: total=7.15TiB, used=7.15TiB
System, RAID1: total=8.00MiB, used=784.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=16.00GiB, used=14.37GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

And note that you can easily and quickly remove those empty single-mode system and metadata chunks, which are an artifact of the way mkfs.btrfs works, using balance filters. btrfs balance start -mprofiles=single ... should do it. They're actually working on mkfs.btrfs patches right now to fix it not to do that.
There are active patch and testing threads discussing it; hopefully for btrfs-progs v4.2. (4.1.1 has the patches for single-device and prep work for multi-device, according to the changelog.) Because the filesystem still mounts, I assume I should do btrfs device delete /dev/sdd /mntpoint and then restore damaged files from backup. You can try a replace, but with a failing drive still connected, people report mixed results. It's likely to fail as it can't read certain blocks to transfer them to the new device. As I understand it, device delete will copy data from that disk and distribute it across the rest of the disks, while btrfs replace will copy to a new disk, which must be at least the size of the disk I'm replacing. Sorry. You wrote delete, I read replace. How'd I do that? =:^( You are absolutely correct. Delete would be better here. I guess I had just been reading a thread discussing the problems I mentioned with replace, and saw what I expected to see, not what you actually wrote. There are no partial-file null-fill tools shipped just yet. From the journal I have only 14 files mentioned where errors occurred. Now 13 of those files don't throw any errors and their SHAs match my backups, so they're fine. Good. I was going on the assumption that the questionable device was in much worse shape than that. And actually btrfs does let me copy/read that one damaged file; I only get an I/O error when trying to read data from those broken sectors. Good, and good to know. Thanks. =:^) The best and correct way to recover a file is using ddrescue. I was just going to mention ddrescue. =:^)

$ du -m /tmp/damaged_file
6251	/tmp/damaged_file

so basically only about 8K bytes are unrecoverable from this file. Probably a tool could be created that recovers even more data by knowing about btrfs. There /is/, however, a command that can be used to either regenerate or zero out the checksum tree. See btrfs check --init-csum-tree.
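For a failing disk, ddrescue (e.g. something like 'ddrescue -r3 /dev/sdX rescued.img rescue.map') is the right tool, as noted above. The null-fill idea behind it can be sketched with plain dd, whose conv=noerror,sync continues past read errors and pads unreadable blocks with NULs. The sketch below demonstrates the invocation on an ordinary temp file (a failing disk can't be simulated here), so the copy comes out identical:

```shell
#!/bin/sh
# Null-fill copy sketch: conv=noerror keeps dd going after read errors,
# and conv=sync pads each short or failed read out to the full block
# size with NULs, so the readable sectors of a damaged file survive at
# their correct offsets.
src=$(mktemp); dst=$(mktemp)
dd if=/dev/urandom of="$src" bs=4096 count=16 2>/dev/null
dd if="$src" of="$dst" bs=4096 conv=noerror,sync 2>/dev/null
result=$(cmp -s "$src" "$dst" && echo match || echo differ)
rm -f "$src" "$dst"
echo "$result"
```

Unlike this one-shot dd pass, ddrescue keeps a map file and retries bad areas, which is why it is preferred on genuinely failing media.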
Seems you can't specify a path/file for it, and it's quite a destructive action if you only want data about one specific file. Yes. It's whole-filesystem, all-or-nothing, unfortunately. =:^( I did a scrub a second time, and this time there aren't that many uncorrectable errors and there are no csum_errors, so --init-csum-tree is useless here, I think. Agreed. Most likely the previous scrub got that many errors because it still continued for a bit even though the disk didn't respond. Yes.

scrub status [...]
	read_errors: 2
	csum_errors: 0
	verify_errors: 0
	no_csum: 89600
	csum_discards: 656214
	super_errors: 0
	malloc_errors: 0
	uncorrectable_errors: 2
	unverified_errors: 0
	corrected_errors: 0
	last_physical: 2590041112576

OK, that matches up with 8 KiB bad, since blocks are 4 KiB and there are two uncorrectable errors. With the scrub
'btrfs subvolume list /' executed as non-root produces a not-so-nice error message
$ btrfs subvolume list /
ERROR: can't perform the search - Operation not permitted
ERROR: can't get rootid for '/'

I don't know what a 'rootid' is as a user, and I don't really want to ponder whether I need to find out. What about a simple "ERROR: Permission denied." instead? Cheers, Johannes.
Re: counting fragments takes more time than defragmenting
On 14 July 2015 at 20:41, Hugo Mills h...@carfax.org.uk wrote: On Tue, Jul 14, 2015 at 01:57:07PM +0200, Patrik Lundquist wrote: [earlier discussion and defrag transcript quoted, snipped] How can it be less than 28 extents with a chunk size of 1 GiB? I _think_ the fragment size will be limited by the block group size. This is not the same as the chunk size for some RAID levels -- for example, in RAID-0 a block group can be anything from 2 to n chunks (across the same number of devices), where each chunk is 1 GiB, so potentially you could have arbitrary-sized block groups. The same would apply to RAID-10, -5 and -6. (Note, I haven't verified this, but it makes sense based on what I know of the internal data structures.) It's a raid1 filesystem, so the block group ought to be the same size as the chunk, right? A 2GiB block group would suffice to explain it, though.
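For reference, the 28-extent figure in the question is just the 28993126400-byte file rounded up to whole 1 GiB extents, and with 2 GiB block groups the floor drops to 14, comfortably below the observed 24. A quick sketch of that arithmetic (done in KiB so the numbers stay well inside shell integer range):

```shell
#!/bin/sh
# Minimum possible extent counts for the 28993126400-byte Windows7.vdi,
# given a maximum extent size of one 1 GiB chunk vs. one 2 GiB block
# group (ceiling division).
size_kib=28313600                  # 28993126400 bytes / 1024
gib_kib=$((1024 * 1024))
min_1g=$(( (size_kib + gib_kib - 1) / gib_kib ))         # 1 GiB cap
min_2g=$(( (size_kib + 2*gib_kib - 1) / (2*gib_kib) ))   # 2 GiB cap
echo "min extents: ${min_1g} with 1 GiB, ${min_2g} with 2 GiB"
```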
Re: counting fragments takes more time than defragmenting
On Tue, Jul 14, 2015 at 09:09:00PM +0200, Patrik Lundquist wrote: On 14 July 2015 at 20:41, Hugo Mills h...@carfax.org.uk wrote: [earlier discussion quoted, snipped] It's a raid1 filesystem, so the block group ought to be the same size as the chunk, right? Yes. A 2GiB block group would suffice to explain it, though. Not with RAID-1 -- I'd expect the block group size to be 1 GiB. Hugo. -- Hugo Mills | There isn't a noun that can't be verbed. hugo@... carfax.org.uk | http://carfax.org.uk/ | PGP: E2AB1DE4
Re: Anyone tried out btrbk yet?
On Wed, Jul 15, 2015 at 10:03:16AM +1000, Paul Harvey wrote: The way it works in snazzer (and btrbk and I think also btrfs-sxbackup as well), local snapshots continue to happen as normal (Eg. daily or hourly) and so when your backup media or backup server is finally available again, the size of each individual incremental is still the same as usual, it just has to perform more of them. Good point. My system is not as smart. Every night, it'll make a new backup and only send one incremental and hope it gets there. It doesn't make a bunch of incrementals and send multiple. The other options do a better job here. Thanks, Marc -- A mouse is a device used to point at the xterm you want to type in - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BTRFS raid6 unmountable after a couple of days of usage.
On Tue, Jul 14, 2015 at 7:25 AM, Austin S Hemmelgarn ahferro...@gmail.com wrote: On 2015-07-14 07:49, Austin S Hemmelgarn wrote: So, after experiencing this same issue multiple times (on almost a dozen different kernel versions since 4.0) and ruling out the possibility of it being caused by my hardware (or at least, the RAM, SATA controller and disk drives themselves), I've decided to report it here. The general symptom is that raid6 profile filesystems that I have are working fine for multiple weeks, until I either reboot or otherwise try to remount them, at which point the system refuses to mount them. I'm currently using btrfs-progs v4.1 with kernel 4.1.2, although I've been seeing this with versions of both since 4.0. Output of 'btrfs fi show' for the most recent fs that I had this issue with: Label: 'altroot' uuid: 86eef6b9-febe-4350-a316-4cb00c40bbc5 Total devices 4 FS bytes used 9.70GiB devid1 size 24.00GiB used 6.03GiB path /dev/mapper/vg-altroot.0 devid2 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.1 devid3 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.2 devid4 size 24.00GiB used 6.01GiB path /dev/mapper/vg-altroot.3 btrfs-progs v4.1 Each of the individual LVS that are in the FS is just a flat chunk of space on a separate disk from the others. The FS itself passes btrfs check just fine (no reported errors, exit value of 0), but the kernel refuses to mount it with the message 'open_ctree failed'. I've run btrfs chunk recover and attached the output from that. Here's a link to an image from 'btrfs image -c9 -w': https://www.dropbox.com/s/pl7gs305ej65u9q/altroot.btrfs.img?dl=0 (That link will expire in 30 days, let me know if you need access to it beyond that). The filesystems in question all see relatively light but consistent usage as targets for receiving daily incremental snapshots for on-system backups (and because I know someone will mention it, yes, I do have other backups of the data, these are just my online backups). 
Further updates: I just tried mounting the filesystem from the image above again, this time passing device= options for each device in the FS, and it seems to be working fine now. I've tried this with the other filesystems, however, and they still won't mount. And it's the same message with the usual suspects: recovery, ro,recovery? How about degraded, even though it's not degraded? And what about 'btrfs rescue zero-log'? Of course it's weird that btrfs check doesn't complain but mount does. I don't understand that, so it's good you've got an image. If either recovery or zero-log fixes the problem, my understanding is this suggests hardware did something Btrfs didn't expect. What about 'btrfs check --check-data-csum', which should act similarly to a read-only scrub (different output though)? Hmm, nah. The thing is, the failure to mount is failing on some aspect of metadata, not data. So the fact that check (on metadata) passes but mount fails is a bug somewhere... -- Chris Murphy
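The escalating options suggested in the reply above can be summarized as an ordered checklist. A hypothetical helper that only builds the commands in the order a troubleshooter might try them (nothing is executed; the device and mountpoint below are placeholders):

```python
def diagnostic_steps(device, mountpoint):
    """Return mount/repair attempts ordered from least to most invasive."""
    return [
        # 1. plain mount, as a baseline
        ["mount", device, mountpoint],
        # 2. replay alternate tree roots (4.1-era 'recovery' option)
        ["mount", "-o", "recovery", device, mountpoint],
        # 3. same, but read-only to avoid any further writes
        ["mount", "-o", "ro,recovery", device, mountpoint],
        # 4. degraded mount, even though no device is actually missing
        ["mount", "-o", "degraded", device, mountpoint],
        # 5. last resort: zero the log tree, then retry a plain mount
        ["btrfs", "rescue", "zero-log", device],
        ["mount", device, mountpoint],
    ]

if __name__ == "__main__":
    for cmd in diagnostic_steps("/dev/mapper/vg-altroot.0", "/mnt"):
        print(" ".join(cmd))
```

As the reply notes, if recovery or zero-log helps, that points at hardware having done something unexpected rather than at a btrfs metadata bug.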
Re: Anyone tried out btrbk yet?
The way it works in snazzer (and btrbk, and I think btrfs-sxbackup as well) is that local snapshots continue to happen as normal (e.g. daily or hourly), so when your backup media or backup server is finally available again, the size of each individual incremental is still the same as usual; it just has to perform more of them. Separating snapshotting from transport lends itself to more flexibility IMHO; e.g. with snazzer I can keep multiple physical backup media in sync with each other even if I only rotate/attach those disks once a week or month (maintaining backup filesystems in parallel). The snazzer-receive script is very dumb - it just receives all the missing snapshots from the source. However, it does filter them via btrfs subvolume list /subvolume | snazzer-prune-candidates --invert first, in case some would just be deleted again shortly after according to retention policy. For the ssh transport you can do the same things but in series: push the snapshots up to a local server and then on to remote storage elsewhere (maintaining backup filesystems in series). Because the snapshotting, transport and pruning operations are asynchronous, the logic for all this is relatively simple. It's thanks to seeing send/receive struggles such as yours on this list (which have also happened to me, but only very rarely: it seems I tend to have reliable connectivity), among other issues, that I wrote snazzer-measure. It appends reproducible sha512sum and PGP signatures to a measurements file for each snapshot; measurements happen more than just once, so they're timestamped and tagged with the hostname - the hope is that I should spot any corruption that happens after the first measurements are taken. This is also a separate/async operation (it's the most I/O- and CPU-intensive operation of all).
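A rough sketch of the measurement idea described above (a hypothetical record format, not snazzer's actual output): hash a snapshot's send stream and record the digest together with a timestamp and hostname, so later re-measurements from other hosts can be compared line by line to spot corruption:

```python
import hashlib
from datetime import datetime, timezone

def measurement_line(stream_bytes, hostname, when=None):
    """Build one measurement record for a snapshot's send stream.

    The record carries timestamp + hostname + sha512 so repeated
    measurements taken later (or elsewhere) can be diffed against it.
    """
    when = when or datetime.now(timezone.utc)
    digest = hashlib.sha512(stream_bytes).hexdigest()
    return "{} {} sha512:{}".format(when.isoformat(), hostname, digest)

if __name__ == "__main__":
    print(measurement_line(b"example send stream", "backuphost"))
```

In practice one would hash the actual `btrfs send` output stream and append the line to a per-snapshot measurements file (with a PGP signature on top, as snazzer-measure does).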
Re: btrfs partition converted from ext4 becomes read-only minutes after booting: WARNING: CPU: 2 PID: 2777 at ../fs/btrfs/super.c:260 __btrfs_abort_transaction+0x4b/0x120
On Thu, Jul 9, 2015 at 10:45 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote: Chris Murphy wrote on 2015/07/09 18:45 -0600: On Thu, Jul 9, 2015 at 6:34 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote: One of my patches addressed a problem where a converted btrfs can't pass btrfsck. Not sure if that is the cause, but could you try btrfs-progs v3.19.1, the one without my btrfs-progs patches and some other newer convert-related patches, and see the result? I think this would at least provide the base for bisecting btrfs-progs if the bug is in btrfs-progs. I'm happy to regression test with 3.19.1 but I'm confused. After conversion, btrfs check (4.1) finds no problems. After the ext2_saved snapshot is deleted, btrfsck finds no problems. After defrag, again btrfsck finds no problems. After the failed balance, btrfsck finds no problems but crashes with Aborted (core dump). Even if btrfsck reports no error, some btrfs-convert behavior change may lead to a kernel misfunction. But we are not sure whether it's btrfs-progs or the kernel itself that has the bug. Maybe btrfs-convert did something wrong/different that triggers the bug, or it's just a kernel regression? So what I'd like to check is, with 3.19.1 progs (kernel version doesn't change), whether the kernel still fails to do the balance. If the problem still happens, then we can focus on the kernel part, or at least put less effort into btrfs-progs. Should I still test 3.19.1? I'm not able to reproduce this for reasons I don't understand. The setup is in a qemu-kvm VM, with the ext4 original as a qcow2. I had been using 'qemu-img create -f qcow2 -o nocow=on -b original.qcow2 converted.qcow2' and then doing the conversions with the converted.qcow2 file in the VM. I did this half a dozen times and always at the balance step it imploded (but differently for 4.1 and 4.2). Now the balance completes with no errors. btrfs check doesn't complain either. Very irritating, as nothing else has changed with that VM.
There is another user who had this similar problem with the converted ext4 going read-only. They ran btrfs check with both 3.19.1 and 4.0. Their results are here, hopefully it's helpful until I can figure out how to get this reproducing again. http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg44701.html -- Chris Murphy
Re: Recreate Snapshots and their Parent Relationships on a second Server.
btrfs subvolume list -uq /some/subvol can help figure out the existing parent relationships, but in practice, if your snapshots are simply a linear series over time, then I doubt you'll gain much by parsing all those UUIDs over simply doing an initial btrfs send/receive without any parent, followed by send/receive operations using the previous snapshot as the parent. I'm assuming, of course, that your snapshots are easier to sort chronologically than by parsing the UUIDs out of the btrfs subvolume list -uq output. In my experience I actually end up using slightly less disk space with the new parent/child relationships on the new filesystem. I assume that's because the original source filesystem had missing parents that no longer exist, as they've long since been pruned, but it might also just be that there's less fragmentation and smaller metadata consumption to hold the new relationships. On 14 July 2015 at 20:14, Robert Krig robert.k...@render-wahnsinn.de wrote: Hi. I have an Old Server with a bunch of btrfs Snapshots. I'm setting up a new server and I would like to transfer those Snapshots as efficiently as possible, while still maintaining their parent-child relationships for space efficient storage. Apart from manually using btrfs send and btrfs send -p where applicable, is there an easy way to transfer everything in one go? Can I identify Snapshot relationships via their ID or some other data that I can display using btrfs tools?
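Assuming the snapshots really are a linear chronological series, the send/receive chain described above can be scripted. A sketch that only builds the command lines (the hostname, snapshot paths, and destination directory are placeholders, not anything from the thread):

```python
def send_commands(snapshots, host="newserver", dest="/mnt/backup"):
    """Build btrfs send/receive pipelines for an ordered snapshot series.

    snapshots must be sorted oldest-to-newest. The first snapshot gets a
    full send; every later one is sent incrementally with -p against its
    predecessor, preserving the parent-child chain on the receiver.
    """
    cmds = []
    prev = None
    for snap in snapshots:
        if prev is None:
            send = "btrfs send {}".format(snap)          # initial full send
        else:
            send = "btrfs send -p {} {}".format(prev, snap)  # incremental
        cmds.append("{} | ssh {} btrfs receive {}".format(send, host, dest))
        prev = snap
    return cmds

if __name__ == "__main__":
    for cmd in send_commands(["/snaps/2015-07-12", "/snaps/2015-07-13"]):
        print(cmd)
```

Note the snapshots must be read-only for send to accept them, and each incremental requires its parent to already exist on the receiving side, which the oldest-to-newest ordering guarantees.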
[PATCH] btrfs: Fix lockdep warning of btrfs_run_delayed_iputs()
From: Zhao Lei zhao...@cn.fujitsu.com

Liu Bo bo.li@oracle.com reported a lockdep warning of delayed_iput_sem in xfstests generic/241:

  [ 2061.345955] =============================================
  [ 2061.346027] [ INFO: possible recursive locking detected ]
  [ 2061.346027] 4.1.0+ #268 Tainted: G        W
  [ 2061.346027] ---------------------------------------------
  [ 2061.346027] btrfs-cleaner/3045 is trying to acquire lock:
  [ 2061.346027]  (&fs_info->delayed_iput_sem){..}, at: [814063ab] btrfs_run_delayed_iputs+0x6b/0x100
  [ 2061.346027] but task is already holding lock:
  [ 2061.346027]  (&fs_info->delayed_iput_sem){..}, at: [814063ab] btrfs_run_delayed_iputs+0x6b/0x100
  [ 2061.346027] other info that might help us debug this:
  [ 2061.346027]  Possible unsafe locking scenario:
  [ 2061.346027]        CPU0
  [ 2061.346027]        ----
  [ 2061.346027]   lock(&fs_info->delayed_iput_sem);
  [ 2061.346027]   lock(&fs_info->delayed_iput_sem);
  [ 2061.346027]  *** DEADLOCK ***

It rarely happens, about 1/400 in my test env.

The reason is recursion of btrfs_run_delayed_iputs():

  cleaner_kthread
  -> btrfs_run_delayed_iputs()                    *1
     -> get delayed_iput_sem lock                 *2
     -> iput()
        -> ...
        -> btrfs_commit_transaction()
           -> btrfs_run_delayed_iputs()           *1
              -> get delayed_iput_sem lock (dead lock)  *2

  *1: recursion of btrfs_run_delayed_iputs()
  *2: lockdep warning about delayed_iput_sem

When the fs is under high stress, new iputs may be added to the fs_info->delayed_iputs list while btrfs_run_delayed_iputs() is running, which causes the second btrfs_run_delayed_iputs() to run into down_read(&fs_info->delayed_iput_sem) again, triggering the above lockdep warning. Actually, it will not cause a real problem because both locks are read locks, but to avoid the lockdep warning, we can apply a fix.

Fix: Don't call btrfs_run_delayed_iputs() in btrfs_commit_transaction() from the cleaner_kthread thread, to break the above recursion path. cleaner_kthread already calls btrfs_run_delayed_iputs() explicitly in its code and doesn't need to call it again in btrfs_commit_transaction(); as a bonus, it also helps avoid stack overflow.
Test: No above lockdep warning after patch in 1200 generic/241 tests.

Reported-by: Liu Bo bo.li@oracle.com
Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/transaction.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index c0f18e7..31248ad 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2152,7 +2152,8 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 
 	kmem_cache_free(btrfs_trans_handle_cachep, trans);
 
-	if (current != root->fs_info->transaction_kthread)
+	if (current != root->fs_info->transaction_kthread &&
+	    current != root->fs_info->cleaner_kthread)
 		btrfs_run_delayed_iputs(root);
 
 	return ret;
-- 
1.8.5.1
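The guard in the patch can be illustrated with a userspace toy model (purely illustrative Python; the class and names below are invented stand-ins, not kernel API): the drain routine takes the semaphore once, iput of the last reference may trigger a transaction commit, and commit skips re-entering the drain from the thread that already runs it.

```python
import threading

class FsInfo:
    """Toy stand-in for fs_info: models the delayed-iput recursion."""
    def __init__(self):
        self.cleaner_thread = None   # which thread plays cleaner_kthread
        self.delayed_iputs = []      # pending iputs
        self.iput_runs = 0           # how many times the drain ran
        self.lock_depth = 0          # models holding delayed_iput_sem
        self.max_lock_depth = 0      # >1 would mean the recursive grab

    def run_delayed_iputs(self):
        self.iput_runs += 1
        self.lock_depth += 1         # down_read(&delayed_iput_sem)
        self.max_lock_depth = max(self.max_lock_depth, self.lock_depth)
        while self.delayed_iputs:
            self.delayed_iputs.pop()
            self.commit_transaction()   # iput() may commit a transaction
        self.lock_depth -= 1         # up_read(&delayed_iput_sem)

    def commit_transaction(self):
        # The fix: the cleaner thread already runs delayed iputs itself,
        # so don't recurse back into them from the commit path.
        if threading.current_thread() is not self.cleaner_thread:
            self.run_delayed_iputs()

def cleaner(fs):
    """Analogue of cleaner_kthread draining the delayed-iput list."""
    fs.cleaner_thread = threading.current_thread()
    fs.run_delayed_iputs()
```

With the guard, the cleaner never takes the semaphore twice (max_lock_depth stays at 1); removing the `is not self.cleaner_thread` check reproduces the nested acquisition that lockdep complained about.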
Re: [PATCH] Revert btrfs-progs: mkfs: create only desired block groups for single device
David Sterba wrote on 2015/07/14 13:45 +0200: On Tue, Jul 14, 2015 at 10:13:01AM +0800, Qu Wenruo wrote: This reverts commit 5f8232e5c8f0b0de0ef426274911385b0e877392. Thanks. The revert is justified by the severity of the problem; I'll release 4.1.2 asap. This commit causes a regression: --- BTW, do not use --- in the changelog, as 'git am' will ignore the text between that and the diff. Oh, sorry, I forgot that --- will be ignored by git. Next time I'll use === to avoid such a careless problem. BTW, the mkfs test case will be delayed for a while, as the following bugs are making things quite tricky. 1) fsck ignores chunk errors and returns 0. The cause is known and easy to fix, but once fixed, most of the fsck tests won't pass, because the following bug causes problems. 2) btrfs-image restore bug, causing missing dev_extents for DUP chunks. Investigating. That's what causes a lot of the missing dev extents in the mkfs test. And thanks to the previous bug, we can pass the fsck test like a miracle. So I'm afraid the corresponding regression test case won't be in time for the 4.1.2 hotfix release. Thanks, Qu
Re: counting fragments takes more time than defragmenting
On Tue, Jul 14, 2015 at 01:57:07PM +0200, Patrik Lundquist wrote: On 24 June 2015 at 12:46, Duncan 1i5t5.dun...@cox.net wrote: Regardless of whether 1 or a huge -t means maximum defrag, however, the nominal data chunk size of 1 GiB means that the 30 GiB file you mentioned should be considered ideally defragged at 31 extents. This is a departure from ext4, which AFAIK in theory has no extent upper limit, so should be able to do that 30 GiB file in a single extent. But btrfs or ext4, 31 extents ideal or a single extent ideal, 150 extents still indicates at least some remaining fragmentation. So I converted the VMware VMDK file to a VirtualBox VDI file: -rw------- 1 plu plu 28845539328 jul 13 13:36 Windows7-disk1.vmdk -rw------- 1 plu plu 28993126400 jul 13 14:04 Windows7.vdi $ filefrag Windows7.vdi Windows7.vdi: 15 extents found $ btrfs filesystem defragment -t 3g Windows7.vdi $ filefrag Windows7.vdi Windows7.vdi: 24 extents found How can it be less than 28 extents with a chunk size of 1 GiB? I _think_ the fragment size will be limited by the block group size. This is not the same as the chunk size for some RAID levels -- for example, with RAID-0, a block group can be anything from 2 to n chunks (across the same number of devices), where each chunk is 1 GiB, so potentially you could have arbitrary-sized block groups. The same would apply to RAID-10, -5 and -6. (Note, I haven't verified this, but it makes sense based on what I know of the internal data structures.) Hugo. -- Hugo Mills | Go not to the elves for counsel, for they will say hugo@... carfax.org.uk | both no and yes. http://carfax.org.uk/ | PGP: E2AB1DE4 | signature.asc Description: Digital signature
Re: counting fragments takes more time than defragmenting
Patrik Lundquist posted on Tue, 14 Jul 2015 13:57:07 +0200 as excerpted: On 24 June 2015 at 12:46, Duncan 1i5t5.dun...@cox.net wrote: Regardless of whether 1 or a huge -t means maximum defrag, however, the nominal data chunk size of 1 GiB means that the 30 GiB file you mentioned should be considered ideally defragged at 31 extents. This is a departure from ext4, which AFAIK in theory has no extent upper limit, so should be able to do that 30 GiB file in a single extent. But btrfs or ext4, 31 extents ideal or a single extent ideal, 150 extents still indicates at least some remaining fragmentation. So I converted the VMware VMDK file to a VirtualBox VDI file: -rw------- 1 plu plu 28845539328 jul 13 13:36 Windows7-disk1.vmdk -rw------- 1 plu plu 28993126400 jul 13 14:04 Windows7.vdi $ filefrag Windows7.vdi Windows7.vdi: 15 extents found $ btrfs filesystem defragment -t 3g Windows7.vdi $ filefrag Windows7.vdi Windows7.vdi: 24 extents found How can it be less than 28 extents with a chunk size of 1 GiB? E2fsprogs version 1.42.12 That's why I said nominal[1] 1 GiB. I'm just a list and filesystem user, not a dev, and I don't know the details, but someone (a dev, or at least someone who can actually read code, but not a btrfs dev) mentioned in reply to a post of mine a few months ago that under the right conditions, btrfs can allocate larger-than-1-GiB data chunks. I /believe/ data chunk allocation size has something to do with the amount of unallocated space on the filesystem; that on large (TiB-plus, perhaps) btrfs some of the initial allocations will be multiple GiB, which of course would allow greater-than-1-GiB extents as well. But I really don't know the conditions under which that can happen, and I've not seen an actual btrfs dev comment on it, and AFAIK the base data chunk size remains 1 GiB under most conditions.
Meanwhile, I tend to partition up my storage here, and while I have multiple separate btrfs, the partitions are all under 50 GiB, so I'm unlikely to see that sort of >1 GiB data chunk allocation at all here. So rather than go into the complexity of explaining all this detail that I'm not sure of anyway, I deliberately blurred it out a bit as not necessary to the primary point, which was that for files over a GiB, don't expect to see, or be able to defrag to, a single extent, as 1 GiB data chunks and thus extents are nominal/normal. If it does happen, I'd consider it due to those data superchunks and wouldn't be entirely surprised, but the point remains that you're unlikely to get the number of extents much below the file size number in GiB using defrag, even when everything is working perfectly as designed. --- [1] Nominal: in the sense of a normal or standard as-designed value, see Wiktionary's English adjective senses 6 and 10, as well as the Wikipedia writeups on real vs. nominal values and nominal size: https://en.wiktionary.org/wiki/nominal#Adjective https://en.wikipedia.org/wiki/Real_versus_nominal_value https://en.wikipedia.org/wiki/Nominal_size -- Duncan - List replies preferred. No HTML msgs. Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
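Duncan's rule of thumb can be sanity-checked with a quick ceiling division: with a nominal 1 GiB data chunk, the best-case extent count is the file size divided by 1 GiB, rounded up. For the 28993126400-byte Windows7.vdi quoted earlier in the thread, that predicts 28 extents, matching the figure in Patrik's question (and the observed 24 is below it, which is exactly what larger-than-nominal block groups would explain). A minimal sketch:

```python
GIB = 1 << 30  # nominal btrfs data chunk size, per the discussion above

def nominal_min_extents(size_bytes, chunk_bytes=GIB):
    """Lower bound on extents if every extent could fill a whole
    (nominal) 1 GiB data chunk: ceiling division of size by chunk."""
    return max(1, (size_bytes + chunk_bytes - 1) // chunk_bytes)

if __name__ == "__main__":
    # The converted VDI from earlier in the thread:
    print(nominal_min_extents(28993126400))  # -> 28
```

Anything at or below this bound after a defrag is, by this reasoning, already ideal; multi-GiB "superchunks" are what would push the real count lower still.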
Re: [PATCH 7/7] btrfs-progs: mkfs: Cleanup temporary chunk to avoid strange balance behavior.
On Tue, Jul 07, 2015 at 04:15:28PM +0800, Qu Wenruo wrote: [...] Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com Applied, thanks a lot. I've tested several data/metadata combinations and the resulting 'fi df' looks ok.