[PATCH] Btrfs: fix possible deadlock in btrfs_cleanup_transaction
[13654.480669] ======================================================
[13654.480905] [ INFO: possible circular locking dependency detected ]
[13654.481003] 3.12.0+ #4 Tainted: G W O
[13654.481060] ------------------------------------------------------
[13654.481060] btrfs-transacti/9347 is trying to acquire lock:
[13654.481060]  (&(&root->ordered_extent_lock)->rlock){+.+...}, at: [a02d30a1] btrfs_cleanup_transaction+0x271/0x570 [btrfs]
[13654.481060] but task is already holding lock:
[13654.481060]  (&(&fs_info->ordered_root_lock)->rlock){+.+...}, at: [a02d3015] btrfs_cleanup_transaction+0x1e5/0x570 [btrfs]
[13654.481060] which lock already depends on the new lock.
[13654.481060] the existing dependency chain (in reverse order) is:
[13654.481060] -> #1 (&(&fs_info->ordered_root_lock)->rlock){+.+...}:
[13654.481060]        [810c4103] lock_acquire+0x93/0x130
[13654.481060]        [81689991] _raw_spin_lock+0x41/0x50
[13654.481060]        [a02f011b] __btrfs_add_ordered_extent+0x39b/0x450 [btrfs]
[13654.481060]        [a02f0202] btrfs_add_ordered_extent+0x32/0x40 [btrfs]
[13654.481060]        [a02df6aa] run_delalloc_nocow+0x78a/0x9d0 [btrfs]
[13654.481060]        [a02dfc0d] run_delalloc_range+0x31d/0x390 [btrfs]
[13654.481060]        [a02f7c00] __extent_writepage+0x310/0x780 [btrfs]
[13654.481060]        [a02f830a] extent_write_cache_pages.isra.29.constprop.48+0x29a/0x410 [btrfs]
[13654.481060]        [a02f879d] extent_writepages+0x4d/0x70 [btrfs]
[13654.481060]        [a02d9f68] btrfs_writepages+0x28/0x30 [btrfs]
[13654.481060]        [8114be91] do_writepages+0x21/0x50
[13654.481060]        [81140d49] __filemap_fdatawrite_range+0x59/0x60
[13654.481060]        [81140e13] filemap_fdatawrite_range+0x13/0x20
[13654.481060]        [a02f1db9] btrfs_wait_ordered_range+0x49/0x140 [btrfs]
[13654.481060]        [a0318fe2] __btrfs_write_out_cache+0x682/0x8b0 [btrfs]
[13654.481060]        [a031952d] btrfs_write_out_cache+0x8d/0xe0 [btrfs]
[13654.481060]        [a02c7083] btrfs_write_dirty_block_groups+0x593/0x680 [btrfs]
[13654.481060]        [a0345307] commit_cowonly_roots+0x14b/0x20d [btrfs]
[13654.481060]        [a02d7c1a] btrfs_commit_transaction+0x43a/0x9d0 [btrfs]
[13654.481060]        [a030061a] btrfs_create_uuid_tree+0x5a/0x100 [btrfs]
[13654.481060]        [a02d5a8a] open_ctree+0x21da/0x2210 [btrfs]
[13654.481060]        [a02ab6fe] btrfs_mount+0x68e/0x870 [btrfs]
[13654.481060]        [811b2409] mount_fs+0x39/0x1b0
[13654.481060]        [811cd653] vfs_kern_mount+0x63/0xf0
[13654.481060]        [811cfcce] do_mount+0x23e/0xa90
[13654.481060]        [811d05a3] SyS_mount+0x83/0xc0
[13654.481060]        [81692b52] system_call_fastpath+0x16/0x1b
[13654.481060] -> #0 (&(&root->ordered_extent_lock)->rlock){+.+...}:
[13654.481060]        [810c340a] __lock_acquire+0x150a/0x1a70
[13654.481060]        [810c4103] lock_acquire+0x93/0x130
[13654.481060]        [81689991] _raw_spin_lock+0x41/0x50
[13654.481060]        [a02d30a1] btrfs_cleanup_transaction+0x271/0x570 [btrfs]
[13654.481060]        [a02d35ce] transaction_kthread+0x22e/0x270 [btrfs]
[13654.481060]        [81079efa] kthread+0xea/0xf0
[13654.481060]        [81692aac] ret_from_fork+0x7c/0xb0
[13654.481060] other info that might help us debug this:
[13654.481060]  Possible unsafe locking scenario:
[13654.481060]        CPU0                    CPU1
[13654.481060]        ----                    ----
[13654.481060]   lock(&(&fs_info->ordered_root_lock)->rlock);
[13654.481060]                                lock(&(&root->ordered_extent_lock)->rlock);
[13654.481060]                                lock(&(&fs_info->ordered_root_lock)->rlock);
[13654.481060]   lock(&(&root->ordered_extent_lock)->rlock);
[13654.481060] *** DEADLOCK ***
[...]

btrfs_destroy_all_ordered_extents() gets fs_info->ordered_root_lock __BEFORE__ acquiring root->ordered_extent_lock, while btrfs_[add,remove]_ordered_extent() acquires fs_info->ordered_root_lock __AFTER__ getting root->ordered_extent_lock.

This patch fixes the above problem.
Signed-off-by: Liu Bo bo.li@oracle.com
---
 fs/btrfs/disk-io.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3903bd3..8cd48c3 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3794,9 +3794,11 @@ static void btrfs_destroy_all_ordered_extents(struct btrfs_fs_info *fs_info)
 		list_move_tail(&root->ordered_root,
 			       &fs_info->ordered_roots);
 
+		spin_unlock(&fs_info->ordered_root_lock);
 		btrfs_destroy_ordered_extents(root);
 
-		cond_resched_lock(&fs_info->ordered_root_lock);
+
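For readers following along, the ordering rule the patch enforces can be sketched in user space. This is a minimal sketch only, with pthread mutexes standing in for the kernel spinlocks; the names `list_lock`, `item_lock`, and `destroy_all` are illustrative, not the kernel's. The rule: never hold the list-level lock while taking a per-item lock, because the add/remove paths take them in the opposite order.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

struct item {
	pthread_mutex_t item_lock;	/* plays the role of root->ordered_extent_lock */
	int destroyed;
	struct item *next;
};

/* plays the role of fs_info->ordered_root_lock */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct item *head;

static void destroy_item(struct item *it)
{
	/* inner lock, taken with list_lock NOT held */
	pthread_mutex_lock(&it->item_lock);
	it->destroyed = 1;
	pthread_mutex_unlock(&it->item_lock);
}

/* Walk the list, dropping the list lock around each per-item operation,
 * mirroring the spin_unlock() the patch adds before
 * btrfs_destroy_ordered_extents(). Returns the number of items destroyed. */
static int destroy_all(void)
{
	int n = 0;

	pthread_mutex_lock(&list_lock);
	while (head) {
		struct item *it = head;

		head = it->next;
		pthread_mutex_unlock(&list_lock);	/* release BEFORE the inner lock */
		destroy_item(it);
		n++;
		pthread_mutex_lock(&list_lock);		/* retake to continue the walk */
	}
	pthread_mutex_unlock(&list_lock);
	return n;
}
```

With this shape, both this path and the add/remove paths only ever hold one of the two locks when acquiring the other in a fixed order, so the circular dependency lockdep reported cannot form.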
Re: [btrfs] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
Hi Filipe,

If you disable CONFIG_BTRFS_FS_RUN_SANITY_TESTS, does it still crash? I tried disabling CONFIG_BTRFS_FS_RUN_SANITY_TESTS in the 3 reported randconfigs and they all boot fine.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Provide a better free space estimate on RAID1
On Mon, 10 Feb 2014 00:02:38 +0000 (UTC) Duncan 1i5t5.dun...@cox.net wrote:

> Meanwhile, you said it yourself, users aren't normally concerned about this.

I think you're being mistaken here; the point that users aren't looking at the free space, hence it is not important to provide a correct estimate, was made by someone else, not me. Personally I found that to be just a bit too surrealistic to try and seriously answer, much like the rest of your message.

-- 
With respect,
Roman
Re: [btrfs] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
On Sat, Feb 08, 2014 at 03:10:37PM -0500, Tejun Heo wrote:

Hello, David, Fengguang, Chris.

On Fri, Feb 07, 2014 at 01:13:06PM -0800, David Rientjes wrote:
On Fri, 7 Feb 2014, Fengguang Wu wrote:
On Fri, Feb 07, 2014 at 02:13:59AM -0800, David Rientjes wrote:
On Fri, 7 Feb 2014, Fengguang Wu wrote:

[    1.625020] BTRFS: selftest: Running btrfs_split_item tests
[    1.627004] BTRFS: selftest: Running find delalloc tests
[    2.289182] tsc: Refined TSC clocksource calibration: 2299.967 MHz
[  292.084537] kthreadd invoked oom-killer: gfp_mask=0x3000d0, order=1, oom_score_adj=0
[  292.086439] kthreadd cpuset=
[  292.087072] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
[  292.087372] IP: [812119de] pr_cont_kernfs_name+0x1b/0x6c

This looks like a problem with the cpuset cgroup name, are you sure this isn't related to the removal of cgroup->name?

It looks not related to the patch "cgroup: remove cgroup->name", because that patch lies in the cgroup tree and is not contained in the output of git log BAD_COMMIT.

Sorry, I was wrong here. I find that the above dmesg is for commit 4830363, which is a merge HEAD that contains the cgroup code.
The dmesg for commit 878a876b2e1 (Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs) looks different; it hangs after the tsc line:

[    2.428110] Btrfs loaded, assert=on, integrity-checker=on
[    2.429469] BTRFS: selftest: Running btrfs free space cache tests
[    2.430874] BTRFS: selftest: Running extent only tests
[    2.432135] BTRFS: selftest: Running bitmap only tests
[    2.433359] BTRFS: selftest: Running bitmap and extent tests
[    2.434675] BTRFS: selftest: Free space cache tests finished
[    2.435959] BTRFS: selftest: Running extent buffer operation tests
[    2.437350] BTRFS: selftest: Running btrfs_split_item tests
[    2.438843] BTRFS: selftest: Running find delalloc tests
[    3.158351] tsc: Refined TSC clocksource calibration: 2666.596 MHz

It's dying on pr_cont_kernfs_name, which is from some tree that has "kernfs: implement kernfs_get_parent(), kernfs_name/path() and friends", which is not in linux-next, and is obviously printing the cpuset cgroup name. It doesn't look like it has anything at all to do with btrfs or why they would care about this failure.

Yeah, this is from a patch in the cgroup/review-post-kernfs-conversion branch which updates cgroup to use pr_cont_kernfs_name(). I forgot that cgrp->kn is NULL for the dummy_root's top cgroup, and thus it ends up calling the kernfs functions with a NULL kn and thus the oops. I posted an updated patch and the git branch has been updated.

http://lkml.kernel.org/g/20140208200640.gb10...@htj.dyndns.org

So, nothing to do with btrfs, and it looks like somehow the test apparatus is mixing up branches?

Yes - I may do random merges and boot test the resulting kernels.

Thanks,
Fengguang
[PATCH v2] Btrfs: avoid warning bomb of btrfs_invalidate_inodes
After a transaction is aborted, we need to clean up inode resources by calling btrfs_invalidate_inodes(). btrfs_invalidate_inodes() has historically expected the root's refs to be zero and has a WARN_ON() for that, but this does not always hold while cleaning up an aborted transaction, so detect transaction abortion and do not warn in that case.

Signed-off-by: Liu Bo bo.li@oracle.com
---
v2: Follow Josef's advice, ie. in case of aborting transaction, we no longer warn.

 fs/btrfs/inode.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1af34d0..e876c1e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4920,7 +4920,8 @@ void btrfs_invalidate_inodes(struct btrfs_root *root)
 	struct inode *inode;
 	u64 objectid = 0;
 
-	WARN_ON(btrfs_root_refs(&root->root_item) != 0);
+	if (!test_bit(BTRFS_FS_STATE_ERROR, &root->fs_info->fs_state))
+		WARN_ON(btrfs_root_refs(&root->root_item) != 0);
 
 	spin_lock(&root->inode_lock);
 again:
-- 
1.8.1.4
Re: Issue with btrfs balance
I've experienced the following with balance:

Setup:
- Kernel 3.12.9
- 11 DVD sized (4.3GB) loopback devices (9 Read-Only Seed devices + 2 Read/Write devices)
- 9 device seed created with -m single -d single and made Read-only with btrfstune -S 1 ...
- 2 devices were added at different dates. NO balance performed until now.
- NOW add 1 more device to the array and perform a balance.

Result:
Balance ran for a while and exited displaying "Process Killed". Any attempt to unmount the array failed, preventing any shutdown, hence I had no option other than a hard reset. After reboot, issuing the balance command gave the message "Balance in progress". I cancelled the balance and tried to remove the last device, which ended with a kernel crash. So I dumped the 2 + 1 normal devices. The former 9 multi-device seed was OK and was mountable.

Regards,
Imran
[PATCH] xfstests: btrfs/004: fix to make test really work
From: Wang Shilong wangsl.f...@cn.fujitsu.com

So I was wondering why test 004 could pass my previous wrong kernel patch while it definitely should not. After some debugging, I found that the perl script here is wrong: we did not filter out anything, so this unit test did not actually work, and it turned out we would never fail this test.

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
 tests/btrfs/004 | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)
 mode change 100755 => 100644 tests/btrfs/004

diff --git a/tests/btrfs/004 b/tests/btrfs/004
old mode 100755
new mode 100644
index 14da9f1..17a6e34
--- a/tests/btrfs/004
+++ b/tests/btrfs/004
@@ -57,10 +57,9 @@
 _require_command /usr/sbin/filefrag
 
 rm -f $seqres.full
 
-FILEFRAG_FILTER='if (/, blocksize (\d+)/) {$blocksize = $1; next} ($ext, '\
-'$logical, $physical, $expected, $length, $flags) = (/^\s*(\d+)\s+(\d+)'\
-'\s+(\d+)\s+(?:(\d+)\s+)?(\d+)\s+(.*)/) or next; $flags =~ '\
-'/(?:^|,)inline(?:,|$)/ and next; print $physical * $blocksize, "#", '\
+FILEFRAG_FILTER='if (/blocks of (\d+) bytes/) {$blocksize = $1; next} ($ext, '\
+'$logical, $physical, $length) = (/^\s*(\d+):\s+(\d+)..\s+\d+:'\
+'\s+(\d+)..\s+\d+:\s+(\d+):/) or next; print $physical * $blocksize, "#", '\
 '$length * $blocksize, "#", $logical * $blocksize, " "'
 # this makes filefrag output script readable by using a perl helper.
-- 
1.8.4
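The corrected filter boils down to two patterns: grab the block size from the header line of filefrag -v, then decode each extent row. Here is a rough C illustration of the same parsing; the function names are mine, and the sample format assumes e2fsprogs' filefrag -v layout ("... (N blocks of B bytes)" followed by "ext: logical..: physical..: length:" rows), so treat it as a sketch rather than a drop-in replacement for the perl helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 and fills *blocksize when the line is the filefrag -v header
 * ("File size of foo is 32768 (8 blocks of 4096 bytes)"), else 0. */
static int parse_blocksize(const char *line, unsigned long *blocksize)
{
	const char *p = strstr(line, "blocks of ");

	if (!p)
		return 0;
	return sscanf(p, "blocks of %lu bytes", blocksize) == 1;
}

/* Decodes one extent row such as
 *   "   0:        0..       7:      34816..     34823:      8:   eof"
 * into logical/physical start blocks and the extent length in blocks.
 * Returns 1 on success, 0 if the line is not an extent row. */
static int parse_extent(const char *line, unsigned long long *logical,
			unsigned long long *physical, unsigned long long *length)
{
	unsigned ext;
	unsigned long long lend, pend;

	return sscanf(line, " %u: %llu.. %llu: %llu.. %llu: %llu:",
		      &ext, logical, &lend, physical, &pend, length) == 6;
}
```

Multiplying the parsed block values by the block size then yields the same byte offsets the fixed FILEFRAG_FILTER prints.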
[PATCH] Btrfs-progs: receive: don't output normal message into stderr
From: Wang Shilong wangsl.f...@cn.fujitsu.com

Don't output normal messages into stderr; this makes it easier for xfstests to filter the output.

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
 cmds-receive.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-receive.c b/cmds-receive.c
index cce37a7..2d55c53 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -213,7 +213,7 @@ static int process_snapshot(const char *path, const u8 *uuid, u64 ctransid,
 	free(r->full_subvol_path);
 	r->full_subvol_path = path_cat3(r->root_path, r->dest_dir_path, path);
 
-	fprintf(stderr, "At snapshot %s\n", path);
+	fprintf(stdout, "At snapshot %s\n", path);
 
 	memcpy(r->cur_subvol->received_uuid, uuid, BTRFS_UUID_SIZE);
 	r->cur_subvol->stransid = ctransid;
-- 
1.8.4
Re: btrfs send runs out of memory and file handles
On Mon, Feb 10, 2014 at 01:28:38PM +0000, Frank Kingswood wrote:
> Hi,
> 
> I'm attempting to back up a btrfs subvolume with
> 
> $ btrfs send /path/to/subvol | nc
> 
> and the receiving end does
> 
> $ nc -l | btrfs receive /path/to/volume
> 
> This subvolume holds ~250 GB of data, about half full, and uses RAID1.
> 
> Doing so runs out of file descriptors on the sending machine (having over 100k files open) and eventually runs out of memory and gets killed by the OOM killer.

This sounds like a known bug, and I think it was fixed in 3.13. What kernel version are you using?

Hugo.

> What are the memory requirements of btrfs send? This is the initial send so the entire volume must be transferred. Are later send operations equally memory-intensive?
> 
> Is it possible to do an incremental send, or send partial snapshots and combine them later on?

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- Everything simple is false. Everything which is complex is unusable. ---
Re: btrfs send runs out of memory and file handles
On 10/02/14 13:47, Hugo Mills wrote:
> On Mon, Feb 10, 2014 at 01:28:38PM +0000, Frank Kingswood wrote:
>> I'm attempting to back up a btrfs subvolume [...]
>> Doing so runs out of file descriptors on the sending machine (having over 100k files open) and eventually runs out of memory and gets killed by the OOM killer.
> 
> This sounds like a known bug, and I think it was fixed in 3.13. What kernel version are you using?

This is on 3.12.5. I can build and install 3.13.2 and test it again.

Frank
scrub crashed?
root@fortknox:~# uname -a
Linux fortknox 3.12-0.bpo.1-amd64 #1 SMP Debian 3.12.9-1~bpo70+1 (2014-02-07) x86_64 GNU/Linux
root@fortknox:~# btrfs version
Btrfs v3.12
root@fortknox:~# btrfs scrub status -d /bunker
scrub status for 11312131-3372-4637-b526-35a4ef0c31eb
scrub device /dev/mapper/bunkerA (id 1) status
	scrub started at Sun Feb  9 19:12:22 2014, running for 11317 seconds
	total bytes scrubbed: 325.32GiB with 0 errors
scrub device /dev/dm-1 (id 2) status
	scrub started at Sun Feb  9 19:12:22 2014, running for 11317 seconds
	total bytes scrubbed: 321.76GiB with 0 errors
root@fortknox:~# btrfs scrub cancel /bunker
ERROR: scrub cancel failed on /bunker: not running
root@fortknox:~# btrfs scrub start /bunker
ERROR: scrub is already running.
To cancel use 'btrfs scrub cancel /bunker'.
To see the status use 'btrfs scrub status [-d] /bunker'.
root@fortknox:~# ps -A | grep btrfs
 3704 ?        00:00:00 btrfs-genwork-1
 3705 ?        00:00:00 btrfs-submit-1
 3706 ?        00:00:00 btrfs-delalloc-
 3707 ?        00:00:00 btrfs-fixup-1
 3708 ?        00:00:02 btrfs-endio-1
 3709 ?        00:00:00 btrfs-endio-met
 3710 ?        00:00:00 btrfs-rmw-1
 3711 ?        00:00:00 btrfs-endio-rai
 3712 ?        00:00:00 btrfs-endio-met
 3714 ?        00:00:05 btrfs-freespace
 3715 ?        00:00:00 btrfs-delayed-m
 3716 ?        00:00:00 btrfs-cache-1
 3717 ?        00:00:00 btrfs-readahead
 3718 ?        00:00:00 btrfs-flush_del
 3719 ?        00:00:00 btrfs-qgroup-re
 3720 ?        00:00:00 btrfs-cleaner
 3721 ?        00:02:00 btrfs-transacti
 8380 ?        00:00:00 btrfs-endio-wri
 8836 ?        00:00:00 btrfs-worker-4
 8936 ?        00:00:00 btrfs-worker-3
 8961 ?        00:00:00 btrfs-worker-3
 9138 ?        00:00:00 btrfs-worker-4

What can/should I do now?
Re: lost with degraded RAID1
Thanks, that explains something. There was indeed a BIOS problem (the drive that vanished was suddenly disabled in the BIOS, and was only usable again after reactivating it in the BIOS). So it should have been a BIOS problem.

2014-02-09 Duncan 1i5t5.dun...@cox.net:
> Johan Kröckel posted on Sat, 08 Feb 2014 12:09:46 +0100 as excerpted:
> 
>> Ok, I did nuke it now and created the fs again using the 3.12 kernel. So far so good. Runs fine.
>> 
>> Finally, I know it's kind of offtopic, but can someone help me interpret this (I think this is the error in the smart-log which started the whole mess)?
>> 
>> Error 1 occurred at disk power-on lifetime: 2576 hours (107 days + 8 hours)
>>   When the command that caused the error occurred, the device was active or idle.
>>   After command completion occurred, registers were:
>>   ER ST SC SN CL CH DH
>>   -- -- -- -- -- -- --
>>   04 71 00 ff ff ff 0f  Device Fault; Error: ABRT at LBA = 0x0fffffff = 268435455
> 
> I'm no SMART expert, but that LBA number is incredibly suspicious. With standard 512-byte sectors that's the 128 GiB boundary, the old 28-bit LBA limit (LBA28, introduced with ATA-1 in 1994; modern drives are LBA48, introduced in 2003 with ATA-6 and offering an addressing capacity of 128 PiB, according to Wikipedia's article on LBA).
> 
> It looks like something flipped back to LBA28, and when a continuing operation happened to write past that value... it triggered the abort you see in the SMART log.
> 
> Double-check your BIOS to be sure it didn't somehow revert to the old LBA28 compatibility mode or some such, and the drives, to make sure they aren't clipped to LBA28 compatibility mode as well.
> 
> -- 
> Duncan - List replies preferred. No HTML msgs.
> "Every nonfree program has a lord, a master -- and if you use the program, he is your master."
Richard Stallman
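Duncan's arithmetic above is easy to check: 0x0fffffff is the highest sector a 28-bit LBA can name, and with 512-byte sectors the addressable range ends at exactly 2^37 bytes = 128 GiB. A small sketch (the helper names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512ULL

/* Highest sector addressable with a 28-bit LBA: 0x0fffffff = 268435455. */
static uint64_t lba28_max_sector(void)
{
	return (1ULL << 28) - 1;
}

/* Total capacity addressable via LBA28 with 512-byte sectors:
 * 2^28 sectors * 2^9 bytes/sector = 2^37 bytes = 128 GiB. */
static uint64_t lba28_capacity_bytes(void)
{
	return (1ULL << 28) * SECTOR_SIZE;
}
```

So a write that runs just past sector 268435455 on a drive (or controller) stuck in LBA28 mode aborts, matching the SMART log entry.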
Re: Issue with btrfs balance
On 2014-02-10 08:41, Brendan Hide wrote:
> On 2014/02/10 04:33 AM, Austin S Hemmelgarn wrote:
>> <snip>
>> Apparently, trying to use -mconvert=dup or -sconvert=dup on a multi-device filesystem using one of the RAID profiles for metadata fails with a statement to look at the kernel log, which doesn't show anything at all about the failure.
> 
> ^ If this is the case then it is definitely a bug. Can you provide some version info? Specifically kernel, btrfs-tools, and Distro.

In this case, btrfs-progs 3.12, kernel 3.13.2, and Gentoo.

>> <snip>
>> it appears that the kernel stops you from converting to a dup profile for metadata in this case because it thinks that such a profile doesn't work on multiple devices, despite the fact that you can take a single device filesystem, add a device, and it will still work fine even without converting the metadata/system profiles.
> 
> I believe dup used to work on multiple devices but the facility was removed. In the standard case it doesn't make sense to use dup with multiple devices: it uses the same amount of diskspace but is more vulnerable than the RAID1 alternative.
> 
>> <snip>
>> Ideally, this should be changed to allow converting to dup so that when converting a multi-device filesystem to single-device, you never have to have metadata or system chunks use a single profile.
> 
> This is a good use-case for having the facility. I'm thinking that, if it is brought back in, the only caveat is that appropriate warnings should be put in place to indicate that it is inappropriate. My guess on how you'd like to migrate from raid1/raid1 to single/dup, assuming sda and sdb:
> 
> btrfs balance start -dconvert=single -mconvert=dup /
> btrfs device delete /dev/sdb /

Ideally, yes. The exact command I tried to use was:

btrfs balance start -dconvert=single -mconvert=dup -sconvert=dup -f -v /

Trying again without the system chunk conversion also failed.
Re: scrub crashed?
Hello Johan,

This should be a known problem. The problem seemed to be that the scrub log file is corrupt, so I added an option -f, something like:

btrfs scrub start -f ...

You can update to the latest btrfs-progs from David's latest integration branch and try it. If you don't want to do that, just "rm /var/lib/btrfs/scrub*" should fix your problem.

Hopefully it can help you. ^_^

Thanks,
Wang

2014-02-10 22:02 GMT+08:00 Johan Kröckel johan.kroec...@gmail.com:
> root@fortknox:~# uname -a
> Linux fortknox 3.12-0.bpo.1-amd64 #1 SMP Debian 3.12.9-1~bpo70+1 (2014-02-07) x86_64 GNU/Linux
> root@fortknox:~# btrfs version
> Btrfs v3.12
> root@fortknox:~# btrfs scrub cancel /bunker
> ERROR: scrub cancel failed on /bunker: not running
> root@fortknox:~# btrfs scrub start /bunker
> ERROR: scrub is already running.
> [...]
> What can/should I do now?
Re: scrub crashed?
Thank you Shilong, that was the problem.

2014-02-10 Shilong Wang wangshilong1...@gmail.com:
> Hello Johan,
> 
> This should be a known problem. The problem seemed to be that the scrub log file is corrupt, so I added an option -f, something like:
> 
> btrfs scrub start -f ...
> 
> You can update to the latest btrfs-progs from David's latest integration branch and try it. If you don't want to do that, just "rm /var/lib/btrfs/scrub*" should fix your problem.
> 
> Hopefully it can help you. ^_^
> 
> Thanks,
> Wang
> [...]
Re: system stuck with flush-btrfs-4 at 100% after filesystem resize
As a follow-up, at some point over the weekend things did finish on their own:

romulus:/vms/johnn-sles11sp3 # df -h /vms
Filesystem      Size  Used Avail Use% Mounted on
/dev/dm-4       2.6T  1.6T  1.1T  60% /vms
romulus:/vms/johnn-sles11sp3 #

I'd still be interested in any comments about what was going on, or suggestions.

Thanks,
-john

On 2/8/2014 10:36 AM, John Navitsky wrote:
> Hello,
> 
> I have a large file system that has been growing. We've resized it a couple of times with the following approach:
> 
>   lvextend -L +800G /dev/raid/virtual_machines
>   btrfs filesystem resize +800G /vms
> 
> I think the FS started out at 200G, we increased it by 200GB a time or two, then by 800GB, and everything worked fine. The filesystem hosts a number of virtual machines so the file system is in use, although the VMs individually tend not to be overly active. VMs tend to be in subvolumes, and some of those subvolumes have snapshots.
> 
> This time, I increased it by another 800GB, and it has hung for many hours (over night) with flush-btrfs-4 near 100% cpu all that time. I'm not clear at this point that it will finish or where to go from here.
> 
> Any pointers would be much appreciated.
> 
> Thanks,
> -john (newbie to BTRFS)
Re: [RFC PATCH 2/2] Revert Btrfs: remove transaction from btrfs send
On 02/08/2014 10:46 AM, Wang Shilong wrote:
> From: Wang Shilong wangsl.f...@cn.fujitsu.com
> 
> This reverts commit 41ce9970a8a6a362ae8df145f7a03d789e9ef9d2. Previously I was thinking we could use a readonly root's commit root safely, but that is not true: a readonly send root may still be cowed in the following cases.
> 
> 1. snapshot: send root will cow the source root.
> 2. balance and device operations will also cow a readonly send root to relocate.
> 
> So I have two ideas to make it safe for us to use the commit root.
> 
> --approach 1:
> make it protected by transaction and end the transaction properly, and we re-search the next item from the root node (see btrfs_search_slot_for_read()).
> 
> --approach 2:
> add another counter to the local root structure to sync snapshot with send, and add a global counter to sync send with exclusive device operations.
> 
> So with approach 2, send can use the commit root safely, because we make sure the send root can not be cowed during send. Unfortunately, it makes the code *ugly* and more complex to maintain. Also, making snapshot and send exclusive with each other, and device operations and send exclusive with each other, is a little confusing for common users. So why not go back to the previous way?
> 
> Cc: Josef Bacik jba...@fb.com
> Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
> ---
> Josef, if we reach agreement to adopt this approach, please revert Filipe's patch (Btrfs: make some tree searches in send.c more efficient) from btrfs-next.

I agree. I'll leave Filipe's patch alone but I'll drop my search commit root patch since we don't need it any more. Do you want me to take this or do you want to resubmit without the rfc?

Thanks,

Josef
What to do about df and btrfs fi df
Hello,

So first of all this is going to get a lot of responses, so straight away I'm only going to consider your opinion if I recognize your name and think you are a sane person. This basically means any big contributors, and we'll make sanity exceptions for cwillu.

These are just broad strokes; let us not get bogged down in the details. I just want to come to a consensus on how things _generally_ should be portrayed to the user. We can worry about implementation details once we agree on the direction we want to go. We all know space is a loaded question with btrfs, so I'm just going to explain the reasoning of why we chose what we chose originally and then offer the direction we should go in. If you agree say yay; if not, please provide a very concise alternative suggestion with a very short explanation of why it is better than what I'm suggesting. I'm not looking to spend a whole lot of time on this problem. Also this isn't going to address b_avail, cause frankly that is some fucking voodoo right there; suffice it to say we will just adjust b_avail based on how we should represent total and used.

= ye olde df =

I don't remember what we did originally, but IIRC we would only show used space from the block groups and would show the entire size of the fs. So for example with two 1 tb drives in RAID1 you'd see ENOSPC, look at df, and it would show total of 2TB and used at 1TB. Obviously not good, so we switched to the mechanism we have today, which is: you see 2TB for total, you see 2TB for used, and you see 0 for available. We just scaled up the used and available based on your raid multiplier.

= btrfs fi df =

I made this for me because of ENOSPC issues, but of course it's also really useful for users. It is just a dump of the block group information and their flags, so really it just spits out bytes_used, total_bytes, and flags.
Because at the block_group/space_info level in btrfs we don't care about how much actual space is taken up, this number is not adjusted for RAID values, and these numbers are reflected in the tool's output. So if you have RAID1 you need to mentally multiply the Total and Used values by 2, because that is how much actual space is being used.

= What to do moving forward =

Flip what both of these do. Do not multiply for normal df, and multiply for btrfs fi df.

= New and improved df =

Since this is the lowest common denominator, we should just spit out how much space is used based on the block groups and then divide the remaining space that hasn't been allocated yet by the raid multiplier. This is going to be kind of tricky once we do per-subvolume RAID levels, but this falls under the b_avail voodoo which is just a guess anyway, so for this we will probably take the biggest multiplier and use that to show how much available space you have. This way with RAID1 it shows you have 1tb of total space and you've used 1tb of space.

= New and improved btrfs fi df =

Since people using this tool are already going to be better informed, and since we are already given the block group flags, we can go ahead and do the raid multiplier in btrfs-progs and spit out the adjusted numbers rather than the raw numbers we get from the ioctl. This will just be a progs thing, and that way we can possibly add an option to not apply the multipliers and just get the raw output.

= Conclusion =

Let me know if this is acceptable to everybody. Remember this is just broad strokes; keep your responses short and simple or I simply won't read them.

Thanks,

Josef
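To make the "new and improved df" arithmetic concrete, one way it could look is sketched below. This is a sketch only: the struct and names are illustrative, not btrfs code, and it ignores both the per-subvolume RAID complication and free space inside already-allocated block groups.

```c
#include <assert.h>
#include <stdint.h>

struct df_estimate {
	uint64_t total;	/* logical bytes the user can store */
	uint64_t used;	/* logical bytes already stored */
	uint64_t avail;	/* estimate of logical bytes still writable */
};

/* raw_total/raw_allocated are physical bytes across all devices;
 * used_in_block_groups is the logical bytes_used sum from the block
 * groups (NOT raid-scaled); raid_multiplier is 2 for RAID1/DUP. */
static struct df_estimate estimate_df(uint64_t raw_total,
				      uint64_t raw_allocated,
				      uint64_t used_in_block_groups,
				      unsigned raid_multiplier)
{
	struct df_estimate e;
	uint64_t unallocated = raw_total - raw_allocated;

	e.used = used_in_block_groups;		/* reported as-is */
	e.avail = unallocated / raid_multiplier;/* divide, don't multiply */
	e.total = e.used + e.avail;
	return e;
}
```

With two 1 TiB drives in RAID1 that are fully allocated and full, this reports total = 1 TiB, used = 1 TiB, avail = 0, which matches the behavior described for the proposed df.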
Re: system stuck with flush-btrfs-4 at 100% after filesystem resize
On 02/08/2014 01:36 PM, John Navitsky wrote:

Hello,

I have a large filesystem that has been growing. We've resized it a couple of times with the following approach:

lvextend -L +800G /dev/raid/virtual_machines
btrfs filesystem resize +800G /vms

I think the FS started out at 200G; we increased it by 200GB a time or two, then by 800GB, and everything worked fine. The filesystem hosts a number of virtual machines, so the filesystem is in use, although the VMs individually tend not to be overly active. VMs tend to be in subvolumes, and some of those subvolumes have snapshots. This time, I increased it by another 800GB, and it has hung for many hours (overnight) with flush-btrfs-4 near 100% CPU all that time. I'm not clear at this point whether it will finish or where to go from here. Any pointers would be much appreciated.

Thanks,
-john (newbie to BTRFS)

procedure log --

romulus:/home/users/johnn # lvextend -L +800G /dev/raid/virtual_machines
romulus:/home/users/johnn # btrfs filesystem resize +800G /vms
Resize '/vms' of '+800G'
[hangs]

top - 12:21:53 up 136 days, 2:45, 13 users, load average: 30.39, 30.37, 30.37
Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.4 us, 2.3 sy, 0.0 ni, 95.1 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem: 129147 total, 127427 used, 1720 free, 264 buffers
MiB Swap: 262143 total, 661 used, 261482 free, 93666 cached

  PID USER  PR NI VIRT RES SHR S %CPU %MEM   TIME+ COMMAND
48809 root  20  0     0   0   0 R 99.3  0.0 1449:14 flush-btrfs-4

--- misc info ---

romulus:/home/users/johnn # cat /etc/SuSE-release
openSUSE 12.3 (x86_64)
VERSION = 12.3
CODENAME = Dartmouth
romulus:/home/users/johnn # uname -a
Linux romulus.us.redacted.com 3.7.10-1.16-desktop #1 SMP PREEMPT Fri May 31 20:21:23 UTC 2013 (97c14ba) x86_64 x86_64 x86_64 GNU/Linux
romulus:/home/users/johnn #

Found your problem! Basically, if you are going to run btrfs you should at the very least keep up with the stable kernels. 3.11.whatever is fine, 3.12.whatever is better.
Thanks,

Josef
Re: system stuck with flush-btrfs-4 at 100% after filesystem resize
On 2/10/2014 8:43 AM, Josef Bacik wrote:

On 02/08/2014 01:36 PM, John Navitsky wrote:

romulus:/home/users/johnn # cat /etc/SuSE-release
openSUSE 12.3 (x86_64)
VERSION = 12.3
CODENAME = Dartmouth
romulus:/home/users/johnn # uname -a
Linux romulus.us.redacted.com 3.7.10-1.16-desktop #1 SMP PREEMPT Fri May 31 20:21:23 UTC 2013 (97c14ba) x86_64 x86_64 x86_64 GNU/Linux
romulus:/home/users/johnn #

Found your problem! Basically, if you are going to run btrfs you should at the very least keep up with the stable kernels. 3.11.whatever is fine, 3.12.whatever is better.

Thanks,

Josef

Thanks for the feedback.

-john
Re: [PATCH 3/4][RFC] btrfs: export global block reserve size as space_info
On 02/07/2014 08:34 AM, David Sterba wrote:

Introduce a block group type bit for a global reserve and fill the space info for the SPACE_INFO ioctl. This should replace the newly added ioctl (01e219e8069516cdb98594d417b8bb8d906ed30d) to get just the 'size' part of the global reserve, while the actual usage can now be visible in the 'btrfs fi df' output during ENOSPC stress. The unpatched userspace tools will show the blockgroup as 'unknown'.

This wasn't in my rc2 pull because I wanted to sync up with Jeff on it. I like the idea of combining this into SPACE_INFO, any objections?

-chris
Re: What to do about df and btrfs fi df
tl;dr: Yes to proposed df changes. Keep btrfs fi df as-is.

On Mon, Feb 10, 2014 at 11:41:51AM -0500, Josef Bacik wrote: [snip]

= What to do moving forward =
Flip what both of these do. Do not multiply for normal df, and multiply for btrfs fi df.

= New and improved df =
Since this is the lowest common denominator we should just spit out how much space is used based on the block groups and then divide the remaining space that hasn't been allocated yet by the raid multiplier. This is going to be kind of tricky once we do per-subvolume RAID levels, but this falls under the b_avail voodoo which is just a guess anyway, so for this we will probably take the biggest multiplier and use that to show how much available space you have.

Biggest multiplier leads to the pessimistic estimate, which is what I'd prefer to see here, so that's good. Agree with this.

This way with RAID1 it shows you have 1TB of total space and you've used 1TB of space.

= New and improved btrfs fi df =
Since people using this tool are already going to be better informed and since we are already given the block group flags we can go ahead and do the raid multiplier in btrfs-progs and spit out the adjusted numbers rather than the raw numbers we get from the ioctl. This will just be a progs thing and that way we can possibly add an option to not apply the multipliers and just get the raw output.

Keep this unchanged, IMO.

(a) I quite like the non-multiplied version as it is, as it gives you the quantities of real, actual data stored -- the value you generally care about anyway (how much stuff do I have on here?).

(b) Using the non-multiplied version here as well as above would then give *gasp* comparable values for btrfs fi df and Plain Old df. Less confusion all round, I think.

(c) The difficulty with using multiplied values is the behaviour of parity RAID on filesystems with different sized devices: there isn't a single multiplier that will give an accurate answer at all.
(Detailed arguments available on application ;) )

Hugo.

-- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Can I offer you anything? Tea? Seedcake? --- Glass of Amontillado?
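Hugo's point (c) is easy to see numerically: for parity RAID the raw-to-logical factor depends on how many devices each chunk stripes across, so with different-sized devices there is no single multiplier. A toy illustration (not btrfs code; `raid5_multiplier` is a made-up helper):

```python
def raid5_multiplier(stripe_width):
    """Raw bytes consumed per logical byte for one RAID5 chunk striped
    across `stripe_width` devices (one device's worth of each stripe is parity)."""
    assert stripe_width >= 3
    return stripe_width / (stripe_width - 1)

# With devices of 1TB, 1TB and 2TB, early chunks can stripe across all
# three devices (factor 1.5), but once the two small devices fill up,
# the leftover space on the big device cannot form a RAID5 stripe at all,
# so no single constant factor describes the whole filesystem.
print(raid5_multiplier(3))  # 1.5
print(raid5_multiplier(4))  # ~1.333
```

The factor shrinks as stripes get wider, and the achievable width changes as devices fill, which is exactly why a multiplied display can't be made accurate for parity profiles.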
Re: What to do about df and btrfs fi df
I concur. The regular df data used number should be the amount of space required to hold a backup of that content (assuming that the backup maintains reflinks and compression and so forth).

There's no good answer for available space; the statfs syscall isn't rich enough to cover all the bases even in the face of dup metadata and single data (i.e., the common case), and a truly conservative estimate (report based on the highest-usage raid level in use) would report space/2 in that same common case. Highest-usage data raid level in use is probably the best compromise, with a big warning that large numbers of small files will not actually fit, posted in some mythical place that users look.

I would like to see the information from btrfs fi df and btrfs fi show summarized somewhere (ideally as a new btrfs fi df output), as both sets of numbers are really necessary, or at least have btrfs fi df include the amount of space not allocated to a block group.

Re regular df: are we adding space allocated to a block group (raid1, say) but not in actual use in a file as the N/2 space available in the block group, or the N space it takes up on disk? This probably matters a bit less than it used to, but if it's N/2, that leaves us open to "empty filesystem, 100GB free, write an 80GB file and then delete it, wtf, only 60GB free now?" reporting issues.
Re: What to do about df and btrfs fi df
On 02/10/2014 01:24 PM, cwillu wrote:

I concur. The regular df data used number should be the amount of space required to hold a backup of that content (assuming that the backup maintains reflinks and compression and so forth). There's no good answer for available space; the statfs syscall isn't rich enough to cover all the bases even in the face of dup metadata and single data (i.e., the common case), and a truly conservative estimate (report based on the highest-usage raid level in use) would report space/2 in that same common case. Highest-usage data raid level in use is probably the best compromise, with a big warning that large numbers of small files will not actually fit, posted in some mythical place that users look.

I would like to see the information from btrfs fi df and btrfs fi show summarized somewhere (ideally as a new btrfs fi df output), as both sets of numbers are really necessary, or at least have btrfs fi df include the amount of space not allocated to a block group.

Re regular df: are we adding space allocated to a block group (raid1, say) but not in actual use in a file as the N/2 space available in the block group, or the N space it takes up on disk? This probably matters a bit less than it used to, but if it's N/2, that leaves us open to "empty filesystem, 100GB free, write an 80GB file and then delete it, wtf, only 60GB free now?" reporting issues.

The only case where we add the actual allocated chunk space is for metadata; for data we only add the actual used number. So say you write an 80GB file and then delete it, but during the writing we allocated a 1GiB chunk for metadata: you'll see only 99GB free. Make sense? We could (should?) roll this into the b_avail magic and make used really only reflect data usage; opinions on this?

Thanks,

Josef
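The accounting Josef describes can be sketched like this; the helper is hypothetical and only shows the arithmetic behind the 99GB answer:

```python
GiB = 1024**3

def df_free(total, data_used, meta_chunks_allocated):
    """'free' when data counts actual usage but metadata counts whole
    allocated chunks (the behaviour described above)."""
    return total - data_used - meta_chunks_allocated

# Empty filesystem: 100 GiB free.
before = df_free(100 * GiB, 0, 0)
# Writing the 80 GiB file forces allocation of a 1 GiB metadata chunk.
during = df_free(100 * GiB, 80 * GiB, 1 * GiB)
# Deleting the file frees the data, but the metadata chunk stays allocated,
# so free drops from 100 GiB to 99 GiB rather than returning to 100.
after = df_free(100 * GiB, 0, 1 * GiB)
```

This is the mild version of the "wtf, only 60GB free now?" scenario cwillu raised: only the metadata chunk, not the data chunk, stays counted after the delete.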
Re: What to do about df and btrfs fi df
IMO, used should definitely include metadata, especially given that we inline small files. I can convince myself both that this implies that we should roll it into b_avail, and that we should go the other way and only report the actual used number for metadata as well, so I might just plead insanity here.

On Mon, Feb 10, 2014 at 12:28 PM, Josef Bacik jba...@fb.com wrote:

On 02/10/2014 01:24 PM, cwillu wrote:

I concur. The regular df data used number should be the amount of space required to hold a backup of that content (assuming that the backup maintains reflinks and compression and so forth). There's no good answer for available space; the statfs syscall isn't rich enough to cover all the bases even in the face of dup metadata and single data (i.e., the common case), and a truly conservative estimate (report based on the highest-usage raid level in use) would report space/2 in that same common case. Highest-usage data raid level in use is probably the best compromise, with a big warning that large numbers of small files will not actually fit, posted in some mythical place that users look.

I would like to see the information from btrfs fi df and btrfs fi show summarized somewhere (ideally as a new btrfs fi df output), as both sets of numbers are really necessary, or at least have btrfs fi df include the amount of space not allocated to a block group.

Re regular df: are we adding space allocated to a block group (raid1, say) but not in actual use in a file as the N/2 space available in the block group, or the N space it takes up on disk? This probably matters a bit less than it used to, but if it's N/2, that leaves us open to "empty filesystem, 100GB free, write an 80GB file and then delete it, wtf, only 60GB free now?" reporting issues.

The only case where we add the actual allocated chunk space is for metadata; for data we only add the actual used number.
So say you write an 80GB file and then delete it, but during the writing we allocated a 1GiB chunk for metadata: you'll see only 99GB free. Make sense? We could (should?) roll this into the b_avail magic and make used really only reflect data usage; opinions on this?

Thanks,

Josef
Re: What to do about df and btrfs fi df
On 02/10/2014 01:36 PM, cwillu wrote:

IMO, used should definitely include metadata, especially given that we inline small files. I can convince myself both that this implies that we should roll it into b_avail, and that we should go the other way and only report the actual used number for metadata as well, so I might just plead insanity here.

I could be convinced to do this. So we have

total: (total disk bytes) / (raid multiplier)
used: (total used in data block groups) + (total used in metadata block groups)
avail: total - (total used in data block groups + total metadata block groups)

That seems like the simplest to code up. Then we can argue about whether to use the total metadata size or just the used metadata size for b_avail. Seem reasonable?

Josef
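As a quick sanity check of those three formulas, here is a sketch with made-up numbers (two 1TiB devices in RAID1, 300GiB of data, a 2GiB metadata chunk of which 1GiB is used); the function name is invented for the example:

```python
TiB = 1024**4
GiB = 1024**3

def statfs_numbers(disk_bytes, raid_mult, data_used, meta_used, meta_total):
    """The total/used/avail formulas from the thread, verbatim."""
    total = disk_bytes // raid_mult
    used = data_used + meta_used                 # used counts metadata too
    avail = total - (data_used + meta_total)     # but avail reserves whole
    return total, used, avail                    # metadata chunks

total, used, avail = statfs_numbers(2 * TiB, 2, 300 * GiB, 1 * GiB, 2 * GiB)
# total is 1 TiB, used is 301 GiB, and avail is 1 TiB minus 302 GiB --
# the 1 GiB gap between used and (total - avail) is the unused part of
# the allocated metadata chunks.
```

Note that used + avail deliberately comes out 1GiB short of total here; that gap is exactly the allocated-but-unused metadata, which is the total-vs-used bikeshed in the following replies.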
Re: What to do about df and btrfs fi df
IMO, used should definitely include metadata, especially given that we inline small files. I can convince myself both that this implies that we should roll it into b_avail, and that we should go the other way and only report the actual used number for metadata as well, so I might just plead insanity here.

I could be convinced to do this. So we have

total: (total disk bytes) / (raid multiplier)
used: (total used in data block groups) + (total used in metadata block groups)
avail: total - (total used in data block groups + total metadata block groups)

That seems like the simplest to code up. Then we can argue about whether to use the total metadata size or just the used metadata size for b_avail. Seem reasonable?

I can't think of any situations where this results in tears.
Re: What to do about df and btrfs fi df
On Mon, Feb 10, 2014 at 01:41:23PM -0500, Josef Bacik wrote:

On 02/10/2014 01:36 PM, cwillu wrote:

IMO, used should definitely include metadata, especially given that we inline small files. I can convince myself both that this implies that we should roll it into b_avail, and that we should go the other way and only report the actual used number for metadata as well, so I might just plead insanity here.

I could be convinced to do this. So we have

total: (total disk bytes) / (raid multiplier)
used: (total used in data block groups) + (total used in metadata block groups)
avail: total - (total used in data block groups + total metadata block groups)

That seems like the simplest to code up. Then we can argue about whether to use the total metadata size or just the used metadata size for b_avail. Seem reasonable?

My vote on that bikeshed: total metadata size. But I'll accept any other answer. :)

Hugo.

-- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Well, you don't get to be a kernel hacker simply by looking --- good in Speedos. -- Rusty Russell
Re: [PATCH] xfstests: btrfs/004: fix to make test really work
On 02/10/2014 07:10 AM, Wang Shilong wrote:

From: Wang Shilong wangsl.f...@cn.fujitsu.com

So I was wondering why test 004 could pass my previous wrong kernel patch when it definitely should not. With some debugging, I found that the perl script here is wrong: we did not filter out anything, so this unit test did not actually work. It turned out we would never fail this test.

So now with this patch I'm failing it. Is there some btrfs patch I need to make it not fail, or is it still not supposed to fail normally and this patch is broken?

Thanks,

Josef
Re: Error: could not do orphan cleanup -22
On Monday 10 February 2014 00:20:54 you wrote:

There was a similar discussion about an error in January 2013, but it related to some kernel panic. I don't know if I encountered the same thing. These errors from the system journal bother me:

Feb 09 22:18:53 melforce kernel: BTRFS error (device sdb3): Error removing orphan entry, stopping orphan cleanup
Feb 09 22:18:53 melforce kernel: BTRFS critical (device sdb3): could not do orphan cleanup -22

I run kernel 3.12.10.

Some update. I tested with kernel 3.13.2 and still have the problem. Also, I don't have the errors in the kernel log anymore, but now I can't delete my snapshots!

melforce mnt # ls btr2
home vap-snap1 vap-snap2 var
melforce mnt # btrfs sub list btr2
ID 257 gen 6294 top level 5 path home
ID 258 gen 6294 top level 5 path var
ID 939 gen 6153 top level 5 path vap-snap1
ID 940 gen 6154 top level 5 path vap-snap2
melforce mnt # btrfs sub delete btr2/var-snap2
ERROR: error accessing 'btr2/var-snap2'

My /mnt/btr2/var is also mounted on /var. There's plenty of space left on the device (only ~590 GB allocated on a 2.6 TB volume):

# btrfs fi df btr2
Data, single: total=591.01GiB, used=590.64GiB
System, DUP: total=8.00MiB, used=96.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=1.50GiB, used=989.41MiB
Metadata, single: total=8.00MiB, used=0.00
# btrfs fi show
Label: melforce_hdd uuid: c3f3a649-d8c3-49e1-9962-9b3ca9f54f1d
Total devices 1 FS bytes used 591.61GiB
devid 1 size 2.61TiB used 594.04GiB path /dev/sdb3
Re: [PATCH v2] xfstests: Btrfs: add test for large metadata blocks
On 02/08/2014 03:30 AM, Koen De Wit wrote:

Tests Btrfs filesystems with all possible metadata block sizes, by setting large extended attributes on files.

Signed-off-by: Koen De Wit koen.de@oracle.com
---
v1-v2:
- Fix indentation: 8 spaces instead of 4
- Move _scratch_unmount to end of loop, add _check_scratch_fs
- Sending failure messages of mkfs.btrfs to output instead of $seqres.full

diff --git a/tests/btrfs/036 b/tests/btrfs/036
new file mode 100644
index 000..b14697d
--- /dev/null
+++ b/tests/btrfs/036
@@ -0,0 +1,137 @@
+#! /bin/bash
+# FS QA Test No. 036
+#
+# Tests large metadata blocks in btrfs, which allows large extended
+# attributes.
+#
+#---
+# Copyright (c) 2014, Oracle and/or its affiliates. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1        # failure is the default!
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_need_to_be_root
+
+rm -f $seqres.full
+
+pagesize=`$here/src/feature -s`
+pagesize_kb=`expr $pagesize / 1024`
+
+# Test all valid leafsizes
+for leafsize in `seq $pagesize_kb $pagesize_kb 64`; do
+        _scratch_mkfs -l ${leafsize}K > /dev/null
+        _scratch_mount
+        # Calculate the size of the extended attribute value, leaving
+        # 512 bytes for other metadata.
+        xattr_size=`expr $leafsize \* 1024 - 512`
+
+        touch $SCRATCH_MNT/emptyfile
+        # smallfile will be inlined, bigfile not.
+        $XFS_IO_PROG -f -c "pwrite 0 100" $SCRATCH_MNT/smallfile > /dev/null
+        $XFS_IO_PROG -f -c "pwrite 0 9000" $SCRATCH_MNT/bigfile > /dev/null
+        ln -s $SCRATCH_MNT/bigfile $SCRATCH_MNT/bigfile_softlink
+
+        files=(emptyfile smallfile bigfile bigfile_softlink)
+        chars=(a b c d)
+        for i in `seq 0 1 3`; do
+                char=${chars[$i]}
+                file=$SCRATCH_MNT/${files[$i]}
+                lnkfile=${file}_hardlink
+                ln $file $lnkfile
+                xattr_value=`head -c $xattr_size /dev/zero | tr '\0' $char`
+
+                set_md5=`echo -n $xattr_value | md5sum`
+                ${ATTR_PROG} -Lq -s attr_$char -V $xattr_value $file
+                get_md5=`${ATTR_PROG} -Lq -g attr_$char $file | md5sum`
+                get_ln_md5=`${ATTR_PROG} -Lq -g attr_$char $lnkfile | md5sum`
+
+                # Using md5sums for comparison instead of the values
+                # themselves because bash command lines cannot be larger
+                # than 64K chars.
+                if [ "$set_md5" != "$get_md5" ]; then
+                        echo -n "Got unexpected xattr value for "
+                        echo -n "attr_$char from file ${file}. "
+                        echo "(leafsize is ${leafsize}K)"
+                fi
+                if [ "$set_md5" != "$get_ln_md5" ]; then
+                        echo -n "Value for attr_$char differs for "
+                        echo -n "$file and ${lnkfile}. "
+                        echo "(leafsize is ${leafsize}K)"
+                fi
+        done
+
+        # Test attributes with a size larger than the leafsize.
+        # Should result in an error.
+        if [ $leafsize -lt 64 ]; then
+                # Bash command lines cannot be larger than 64K
+                # characters, so we do not test attribute values
+                # with a size >= 64KB.
+                xattr_size=`expr $leafsize \* 1024 + 512`
+                xattr_value=`head -c $xattr_size /dev/zero | tr '\0' x`
+                ${ATTR_PROG} -q -s attr_toobig -V $xattr_value \
+                        $SCRATCH_MNT/emptyfile >> $seqres.full 2>&1
+                if [ $? -eq 0 ]; then
+                        echo -n "Expected error, xattr_size is bigger "
+                        echo "than ${leafsize}K"
+                fi
+        fi
+
+        _scratch_unmount > /dev/null 2>&1
+        _check_scratch_fs
+done
+
+_scratch_mount
+
+# Illegal attribute name (more than 256 characters)
+attr_name=`head -c 260 /dev/zero | tr
Re: Error: could not do orphan cleanup -22
Some more update. I checked the FS with btrfsck:

checking extents
ref mismatch on [17018880 8192] extent item 1, found 2
Incorrect local backref count on 17018880 root 258 owner 826 offset 0 found 2 wanted 1 back 0x961b268
backpointer mismatch on [17018880 8192]
ref mismatch on [17027072 8192] extent item 1, found 2
Incorrect local backref count on 17027072 root 258 owner 827 offset 0 found 2 wanted 1 back 0x93ce988
backpointer mismatch on [17027072 8192]
Errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
checking csums
checking root refs
Checking filesystem on /dev/sdb3
UUID: c3f3a649-d8c3-49e1-9962-9b3ca9f54f1d
free space inode generation (0) did not match free space cache generation (6151)
free space inode generation (0) did not match free space cache generation (6151)
found 226798495277 bytes used err is 0
total csum bytes: 619125860
total tree bytes: 1041760256
total fs tree bytes: 270155776
total extent tree bytes: 22577152
btree space waste bytes: 164080420
file data blocks allocated: 1102635630592 referenced 634893860864
Btrfs v3.12

Is it ok to try repairing it?
Re: Error: could not do orphan cleanup -22
On 02/10/2014 03:53 PM, Pavel Volkov wrote: Some more update. I checked the FS with btrfsck: Build a kernel with this patch applied http://ur1.ca/glslj and re-run the mount, and when it fails attach dmesg to this email. Thanks, Josef
Re: [PATCH v4] xfstests/btrfs: add a regression test for running snapshot and send concurrently
On 02/07/2014 09:00 AM, Wang Shilong wrote:

From: Wang Shilong wangsl.f...@cn.fujitsu.com

Btrfs would fail to send if a snapshot ran concurrently; this test is to make sure we have fixed the bug.

Looks reasonable. Ran it with and without the patch and it did as expected.

Reviewed-by: Josef Bacik jba...@fb.com

Thanks,

Josef
Re: [PATCH] xfstests: add test for btrfs data corruption when using compression
On 02/08/2014 10:50 AM, Filipe David Borba Manana wrote:

Test for a btrfs data corruption when using compressed files/extents. Under certain cases, it was possible for reads to return random data (content from a previously used page) instead of zeroes. This also caused partial updates to those regions that were supposed to be filled with zeroes to save random (and invalid) data into the file extents. This is fixed by the commit for the linux kernel titled: Btrfs: fix data corruption when reading/updating compressed extents (https://patchwork.kernel.org/patch/3610391/)

Ran with and without the corresponding fix and all worked as expected. You can add

Reviewed-by: Josef Bacik jba...@fb.com

Thanks,

Josef
Re: [PATCH] xfstests: btrfs/004: fix to make test really work
On Mon, Feb 10, 2014 at 08:10:56PM +0800, Wang Shilong wrote:

From: Wang Shilong wangsl.f...@cn.fujitsu.com

So I was wondering why test 004 could pass my previous wrong kernel patch when it definitely should not. With some debugging, I found that the perl script here is wrong: we did not filter out anything, so this unit test did not actually work. It turned out we would never fail this test.

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
 tests/btrfs/004 | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)
 mode change 100755 => 100644 tests/btrfs/004

diff --git a/tests/btrfs/004 b/tests/btrfs/004
old mode 100755
new mode 100644
index 14da9f1..17a6e34
--- a/tests/btrfs/004
+++ b/tests/btrfs/004
@@ -57,10 +57,9 @@ _require_command /usr/sbin/filefrag

 rm -f $seqres.full

-FILEFRAG_FILTER='if (/, blocksize (\d+)/) {$blocksize = $1; next} ($ext, '\
-'$logical, $physical, $expected, $length, $flags) = (/^\s*(\d+)\s+(\d+)'\
-'\s+(\d+)\s+(?:(\d+)\s+)?(\d+)\s+(.*)/) or next; $flags =~ '\
-'/(?:^|,)inline(?:,|$)/ and next; print $physical * $blocksize, "#", '\
+FILEFRAG_FILTER='if (/blocks of (\d+) bytes/) {$blocksize = $1; next} ($ext, '\
+'$logical, $physical, $length) = (/^\s*(\d+):\s+(\d+)..\s+\d+:'\
+'\s+(\d+)..\s+\d+:\s+(\d+):/) or next; print $physical * $blocksize, "#", '\
 '$length * $blocksize, "#", $logical * $blocksize, '

Oh, boy, who allowed that mess to pass review? Please format this in a readable manner while you are changing it.

FILEFRAG_FILTER='
        if (/blocks of (\d+) bytes/) { \
                $blocksize = $1; \
                next; \
        } ...

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
[PATCH v3] xfstests: Btrfs: add test for large metadata blocks
Tests Btrfs filesystems with all possible metadata block sizes, by setting large extended attributes on files.

Signed-off-by: Koen De Wit koen.de@oracle.com
---
v1-v2:
- Fix indentation: 8 spaces instead of 4
- Move _scratch_unmount to end of loop, add _check_scratch_fs
- Sending failure messages of mkfs.btrfs to output instead of $seqres.full
v2-v3:
- Sending the md5sums of the retrieved attribute values to the output instead of comparing them to the md5sum of the original value
- Always testing attribute values of 4, 8, 12, ... up to 64 KB regardless of the pagesize, to make the golden output independent of the pagesize
- Sending the output of mkfs.btrfs with illegal leafsize to $seqres.full and checking the return code
- Using more uniform variable names: pagesize/pagesize_kb, leafsize/leafsize_kb, attrsize/attrsize_kb

diff --git a/tests/btrfs/036 b/tests/btrfs/036
new file mode 100644
index 000..fb3e987
--- /dev/null
+++ b/tests/btrfs/036
@@ -0,0 +1,125 @@
+#! /bin/bash
+# FS QA Test No. 036
+#
+# Tests large metadata blocks in btrfs, which allows large extended
+# attributes.
+#
+#---
+# Copyright (c) 2014, Oracle and/or its affiliates. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1        # failure is the default!
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_math
+_need_to_be_root
+
+rm -f $seqres.full
+
+pagesize=`$here/src/feature -s`
+pagesize_kb=`expr $pagesize / 1024`
+
+# Test all valid leafsizes
+for attrsize_kb in `seq 4 4 64`; do
+        # The leafsize should be a multiple of the pagesize, equal to or
+        # greater than the attribute size.
+        leafsize_kb=$(_math "($attrsize_kb + $pagesize_kb - 1) / \
+                $pagesize_kb * $pagesize_kb");
+        echo "Testing with attrsize ${attrsize_kb}K :"
+
+        _scratch_mkfs -l ${leafsize_kb}K > /dev/null
+        _scratch_mount
+        # Calculate the size of the extended attribute value, leaving
+        # 512 bytes for other metadata.
+        attrsize=`expr $attrsize_kb \* 1024 - 512`
+
+        touch $SCRATCH_MNT/emptyfile
+        # smallfile will be inlined, bigfile not.
+        $XFS_IO_PROG -f -c "pwrite 0 100" $SCRATCH_MNT/smallfile > /dev/null
+        $XFS_IO_PROG -f -c "pwrite 0 9000" $SCRATCH_MNT/bigfile > /dev/null
+        ln -s $SCRATCH_MNT/bigfile $SCRATCH_MNT/bigfile_softlink
+
+        files=(emptyfile smallfile bigfile bigfile_softlink)
+        chars=(a b c d)
+        for i in `seq 0 1 3`; do
+                char=${chars[$i]}
+                file=$SCRATCH_MNT/${files[$i]}
+                lnkfile=${file}_hardlink
+                ln $file $lnkfile
+                xattr_value=`head -c $attrsize /dev/zero | tr '\0' $char`
+
+                echo -n $xattr_value | md5sum
+                ${ATTR_PROG} -Lq -s attr_$char -V $xattr_value $file
+                ${ATTR_PROG} -Lq -g attr_$char $file | md5sum
+                ${ATTR_PROG} -Lq -g attr_$char $lnkfile | md5sum
+        done
+
+        # Test attributes with a size larger than the leafsize.
+        # Should result in an error.
+        if [ $leafsize_kb -lt 64 ]; then
+                # Bash command lines cannot be larger than 64K
+                # characters, so we do not test attribute values
+                # with a size >= 64KB.
+                attrsize=`expr $attrsize_kb \* 1024 + 512`
+                xattr_value=`head -c $attrsize /dev/zero | tr '\0' x`
+                ${ATTR_PROG} -q -s attr_toobig -V $xattr_value \
+                        $SCRATCH_MNT/emptyfile 2>&1 | _filter_scratch
+        fi
+
+        _scratch_unmount > /dev/null 2>&1
+        _check_scratch_fs
+done
+
+_scratch_mount
+
+# Illegal attribute name (more than 256 characters)
+attr_name=`head -c 260 /dev/zero | tr '\0' n`
+${ATTR_PROG} -s $attr_name -V attribute_name_too_big \
+        $SCRATCH_MNT/emptyfile 2>&1 | head -n 1
+
+_scratch_unmount
Re: [PATCH] xfstests: Btrfs: add test for large metadata blocks
On 02/10/2014 12:02 AM, Dave Chinner wrote: On Sat, Feb 08, 2014 at 09:30:51AM +0100, Koen De Wit wrote: On 02/07/2014 11:49 PM, Dave Chinner wrote: On Fri, Feb 07, 2014 at 06:14:45PM +0100, Koen De Wit wrote:

echo -n $xattr_value | md5sum
${ATTR_PROG} -Lq -s attr_$char -V $xattr_value $file
${ATTR_PROG} -Lq -g attr_$char $file | md5sum
${ATTR_PROG} -Lq -g attr_$char $lnkfile | md5sum

is all that needs to be done here.

The problem with this is that the length of the output will depend on the page size. The code above runs for every valid leafsize, which can be any multiple of the page size up to 64KB, as defined in the loop initialization:

for leafsize in `seq $pagesize_kb $pagesize_kb 64`; do

That's only a limit on the mkfs leafsize parameter, yes? And the limitation is that the leaf size can't be smaller than the page size? So really, the attribute sizes that are being tested are independent of the mkfs parameters being tested. i.e.:

for attrsize in `seq 4 4 64`; do
	if [ $attrsize -lt $pagesize ]; then
		leafsize=$pagesize
	else
		leafsize=$attrsize
	fi
	$BTRFS_MKFS_PROG -l $leafsize $SCRATCH_DEV

And now the test executes a fixed loop, testing the same attribute sizes on all the filesystems under test. i.e. the attribute sizes being tested are *independent* of the mkfs parameters being tested. Always test the same attribute sizes; the mkfs parameters simply vary by page size.

OK, thanks for the suggestion! I implemented it like this in v3, I just changed the calculation of the leafsize because it must be a multiple of the pagesize. (A leafsize of 12KB is not valid for systems with 8KB pages.)

+_scratch_unmount
+
+# Some illegal leafsizes
+
+_scratch_mkfs -l 0 >> $seqres.full 2>&1
+echo $?

Same again - you are dumping the error output into a different file, then detecting the error manually. Pass the output of _scratch_mkfs through a filter, and let errors cause golden output mismatches.
I did this to make the golden output not depend on the output of mkfs.btrfs, inspired by http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfstests.git;a=commit;h=fd7a8e885732475c17488e28b569ac1530c8eb59 and http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfstests.git;a=commit;h=78d86b996c9c431542fdbac11fa08764b16ceb7d However, in my opinion the test should simply be updated if the output of mkfs.btrfs changes, so I agree with you and I fixed this in v2.

While I agree with the sentiment, I'm questioning the implementation. i.e. you've done this differently to every other test that needs to check for failures. run_check would be just fine, as would be simply filtering the output of mkfs.

run_check will make the test fail if the return code differs from 0, and Josef brought up an example scenario (MKFS_OPTIONS=-O skinny-metadata) where mkfs.btrfs produces additional output. In v3, I implemented the failure check similar to btrfs/022:

_scratch_mkfs -l $1 >> $seqres.full 2>&1
[ $? -ne 0 ] || _fail "'$1' is an illegal value for the \
	leafsize option, mkfs should have failed."

Is this the right way?

Thanks, Koen.
-- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Error: could not do orphan cleanup -22
On Monday 10 February 2014 16:13:40 Josef Bacik wrote: Build a kernel with this patch applied http://ur1.ca/glslj and re-run the mount and when it fails attach dmesg to this email. Thanks,

I don't see these new messages nor the previous -22 messages in dmesg now. Only the access problem:

melforce mnt # ls btr2
home  var-snap1  var-snap2  var
melforce mnt # ls btr2/var-snap1
ls: cannot access btr2/var-snap1: No such file or directory
melforce mnt # ls btr2/var
cache  db  empty  lib  lock  log  mail  nmbd  run  spool  src  tmp  www
Re: [PATCH v3] xfstests: Btrfs: add test for large metadata blocks
On Mon, Feb 10, 2014 at 10:39:22PM +0100, Koen De Wit wrote: Tests Btrfs filesystems with all possible metadata block sizes, by setting large extended attributes on files. Signed-off-by: Koen De Wit koen.de@oracle.com

+
+_test_illegal_leafsize() {
+	_scratch_mkfs -l $1 >> $seqres.full 2>&1
+	[ $? -ne 0 ] || _fail "'$1' is an illegal value for the \
+		leafsize option, mkfs should have failed."
+}

You just re-implemented run_check.

Cheers, Dave.
-- Dave Chinner da...@fromorbit.com
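run_check asserts that a command succeeds; the helper above is its negation. A generic sketch of that inverted check (the helper name is made up, not an xfstests function):

```shell
# Hypothetical helper in the spirit of xfstests' run_check, but inverted:
# fail loudly when a command that is expected to fail actually succeeds.
require_failure() {
	if "$@" >/dev/null 2>&1; then
		echo "ERROR: '$*' succeeded but was expected to fail" >&2
		return 1
	fi
	return 0
}

# e.g. mkfs with an illegal leafsize should be rejected:
#   require_failure _scratch_mkfs -l 0
```

Wrapping the check in a function keeps each illegal-value case to one line, which is the readability win both versions are after.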
Re: What to do about df and btrfs fi df
On 02/10/2014 06:06 PM, Hugo Mills wrote: Biggest multiplier leads to the pessimistic estimate, which is what I'd prefer to see here, so that's good.

Agree with this. I would prefer to use as raid multiplier the ratio

	total data block groups + total metadata block groups
	-------------------------------------------------------
	disk space allocated for data and metadata block groups

I hope that this would work better when we have a filesystem composed by small (inlined) files or when we will have per-subvolume RAID levels.
-- gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it)
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Re: What to do about df and btrfs fi df
On 02/10/2014 05:41 PM, Josef Bacik wrote: = New and improved btrfs fi df = Since people using this tool are already going to be better informed and since we are already given the block group flags we can go ahead and do the raid multiplier in btrfs-progs and spit out the adjusted numbers rather than the raw numbers we get from the ioctl. This will just be a progs thing and that way we can possibly add an option to not apply the multipliers and just get the raw output.

In the past [1] I proposed the following approach.

$ sudo btrfs filesystem df /mnt/btrfs1/
Disk size:           400.00GB
Disk allocated:        8.04GB
Disk unallocated:    391.97GB
Used:                 11.29MB
Free (Estimated):    250.45GB  (Max: 396.99GB, min: 201.00GB)
Data to disk ratio:      63 %

The space was given in terms of disk space and in terms of filesystem space. Besides that, there is an indication of an estimation of the free space, with the pessimistic and optimistic values.

[1] See [PATCH V3][BTRFS-PROGS] Enhance btrfs fi df with raid5/6 support, dated 03/10/2013 01:17 PM
-- gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it)
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Re: What to do about df and btrfs fi df
In the past [1] I proposed the following approach.

$ sudo btrfs filesystem df /mnt/btrfs1/
Disk size:           400.00GB
Disk allocated:        8.04GB
Disk unallocated:    391.97GB
Used:                 11.29MB
Free (Estimated):    250.45GB  (Max: 396.99GB, min: 201.00GB)
Data to disk ratio:      63 %

Note that a big chunk of the problem is what do we do with the regular system df output. I don't mind this as a btrfs fi df summary though.
Re: What to do about df and btrfs fi df
On 10/02/14 10:24, cwillu wrote: The regular df data used number should be the amount of space required to hold a backup of that content (assuming that the backup maintains reflinks and compression and so forth). There's no good answer for available space; I think the flipside of the above works well.

How large a group of files can you expect to create before you will get ENOSPC? That, for example, is the check that code does when it looks at df: I need to put in X GB of files, will it fit? It is also what users do.

This is also what NTFS under Windows does with compression. If it says you have 5GB of space left then you will be able to put in 5GB of uncompressible files. Of course if they are compressible then you don't end up consuming all the free space.

Roger
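The "will X GB fit" check that scripts and users perform against df can be sketched like this (the helper name is made up; `df -P -k` is the POSIX portable output format):

```shell
# Does the filesystem holding $1 report at least $2 KB available?
# This is the naive df-based check discussed above; on btrfs the answer
# can be off because "available" depends on the raid profile and on the
# mix of file sizes still to be written.
fits_kb() {
	path=$1
	need_kb=$2
	avail_kb=$(df -P -k "$path" | awk 'NR == 2 { print $4 }')
	[ "$avail_kb" -ge "$need_kb" ]
}

# e.g.: fits_kb /vms $((10 * 1024 * 1024)) && echo "10 GB should fit"
```

The whole thread is about making the number this check reads trustworthy, which is why a conservative (pessimistic) avail value is attractive.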
Re: [PATCH] xfstests: btrfs/004: fix to make test really work
Hi Josef,

On 02/11/2014 03:18 AM, Josef Bacik wrote: On 02/10/2014 07:10 AM, Wang Shilong wrote: From: Wang Shilong wangsl.f...@cn.fujitsu.com So i was wondering why test 004 could pass my previous wrong kernel patch while it definitely should not. By some debugging, i found the perl script here is wrong: we did not filter out anything, and this unit test did not actually work, so it came out we will never fail this test.

So now with this patch I'm failing it, is there some btrfs patch I need to make it not fail or is it still not supposed to fail normally and is this patch broken? Thanks,

You should not have updated my previous patch (Btrfs: switch to btrfs_previous_extent_item()) when you fail this test. I updated to your latest btrfs-next, which has updated my previous patch, and it can pass this case, did you miss that?

Thanks, Wang

Josef
Re: [PATCH] xfstests: btrfs/004: fix to make test really work
On 02/11/2014 05:39 AM, Dave Chinner wrote: On Mon, Feb 10, 2014 at 08:10:56PM +0800, Wang Shilong wrote: From: Wang Shilong wangsl.f...@cn.fujitsu.com So i was wondering why test 004 could pass my previous wrong kernel patch while it definitely should not. By some debugging, i found the perl script here is wrong: we did not filter out anything, and this unit test did not actually work, so it came out we will never fail this test. Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
 tests/btrfs/004 | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)
 mode change 100755 => 100644 tests/btrfs/004

diff --git a/tests/btrfs/004 b/tests/btrfs/004
old mode 100755
new mode 100644
index 14da9f1..17a6e34
--- a/tests/btrfs/004
+++ b/tests/btrfs/004
@@ -57,10 +57,9 @@ _require_command /usr/sbin/filefrag

 rm -f $seqres.full

-FILEFRAG_FILTER='if (/, blocksize (\d+)/) {$blocksize = $1; next} ($ext, '\
-'$logical, $physical, $expected, $length, $flags) = (/^\s*(\d+)\s+(\d+)'\
-'\s+(\d+)\s+(?:(\d+)\s+)?(\d+)\s+(.*)/) or next; $flags =~ '\
-'/(?:^|,)inline(?:,|$)/ and next; print $physical * $blocksize, "#", '\
+FILEFRAG_FILTER='if (/blocks of (\d+) bytes/) {$blocksize = $1; next} ($ext, '\
+'$logical, $physical, $length) = (/^\s*(\d+):\s+(\d+)..\s+\d+:'\
+'\s+(\d+)..\s+\d+:\s+(\d+):/) or next; print $physical * $blocksize, "#", '\
 '$length * $blocksize, "#", $logical * $blocksize, " "'

Oh, boy, who allowed that mess to pass review? Please format this in a readable manner while you are changing it.

Yeah, i was thinking to make it more readable while i had sent this out. ^_^ Thanks for your comments. Wang

FILEFRAG_FILTER='
	if (/blocks of (\d+) bytes/) {
		$blocksize = $1;
		next;
	}
	...'

Cheers, Dave.
Re: btrfsck does not fix
On Feb 9, 2014, at 1:36 AM, Hendrik Friedel hend...@friedels.name wrote: Yes, but I can create that space. So, for me the next steps would be to: -generate enough room on the filesystem -btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/BTRFS/Video -btrfs device delete /dev/sdc1 /mnt/BTRFS/Video Right? No. You said you need to recreate the file system, and only have these two devices and therefore must remove one device. You can't achieve that with raid1 which requires minimum two devices. -dconvert=single -mconvert=dup -sconvert=dup next, I'm doing the balance for the subvolume /mnt/BTRFS/backups You told us above you deleted that subvolume. So how are you balancing it? Yes, that was my understanding from my research: You tell btrfs, that you want to remove one disc from the filesystem and then balance it to move the data on the remaining disc. I did find this logical. I was expecting that I possibly need a further command to tell btrfs that it's not a raid anymore, but I thought this could also be automagical. I understand, that's not the way it is implemented, but it's not a crazy idea, is it? Well it's not the right way to think that devices are raid1 or raid0. It's the data or metadata that has that attribute. And by removing a device you are managing devices, not the attribute of data or metadata chunks. Since you're already at the minimum number of disks for raid0, that's why conversion is needed first. And also, balance applies to a mountpoint, and even if you mount a subvolume to that mountpoint, the whole file system is balanced. Not just the mounted subvolume. That is confusing. (I mean: I understand what you are saying, but it's counterintuitive). Why is this the case? A subvolume is a file system tree. The data created in that tree is allocated to chunks which can contain data from other trees. And balance reads/writes chunks. It's not a subvolume aware command. 
In parallel, I try to delete /mnt/BTRFS/rsnapshot, but it fails:

btrfs subvolume delete /mnt/BTRFS/rsnapshot/
Delete subvolume '/mnt/BTRFS/rsnapshot'
ERROR: cannot delete '/mnt/BTRFS/rsnapshot' - Inappropriate ioctl for device

Why's that? But even more: How do I free sdc1 now?!

Well I'm pretty confused because again, I can't tell if your paths refer to subvolumes or if they refer to mount points.

Now I am confused. These paths are the paths to which I mounted the subvolumes: my (abbreviated) fstab:

UUID=xy /mnt/BTRFS/Video     btrfs subvol=Video
UUID=xy /mnt/BTRFS/rsnapshot btrfs subvol=rsnapshot
UUID=xy /mnt/BTRFS/backups   btrfs subvol=backups

The balance and device delete commands all refer to a mount point, which is the path returned by the df command. So this:

/dev/sdb1 5,5T 3,5T 2,0T 64% /mnt/BTRFS/Video
/dev/sdb1 5,5T 3,5T 2,0T 64% /mnt/BTRFS/backups
/dev/sdc1 5,5T 3,5T 2,0T 64% /mnt/BTRFS/rsnapshot

You can't delete a mounted subvolume. You'd have to unmount it first. And then you'd have to mount a parent subvolume. So if the subvolume you want to delete is in the ID 5 subvolume, you must mount that subvolume, for example:

mount /dev/sdb1 /mnt/btrfs
btrfs subvolume delete /mnt/btrfs/subvolumetodelete

Chris Murphy
Re: [PATCH] xfstests: btrfs/004: fix to make test really work
On 02/10/2014 08:22 PM, Wang Shilong wrote: Hi Josef, On 02/11/2014 03:18 AM, Josef Bacik wrote: On 02/10/2014 07:10 AM, Wang Shilong wrote: From: Wang Shilong wangsl.f...@cn.fujitsu.com So i was wondering why test 004 could pass my previous wrong kernel patch while it definitely should not. By some debugging, i found the perl script here is wrong: we did not filter out anything, and this unit test did not actually work, so it came out we will never fail this test. So now with this patch I'm failing it, is there some btrfs patch I need to make it not fail or is it still not supposed to fail normally and is this patch broken? Thanks, You should not have updated my previous patch (Btrfs: switch to btrfs_previous_extent_item()) when you fail this test. I updated to your latest btrfs-next, which has updated my previous patch, and it can pass this case, did you miss that?

Hrm I must not have insmod'ed the new module, which now means I have to re-run all my tests, sigh.

Josef
Re: BTRFS with RAID1 cannot boot when removing drive
On Feb 9, 2014, at 2:40 PM, Saint Germain saint...@gmail.com wrote: Then I added another drive for a RAID1 configuration (with btrfs balance) and I installed grub on the second hard drive with grub-install /dev/sdb.

That can't work on UEFI. UEFI firmware effectively requires a GPT partition map and something to serve as an EFI System partition on all bootable drives. Second, there's a difference between UEFI with and without secure boot. With secure boot you need to copy the files your distro installer puts on the target drive's EFI System partition to each additional drive's ESP if you want multibooting to work in case of disk failure. The grub on each ESP likely looks only on its own ESP for a grub.cfg. So that then means having to sync grub.cfg's among each disk used for booting. A way around this is to create a single grub.cfg that merely forwards to the true grub.cfg. And you can copy this forward-only grub.cfg to each ESP. That way the ESPs never need updating or syncing again.

Without secure boot, you must umount /boot/efi and mount the ESP for each bootable disk in turn, and then merely run: grub-install That will cause a core.img to be created for that particular ESP, and it will point to the usual grub.cfg location at /boot/grub.

If I boot on sdb, it takes sda1 as the root filesystem. If I switch the cable, it always takes the first hard drive as the root filesystem (now sdb). If I disconnect /dev/sda, the system doesn't boot, with a message saying that it hasn't found the UUID:

Scanning for BTRFS filesystems...
mount: mounting /dev/disk/by-uuid/c64fca2a-5700-4cca-abac-3a61f2f7486c on /root failed: Invalid argument

Well if /dev/sda is missing, and you have an unpartitioned /dev/sdb, I don't even know how you're getting this far, and it seems like the UEFI computer might actually be booting in CSM-BIOS mode, which presents a conventional BIOS to the operating system. Distinguishing such things gets messy quickly.

Can you tell me what I have done incorrectly ?
Is it because of UEFI ? If yes I haven't understood how I can correct it in a simple way. As extra question, I don't see also how I can configure the system to get the correct swap in case of disk failure. Should I force both swap partition to have the same UUID ?

If you're really expecting to create a system that can accept a disk failure and continue to work, I don't see how it can depend on swap partitions. It's fine to create them, but just realize if they're actually being used and the underlying physical device dies, the kernel isn't going to like it. A possible work around is using an md raid1 partition as swap.

Chris Murphy
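The per-disk grub-install procedure described above might be scripted as follows (a sketch only; the device names are examples, and a DRYRUN hook is added so the sequence can be previewed without touching a real system):

```shell
# Re-run grub-install against each bootable disk's ESP in turn, per the
# non-secure-boot procedure above.  Set DRYRUN=echo to only print the
# commands; leave it empty to actually run them (requires root + grub).
DRYRUN=${DRYRUN:-}

reinstall_grub_on_esps() {
	for esp in "$@"; do
		$DRYRUN umount /boot/efi
		$DRYRUN mount "$esp" /boot/efi
		$DRYRUN grub-install
	done
}

# Preview only (example ESP partitions):
#   DRYRUN=echo reinstall_grub_on_esps /dev/sda1 /dev/sdb1
```

Each iteration leaves a core.img on that disk's ESP pointing at /boot/grub, which is what makes any surviving disk independently bootable.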
Re: btrfsck does not fix
On Feb 10, 2014, at 6:45 PM, Chris Murphy li...@colorremedies.com wrote: On Feb 9, 2014, at 1:36 AM, Hendrik Friedel hend...@friedels.name wrote: Yes, but I can create that space. So, for me the next steps would be to: -generate enough room on the filesystem -btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/BTRFS/Video -btrfs device delete /dev/sdc1 /mnt/BTRFS/Video Right? No. You said you need to recreate the file system, and only have these two devices and therefore must remove one device. You can't achieve that with raid1 which requires minimum two devices. -dconvert=single -mconvert=dup -sconvert=dup

Actually, I'm reminded with multiple devices that dup might not be possible. Instead you might have to use single for all of them. Then remove the device you want removed. And then do another conversion for just -mconvert=dup -sconvert=dup, and do not specify -dconvert. That way the single metadata profile is converted to duplicate.

Chris
Re: BTRFS with RAID1 cannot boot when removing drive
Hello Duncan, What an amazing extensive answer you gave me ! Thank you so much for it. See my comments below. On Mon, 10 Feb 2014 03:34:49 + (UTC), Duncan 1i5t5.dun...@cox.net wrote : I am experimenting with BTRFS and RAID1 on my Debian Wheezy (with backported kernel 3.12-0.bpo.1-amd64) using a a motherboard with UEFI. My systems don't do UEFI, but I do run GPT partitions and use grub2 for booting, with grub2-core installed to a BIOS/reserved type partition (instead of as an EFI service as it would be with UEFI). And I have root filesystem btrfs two-device raid1 mode working fine here, tested bootable with only one device of the two available. So while I can't help you directly with UEFI, I know the rest of it can/ does work. One more thing: I do have a (small) separate btrfs /boot, actually two of them as I setup a separate /boot on each of the two devices in ordered to have a backup /boot, since grub can only point to one /boot by default, and while pointing to another in grub's rescue mode is possible, I didn't want to have to deal with that if the first /boot was corrupted, as it's easier to simply point the BIOS at a different drive entirely and load its (independently installed and configured) grub and /boot. Can you explain why you choose to have a dedicated /boot partition ? I also read on this thread that it may be better to have a dedicated /boot partition: https://bbs.archlinux.org/viewtopic.php?pid=1342893#p1342893 However I haven't managed to make the system boot when the removing the first hard drive. I have installed Debian with the following partition on the first hard drive (no BTRFS subsystem): /dev/sda1: for / (BTRFS) /dev/sda2: for /home (BTRFS) /dev/sda3: for swap Then I added another drive for a RAID1 configuration (with btrfs balance) and I installed grub on the second hard drive with grub-install /dev/sdb. 
Just for clarification as you don't mention it specifically, altho your btrfs filesystem show information suggests you did it this way, are your partition layouts identical on both drives? That's what I've done here, and I definitely find that easiest to manage and even just to think about, tho it's definitely not a requirement. But using different partition layouts does significantly increase management complexity, so it's useful to avoid if possible. =:^) Yes, the partition layout is exactly the same on both drive (copied with sfdisk). I also try to keep things simple ;-) If I boot on sdb, it takes sda1 as the root filesystem If I switched the cable, it always take the first hard drive as the root filesystem (now sdb) That's normal /appearance/, but that /appearance/ doesn't fully reflect reality. The problem is that mount output (and /proc/self/mounts), fstab, etc, were designed with single-device filesystems in mind, and multi-device btrfs has to be made to fix the existing rules as best it can. So what's actually happening is that the for a btrfs composed of multiple devices, since there's only one device slot for the kernel to list devices, it only displays the first one it happens to come across, even tho the filesystem will normally (unless degraded) require that all component devices be available and logically assembled into the filesystem before it can be mounted. When you boot on sdb, naturally, the sdb component of the multi-device filesystem that the kernel finds, so it's the one listed, even tho the filesystem is actually composed of more devices, not just that one. I am not following you: it seems to be the opposite of what you describe. If I boot on sdb, I expect sdb1 and sdb2 to be the first components that the kernel find. However I can see that sda1 and sda2 are used (using the 'mount' command). 
When you switch the cables, the first one is, at least on your system, always the first device component of the filesystem detected, so it's always the one occupying the single device slot available for display, even tho the filesystem has actually assembled all devices into the complete filesystem before mounting. Normally the 2 hard drive should be exactly the same (or I didn't understand something) except for the UUID_SUB. That's why I don't understand if I switch the cable, I should get exactly the same results with 'mount'. But that is not the case, the 'mount' command always point to the same partition: - without cable switch: sda1 and sda2 - with cable switch: sdb1 and sdb2 Everything happen as if the system is using the UUID_SUB to get his 'favorite' partition. If I disconnect /dev/sda, the system doesn't boot with a message saying that it hasn't found the UUID: Scanning for BTRFS filesystems... mount: mounting /dev/disk/by-uuid/c64fca2a-5700-4cca-abac-3a61f2f7486c on /root failed: Invalid argument Can you tell me what I have done incorrectly ? Is it because of UEFI ? If yes I haven't understood how I can correct
Re: What to do about df and btrfs fi df
On Mon, Feb 10, 2014 at 7:02 PM, Roger Binns rog...@rogerbinns.com wrote: On 10/02/14 10:24, cwillu wrote: The regular df data used number should be the amount of space required to hold a backup of that content (assuming that the backup maintains reflinks and compression and so forth). There's no good answer for available space; I think the flipside of the above works well. How large a group of files can you expect to create before you will get ENOSPC? That, for example, is the check that code does when it looks at df: I need to put in X GB of files, will it fit? It is also what users do.

But the answer changes dramatically depending on whether it's large numbers of small files or a small number of large files, and the conservative worst-case choice means we report a number that is half what is probably expected.
Re: BTRFS with RAID1 cannot boot when removing drive
Hello ! On Mon, 10 Feb 2014 19:18:22 -0700, Chris Murphy li...@colorremedies.com wrote : On Feb 9, 2014, at 2:40 PM, Saint Germain saint...@gmail.com wrote: Then I added another drive for a RAID1 configuration (with btrfs balance) and I installed grub on the second hard drive with grub-install /dev/sdb. That can't work on UEFI. UEFI firmware effectively requires a GPT partition map and something to serve as an EFI System partition on all bootable drives. Second there's a difference between UEFI with and without secure boot. With secure boot you need to copy the files your distro installer puts on the target drive EFI System partition to each addition drive's ESP if you want multibooting to work in case of disk failure. The grub on each ESP likely looks on only its own ESP for a grub.cfg. So that then means having to sync grub.cfg's among each disk used for booting. A way around this is to create a single grub.cfg that merely forwards to the true grub.cfg. And you can copy this forward-only grub.cfg to each ESP. That way the ESP's never need updating or syncing again. Without secure boot, you must umount /boot/efi and mount the ESP for each bootable disk is turn, and then merely run: grub-install That will cause a core.img to be created for that particular ESP, and it will point to the usual grub.cfg location at /boot/grub. Ok I need to really understand how my motherboard works (new Z87E-ITX). It is written 64Mb AMI UEFI Legal BIOS, so I thought it was really UEFI. If I boot on sdb, it takes sda1 as the root filesystem If I switched the cable, it always take the first hard drive as the root filesystem (now sdb) If I disconnect /dev/sda, the system doesn't boot with a message saying that it hasn't found the UUID: Scanning for BTRFS filesystems... 
mount: mounting /dev/disk/by-uuid/c64fca2a-5700-4cca-abac-3a61f2f7486c on /root failed: Invalid argument

Well if /dev/sda is missing, and you have an unpartitioned /dev/sdb, I don't even know how you're getting this far, and it seems like the UEFI computer might actually be booting in CSM-BIOS mode, which presents a conventional BIOS to the operating system. Distinguishing such things gets messy quickly.

/dev/sdb has the same partitions as /dev/sda. Duncan gave me the hint with degraded mode and I managed to boot (however I had some problem with mounting sda2).

Can you tell me what I have done incorrectly ? Is it because of UEFI ? If yes I haven't understood how I can correct it in a simple way. As extra question, I don't see also how I can configure the system to get the correct swap in case of disk failure. Should I force both swap partition to have the same UUID ?

If you're really expecting to create a system that can accept a disk failure and continue to work, I don't see how it can depend on swap partitions. It's fine to create them, but just realize if they're actually being used and the underlying physical device dies, the kernel isn't going to like it. A possible work around is using an md raid1 partition as swap.

I understand. Normally the swap will only be used for hibernating. I don't expect to use it except perhaps in some extreme case. Thanks for your help !
Re: What to do about df and btrfs fi df
On Mon, Feb 10, 2014 at 7:13 PM, cwillu cwi...@cwillu.com wrote: On Mon, Feb 10, 2014 at 7:02 PM, Roger Binns rog...@rogerbinns.com wrote: On 10/02/14 10:24, cwillu wrote: The regular df data used number should be the amount of space required to hold a backup of that content (assuming that the backup maintains reflinks and compression and so forth). There's no good answer for available space; I think the flipside of the above works well. How large a group of files can you expect to create before you will get ENOSPC? That, for example, is the check that code does when it looks at df: I need to put in X GB of files, will it fit? It is also what users do.

But the answer changes dramatically depending on whether it's large numbers of small files or a small number of large files, and the conservative worst-case choice means we report a number that is half what is probably expected.

I don't think that is a problem, as long as the avail guesstimate is conservative. Scenario: A user has 10G of files and df reports that there are 11G available. I think the expectation is that copying these 10G into the filesystem will not ENOSPC. After the copy completes, whether the new avail number is ==1G or >1G is less important IMHO. I.e. I like to see df output as "you can write AT LEAST this much more data until the filesystem is full". That was my 5 cent.
Re: system stuck with flush-btrfs-4 at 100% after filesystem resize
John Navitsky posted on Mon, 10 Feb 2014 07:35:32 -0800 as excerpted:

[I rearranged your upside-down posting so the reply comes in context after the quote.]

On 2/8/2014 10:36 AM, John Navitsky wrote: I have a large file system that has been growing. We've resized it a couple of times with the following approach:

lvextend -L +800G /dev/raid/virtual_machines
btrfs filesystem resize +800G /vms

I think the FS started out at 200G, we increased it by 200GB a time or two, then by 800GB and everything worked fine. The filesystem hosts a number of virtual machines so the file system is in use, although the VMs individually tend not to be overly active. VMs tend to be in subvolumes, and some of those subvolumes have snapshots. This time, I increased it by another 800GB, and it has hung for many hours (over night) with flush-btrfs-4 near 100% cpu all that time. I'm not clear at this point that it will finish or where to go from here. Any pointers would be much appreciated.

As a follow-up, at some point over the weekend things did finish on their own:

romulus:/vms/johnn-sles11sp3 # df -h /vms
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/dm-4   2.6T  1.6T  1.1T   60%   /vms
romulus:/vms/johnn-sles11sp3 #

I'd still be interested in any comments about what was going on or suggestions.

I'm guessing you don't have the VM images set NOCOW (no-copy-on-write), which means over time they'll **HEAVILY** fragment, since every time something changes in the image and is written back to the file, that block is written somewhere else due to COW. We've had some reports of hundreds of thousands of extents in VM files of a few gigs!

It's also worth noting that while NOCOW does normally mean in-place writes, a change after a snapshot means unsharing the data since the snapshotted data has now diverged, which means mandatory single-shot COW in order to keep the new change from overwriting the old snapshot version.
That of course triggers fragmentation too, since everything that changes
in the image between snapshots must be written elsewhere, although the
fragmentation won't be nearly as fast as in the default COW mode.

So what was very likely taking the time was tracking down all those
potentially hundreds of thousands of fragments/extents in order to
rewrite the files, as triggered by the size increase and presumably the
physical location on-device.

I'd strongly suggest that you set all VMs NOCOW (chattr +C). However,
there's a wrinkle. In order to be effective on btrfs, NOCOW must be set
on a file while it is still zero-size, before it has data written to it.
The easiest way to do that is to set NOCOW on the directory, which
doesn't really affect the directory itself, but DOES cause all new files
(and subdirs, so it nests) created in that directory to inherit the
NOCOW attribute.

Then copy the file in, preferably either catting it in with redirection
to create/write the file, or copying it from another filesystem, such
that you know it's actually copying the data and not simply hard-linking
it, thus ensuring that the new copy is actually a new copy, so the NOCOW
will actually take effect.

By organizing your VM images into dirs, all with NOCOW set, so the
images inherit it at creation, you'll save yourself the fragmentation of
the repeated COW writes.

However, as I mentioned, the first time a block is written after a
snapshot, it's still a COW write, unavoidably so. Thus, I'd suggest
keeping btrfs snapshots of your VMs to a minimum (preferably 0), using
ordinary full-copy backups to other media instead, thus avoiding that
first COW-after-snapshot effect, too.

Meanwhile, it's worth noting that if a file is written sequentially
(append only) and not written into, as will typically be the case with
the VM backups, there's nothing to trigger fragmentation. So the backups
don't have to be NOCOW, since they'll be written once and left alone.
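The NOCOW-directory procedure described above can be sketched as follows. The paths are hypothetical, and the `chattr +C` step only actually takes effect on btrfs (on other filesystems it errors or is meaningless), so it is guarded here.

```shell
# Hypothetical setup: a NOCOW directory whose new files inherit the
# attribute, then a real data copy into it (not a reflink/hard link).
mkdir -p /tmp/vm-images
chattr +C /tmp/vm-images 2>/dev/null \
    || echo "chattr +C had no effect (NOCOW needs btrfs)"

# Stand-in for an existing VM image (1 MiB of zeros for illustration).
dd if=/dev/zero of=/tmp/source.img bs=1M count=1 2>/dev/null

# cat with redirection creates the destination file at zero size INSIDE
# the NOCOW dir, so it inherits +C before any data lands in it.
cat /tmp/source.img > /tmp/vm-images/vm.img

# On btrfs, lsattr -d would show the 'C' flag on the directory.
lsattr -d /tmp/vm-images 2>/dev/null || true
```

The key point is the order of operations: the destination file must be created (empty) inside the NOCOW directory first; copying by reflink or moving an already-written file in would defeat the inheritance.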
But the actively in-use, and thus often written-to, operational VM
images should be NOCOW, and preferably not snapshotted, to keep
fragmentation to a minimum.

Finally, of course you can use btrfs defrag to manually deal with the
problem. However, do note that the snapshot-aware defrag introduced with
kernel 3.9 simply does NOT scale well once the number of snapshots
reaches near 1000, and the snapshot-awareness has just been disabled
again (in kernel 3.14-rc), until the code can be reworked to scale
better.

So I'd suggest that if you /are/ using snapshots and trying to work with
defrag, you'll want a very new 3.14-rc kernel in order to avoid that
problem. But avoiding it does come at the cost of losing space
efficiency when defragging a snapshotted btrfs, as the
non-snapshot-aware version will tend to create separate copies of the
data on each snapshot it is run on, thus decreasing shared data blocks
and increasing space usage, perhaps dramatically. So again, at least for
now, and
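Before reaching for defrag, it's worth measuring how fragmented an image actually is. A minimal sketch, with placeholder paths: `filefrag` (from e2fsprogs) reports the extent count on most filesystems, and the btrfs defrag invocation is shown as a comment since it only applies on an actual btrfs mount.

```shell
# Gauge fragmentation of a file by counting its extents. A heavily
# COW-fragmented multi-GB VM image can report tens of thousands.
IMG=/tmp/test.img                  # hypothetical image path
dd if=/dev/zero of="$IMG" bs=1M count=4 2>/dev/null

# Prints something like "/tmp/test.img: 1 extent found".
command -v filefrag >/dev/null && filefrag "$IMG"

# On a real btrfs mount you would then defragment recursively, e.g.:
#   btrfs filesystem defragment -r -v /vms
# (subject to the snapshot-aware-defrag scaling caveat above)
```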
Re: BTRFS with RAID1 cannot boot when removing drive
Saint Germain posted on Tue, 11 Feb 2014 04:15:27 +0100 as excerpted:

> Ok, I need to really understand how my motherboard works (new
> Z87E-ITX). It is written "64Mb AMI UEFI Legal BIOS", so I thought it
> was really UEFI.

I expect it's truly UEFI. But from what I've read, most UEFI-based
firmware (possibly all in theory, with the caveat that there are bugs
and some might not actually work as intended due to those bugs) on
x86/amd64 (as opposed to ARM) has a legacy-BIOS-mode fallback. Provided
it's not in secure-boot mode, if the storage devices it is presented
don't have a valid UEFI config, it'll fall back to legacy-BIOS mode and
try to detect and boot that.

Which may or may not be what your system is actually doing. As I said,
since I've not actually experimented with UEFI here, my practical
knowledge of it is virtually nil, and I don't claim to have studied the
theory well enough to deduce in that level of detail what your system is
doing. But I know that's how it's /supposed/ to be able to work. =:^)

(FWIW, what I /have/ done, deliberately, is read enough about UEFI to
have a general feel for it, and to have been previously exposed to the
ideas for some time, so that once I /do/ have it available and decide
it's time, I'll be able to come up to speed relatively quickly, as I've
had the general ideas turning over in my head for quite some time
already. In effect, I'll simply be reviewing the theory and doing the
lab work, while concurrently making the logical connections about how it
all fits together that only happen once one actually does that lab work.
I've discovered over the years that this is perhaps my most effective
way to learn: read about the general principles without really
understanding them the first time through, then come back to the topic
some months or years later, and I pick it up really fast, because my
subconscious has been working on the problem the whole time!
Come to think of it, that's actually how I handled btrfs, too: trying it
at one point and deciding it didn't fit my needs at the time, leaving it
for a while, then coming back to it later when my needs had changed. But
I already had an idea of what I was doing from the previous try, with
the result being I really took to it fast the second time! =:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: BTRFS with RAID1 cannot boot when removing drive
Saint Germain posted on Tue, 11 Feb 2014 04:15:27 +0100 as excerpted:

> I understand. Normally the swap will only be used for hibernating. I
> don't expect to use it except perhaps in some extreme case.

If hibernate is your main swap usage, you might consider the noauto
fstab option as well, then specifically swapon the appropriate one in
your hibernate script, since you may well need logic in there to figure
out which one to use in any case. I was doing that for a while.

(I've run my own suspend/hibernate scripts, based on the documentation
in $KERNDIR/Documentation/power/*, for years. The kernel's docs dir
really is a great resource for a lot of sysadmin-level stuff as well as
the expected kernel-developer stuff. I think few are aware of just how
much real useful admin-level information it actually contains. =:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
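The noauto approach above can be sketched roughly as follows. This is a hypothetical fragment, not a tested setup: the device name, priority, and script path are placeholders, and the commands need root.

```shell
# /etc/fstab entry: swap reserved for hibernate, NOT activated at boot:
#
#   /dev/sda3  none  swap  noauto,pri=1  0 0

# Excerpt from a hypothetical hibernate script
# (e.g. /usr/local/sbin/hibernate.sh, run as root):
SWAPDEV=/dev/sda3             # any per-machine selection logic goes here
swapon "$SWAPDEV"             # activate the hibernate swap on demand
echo disk > /sys/power/state  # suspend-to-disk, per Documentation/power/
swapoff "$SWAPDEV"            # after resume, release the swap again
```

Keeping the swap noauto means day-to-day operation never touches it; it only exists for the hibernate image.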