[PATCH] Btrfs: fix possible deadlock in btrfs_cleanup_transaction

2014-02-10 Thread Liu Bo
[13654.480669] ======================================================
[13654.480905] [ INFO: possible circular locking dependency detected ]
[13654.481003] 3.12.0+ #4 Tainted: G        W  O
[13654.481060] -------------------------------------------------------
[13654.481060] btrfs-transacti/9347 is trying to acquire lock:
[13654.481060]  (&(&root->ordered_extent_lock)->rlock){+.+...}, at: [<ffffffffa02d30a1>] btrfs_cleanup_transaction+0x271/0x570 [btrfs]
[13654.481060] but task is already holding lock:
[13654.481060]  (&(&fs_info->ordered_root_lock)->rlock){+.+...}, at: [<ffffffffa02d3015>] btrfs_cleanup_transaction+0x1e5/0x570 [btrfs]
[13654.481060] which lock already depends on the new lock.

[13654.481060] the existing dependency chain (in reverse order) is:
[13654.481060] -> #1 (&(&fs_info->ordered_root_lock)->rlock){+.+...}:
[13654.481060]        [<ffffffff810c4103>] lock_acquire+0x93/0x130
[13654.481060]        [<ffffffff81689991>] _raw_spin_lock+0x41/0x50
[13654.481060]        [<ffffffffa02f011b>] __btrfs_add_ordered_extent+0x39b/0x450 [btrfs]
[13654.481060]        [<ffffffffa02f0202>] btrfs_add_ordered_extent+0x32/0x40 [btrfs]
[13654.481060]        [<ffffffffa02df6aa>] run_delalloc_nocow+0x78a/0x9d0 [btrfs]
[13654.481060]        [<ffffffffa02dfc0d>] run_delalloc_range+0x31d/0x390 [btrfs]
[13654.481060]        [<ffffffffa02f7c00>] __extent_writepage+0x310/0x780 [btrfs]
[13654.481060]        [<ffffffffa02f830a>] extent_write_cache_pages.isra.29.constprop.48+0x29a/0x410 [btrfs]
[13654.481060]        [<ffffffffa02f879d>] extent_writepages+0x4d/0x70 [btrfs]
[13654.481060]        [<ffffffffa02d9f68>] btrfs_writepages+0x28/0x30 [btrfs]
[13654.481060]        [<ffffffff8114be91>] do_writepages+0x21/0x50
[13654.481060]        [<ffffffff81140d49>] __filemap_fdatawrite_range+0x59/0x60
[13654.481060]        [<ffffffff81140e13>] filemap_fdatawrite_range+0x13/0x20
[13654.481060]        [<ffffffffa02f1db9>] btrfs_wait_ordered_range+0x49/0x140 [btrfs]
[13654.481060]        [<ffffffffa0318fe2>] __btrfs_write_out_cache+0x682/0x8b0 [btrfs]
[13654.481060]        [<ffffffffa031952d>] btrfs_write_out_cache+0x8d/0xe0 [btrfs]
[13654.481060]        [<ffffffffa02c7083>] btrfs_write_dirty_block_groups+0x593/0x680 [btrfs]
[13654.481060]        [<ffffffffa0345307>] commit_cowonly_roots+0x14b/0x20d [btrfs]
[13654.481060]        [<ffffffffa02d7c1a>] btrfs_commit_transaction+0x43a/0x9d0 [btrfs]
[13654.481060]        [<ffffffffa030061a>] btrfs_create_uuid_tree+0x5a/0x100 [btrfs]
[13654.481060]        [<ffffffffa02d5a8a>] open_ctree+0x21da/0x2210 [btrfs]
[13654.481060]        [<ffffffffa02ab6fe>] btrfs_mount+0x68e/0x870 [btrfs]
[13654.481060]        [<ffffffff811b2409>] mount_fs+0x39/0x1b0
[13654.481060]        [<ffffffff811cd653>] vfs_kern_mount+0x63/0xf0
[13654.481060]        [<ffffffff811cfcce>] do_mount+0x23e/0xa90
[13654.481060]        [<ffffffff811d05a3>] SyS_mount+0x83/0xc0
[13654.481060]        [<ffffffff81692b52>] system_call_fastpath+0x16/0x1b
[13654.481060] -> #0 (&(&root->ordered_extent_lock)->rlock){+.+...}:
[13654.481060]        [<ffffffff810c340a>] __lock_acquire+0x150a/0x1a70
[13654.481060]        [<ffffffff810c4103>] lock_acquire+0x93/0x130
[13654.481060]        [<ffffffff81689991>] _raw_spin_lock+0x41/0x50
[13654.481060]        [<ffffffffa02d30a1>] btrfs_cleanup_transaction+0x271/0x570 [btrfs]
[13654.481060]        [<ffffffffa02d35ce>] transaction_kthread+0x22e/0x270 [btrfs]
[13654.481060]        [<ffffffff81079efa>] kthread+0xea/0xf0
[13654.481060]        [<ffffffff81692aac>] ret_from_fork+0x7c/0xb0
[13654.481060] other info that might help us debug this:

[13654.481060]  Possible unsafe locking scenario:

[13654.481060]        CPU0                    CPU1
[13654.481060]        ----                    ----
[13654.481060]   lock((&fs_info->ordered_root_lock)->rlock);
[13654.481060]                                lock((&root->ordered_extent_lock)->rlock);
[13654.481060]                                lock((&fs_info->ordered_root_lock)->rlock);
[13654.481060]   lock((&root->ordered_extent_lock)->rlock);
[13654.481060]
 *** DEADLOCK ***
[...]

======================================================

btrfs_destroy_all_ordered_extents() gets fs_info->ordered_root_lock
__BEFORE__ acquiring root->ordered_extent_lock, while
btrfs_[add,remove]_ordered_extent() acquires fs_info->ordered_root_lock
__AFTER__ getting root->ordered_extent_lock.

This patch fixes the above problem.

Signed-off-by: Liu Bo <bo.li@oracle.com>
---
 fs/btrfs/disk-io.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3903bd3..8cd48c3 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3794,9 +3794,11 @@ static void btrfs_destroy_all_ordered_extents(struct btrfs_fs_info *fs_info)
 		list_move_tail(&root->ordered_root,
 			       &fs_info->ordered_roots);
 
+		spin_unlock(&fs_info->ordered_root_lock);
 		btrfs_destroy_ordered_extents(root);
 
-		cond_resched_lock(&fs_info->ordered_root_lock);
+

Re: [btrfs] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038

2014-02-10 Thread Fengguang Wu
Hi Filipe,

 If you disable CONFIG_BTRFS_FS_RUN_SANITY_TESTS, does it still crash?

I tried disabling CONFIG_BTRFS_FS_RUN_SANITY_TESTS in the reported 3
randconfigs and they all boot fine.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Provide a better free space estimate on RAID1

2014-02-10 Thread Roman Mamedov
On Mon, 10 Feb 2014 00:02:38 + (UTC)
Duncan 1i5t5.dun...@cox.net wrote:

 Meanwhile, you said it yourself, users aren't normally concerned about 
 this.

I think you're being mistaken here, the point that users aren't looking at
the free space, hence it is not important to provide a correct estimate was
made by someone else, not me. Personally I found that to be just a bit too
surrealistic to try and seriously answer; much like the rest of your message.

-- 
With respect,
Roman




Re: [btrfs] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038

2014-02-10 Thread Fengguang Wu
On Sat, Feb 08, 2014 at 03:10:37PM -0500, Tejun Heo wrote:
 Hello, David, Fengguang, Chris.
 
 On Fri, Feb 07, 2014 at 01:13:06PM -0800, David Rientjes wrote:
  On Fri, 7 Feb 2014, Fengguang Wu wrote:
  
   On Fri, Feb 07, 2014 at 02:13:59AM -0800, David Rientjes wrote:
On Fri, 7 Feb 2014, Fengguang Wu wrote:

 [1.625020] BTRFS: selftest: Running btrfs_split_item tests
 [1.627004] BTRFS: selftest: Running find delalloc tests
 [2.289182] tsc: Refined TSC clocksource calibration: 2299.967 MHz
 [  292.084537] kthreadd invoked oom-killer: gfp_mask=0x3000d0, order=1, oom_score_adj=0
 [  292.086439] kthreadd cpuset=
 [  292.087072] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
 [  292.087372] IP: [<ffffffff812119de>] pr_cont_kernfs_name+0x1b/0x6c

This looks like a problem with the cpuset cgroup name; are you sure this
isn't related to the removal of cgroup->name?
   
It looks unrelated to the patch "cgroup: remove cgroup->name", because
that patch lies in the cgroup tree and is not contained in the output of
git log BAD_COMMIT.

Sorry, I was wrong here. I found that the above dmesg is for commit
4830363, which is a merge HEAD that contains the cgroup code.

The dmesg for commit 878a876b2e1 (Merge branch 'for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs)
looks different, which hangs after the tsc line:

[2.428110] Btrfs loaded, assert=on, integrity-checker=on
[2.429469] BTRFS: selftest: Running btrfs free space cache tests
[2.430874] BTRFS: selftest: Running extent only tests
[2.432135] BTRFS: selftest: Running bitmap only tests
[2.433359] BTRFS: selftest: Running bitmap and extent tests
[2.434675] BTRFS: selftest: Free space cache tests finished
[2.435959] BTRFS: selftest: Running extent buffer operation tests
[2.437350] BTRFS: selftest: Running btrfs_split_item tests
[2.438843] BTRFS: selftest: Running find delalloc tests
[3.158351] tsc: Refined TSC clocksource calibration: 2666.596 MHz


  It's dying in pr_cont_kernfs_name, which comes from a tree that has "kernfs:
  implement kernfs_get_parent(), kernfs_name/path() and friends", which is
  not in linux-next, and is obviously printing the cpuset cgroup name.
  
  It doesn't look like it has anything at all to do with btrfs or why they 
  would care about this failure.
 
 Yeah, this is from a patch in the cgroup/review-post-kernfs-conversion
 branch which updates cgroup to use pr_cont_kernfs_name().  I forgot
 that cgrp->kn is NULL for the dummy_root's top cgroup, and thus it ends
 up calling the kernfs functions with a NULL kn, hence the oops.  I
 posted an updated patch and the git branch has been updated.
 
  http://lkml.kernel.org/g/20140208200640.gb10...@htj.dyndns.org
 
 So, nothing to do with btrfs, and it looks like somehow the test
 apparatus is mixing up branches?

Yes - I may do random merges and boot test the resulted kernels.

Thanks,
Fengguang


[PATCH v2] Btrfs: avoid warning bomb of btrfs_invalidate_inodes

2014-02-10 Thread Liu Bo
After a transaction is aborted, we need to clean up inode resources by
calling btrfs_invalidate_inodes(). btrfs_invalidate_inodes() has
historically expected the root's refs to be zero and has a WARN_ON() for
that; however, this is not always true while cleaning up an aborted
transaction, so detect transaction abortion and don't warn in that case.

Signed-off-by: Liu Bo <bo.li@oracle.com>
---
v2: Following Josef's advice, i.e. don't warn at all when the transaction is aborted.

 fs/btrfs/inode.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1af34d0..e876c1e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4920,7 +4920,8 @@ void btrfs_invalidate_inodes(struct btrfs_root *root)
 	struct inode *inode;
 	u64 objectid = 0;
 
-	WARN_ON(btrfs_root_refs(&root->root_item) != 0);
+	if (!test_bit(BTRFS_FS_STATE_ERROR, &root->fs_info->fs_state))
+		WARN_ON(btrfs_root_refs(&root->root_item) != 0);
 
 	spin_lock(&root->inode_lock);
 again:
-- 
1.8.1.4



Re: Issue with btrfs balance

2014-02-10 Thread Imran Geriskovan
I've experienced the following with balance:

Setup:
- Kernel 3.12.9
- 11 DVD sized (4.3GB) loopback devices.
(9 Read-Only Seed devices + 2 Read/Write devices)
- 9 device seed created with -m single -d single and made
Read-only with btrfstune -S 1 ...
- 2 devices were added at different dates. NO balance performed until now.
- NOW add 1 more device to the array and perform a balance.

Result:
Balance ran for a while and then exited displaying "Process killed".
Any attempt to unmount the array failed, preventing any
shutdown, hence I had no option other than a hard reset.

After reboot, issuing the balance command gave the message
"Balance in progress".

I cancelled the balance and tried to remove the last device
which ended with a kernel crash. So I dumped 2 + 1 normal devices.

The former 9-device seed was OK and still mountable.

Regards,
Imran


[PATCH] xfstests: btrfs/004: fix to make test really work

2014-02-10 Thread Wang Shilong
From: Wang Shilong <wangsl.f...@cn.fujitsu.com>

So I was wondering why test 004 could pass with my previous wrong
kernel patch when it definitely should not have.

After some debugging, I found that the perl script here is wrong: we did
not filter out anything, so this unit test did not actually work. It
turns out we could never fail this test.

Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>
---
 tests/btrfs/004 | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)
 mode change 100755 = 100644 tests/btrfs/004

diff --git a/tests/btrfs/004 b/tests/btrfs/004
old mode 100755
new mode 100644
index 14da9f1..17a6e34
--- a/tests/btrfs/004
+++ b/tests/btrfs/004
@@ -57,10 +57,9 @@ _require_command /usr/sbin/filefrag
 
 rm -f $seqres.full
 
-FILEFRAG_FILTER='if (/, blocksize (\d+)/) {$blocksize = $1; next} ($ext, '\
-'$logical, $physical, $expected, $length, $flags) = (/^\s*(\d+)\s+(\d+)'\
-'\s+(\d+)\s+(?:(\d+)\s+)?(\d+)\s+(.*)/) or next; $flags =~ '\
-'/(?:^|,)inline(?:,|$)/ and next; print $physical * $blocksize, "#", '\
+FILEFRAG_FILTER='if (/blocks of (\d+) bytes/) {$blocksize = $1; next} ($ext, '\
+'$logical, $physical, $length) = (/^\s*(\d+):\s+(\d+)..\s+\d+:'\
+'\s+(\d+)..\s+\d+:\s+(\d+):/) or next; print $physical * $blocksize, "#", '\
 '$length * $blocksize, "#", $logical * $blocksize, " "'
 
 # this makes filefrag output script readable by using a perl helper.
-- 
1.8.4



[PATCH] Btrfs-progs: receive: don't output normal message into stderr

2014-02-10 Thread Wang Shilong
From: Wang Shilong <wangsl.f...@cn.fujitsu.com>

Don't output normal messages to stderr; this makes filtering the
output in xfstests easier.

Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>
---
 cmds-receive.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-receive.c b/cmds-receive.c
index cce37a7..2d55c53 100644
--- a/cmds-receive.c
+++ b/cmds-receive.c
@@ -213,7 +213,7 @@ static int process_snapshot(const char *path, const u8 *uuid, u64 ctransid,
 	free(r->full_subvol_path);
 	r->full_subvol_path = path_cat3(r->root_path, r->dest_dir_path, path);
 
-	fprintf(stderr, "At snapshot %s\n", path);
+	fprintf(stdout, "At snapshot %s\n", path);
 
 	memcpy(r->cur_subvol->received_uuid, uuid, BTRFS_UUID_SIZE);
 	r->cur_subvol->stransid = ctransid;
-- 
1.8.4



Re: btrfs send runs out of memory and file handles

2014-02-10 Thread Hugo Mills
On Mon, Feb 10, 2014 at 01:28:38PM +, Frank Kingswood wrote:
 Hi,
 
 I'm attempting to back up a btrfs subvolume with
 
   $ btrfs send /path/to/subvol | nc
 
 and the receiving end does
 
   $ nc -l | btrfs receive /path/to/volume
 
 This subvolume holds ~250 GB of data, about half full, and uses RAID1.
 
 Doing so runs out of file descriptors on the sending machine (having
 over 100k files open) and eventually runs out of memory and gets
 killed by the OOM killer.

   This sounds like a known bug, and I think it was fixed in 3.13.
What kernel version are you using?

   Hugo.

 What are the memory requirements of btrfs send?
 This is the initial send so the entire volume must be transferred.
 Are later send operations equally memory-intensive?
 
 Is it possible to do an incremental send, or send partial snapshots
 and combine them later on?

   

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
 --- Everything simple is false. Everything which is --- 
  complex is unusable.   




Re: btrfs send runs out of memory and file handles

2014-02-10 Thread Frank Kingswood

On 10/02/14 13:47, Hugo Mills wrote:

On Mon, Feb 10, 2014 at 01:28:38PM +, Frank Kingswood wrote:


I'm attempting to back up a btrfs subvolume

 [...]

Doing so runs out of file descriptors on the sending machine (having
over 100k files open) and eventually runs out of memory and gets
killed by the OOM killer.


This sounds like a known bug, and I think it was fixed in 3.13.
What kernel version are you using?


This is on 3.12.5.
I can build and install 3.13.2 and test it again.

Frank




scrub crashed?

2014-02-10 Thread Johan Kröckel
root@fortknox:~# uname -a
Linux fortknox 3.12-0.bpo.1-amd64 #1 SMP Debian 3.12.9-1~bpo70+1
(2014-02-07) x86_64 GNU/Linux
root@fortknox:~# btrfs version
Btrfs v3.12
root@fortknox:~# btrfs scrub status -d /bunker
scrub status for 11312131-3372-4637-b526-35a4ef0c31eb
scrub device /dev/mapper/bunkerA (id 1) status
scrub started at Sun Feb  9 19:12:22 2014, running for 11317 seconds
total bytes scrubbed: 325.32GiB with 0 errors
scrub device /dev/dm-1 (id 2) status
scrub started at Sun Feb  9 19:12:22 2014, running for 11317 seconds
total bytes scrubbed: 321.76GiB with 0 errors
root@fortknox:~# btrfs scrub cancel /bunker
ERROR: scrub cancel failed on /bunker: not running
root@fortknox:~# btrfs scrub start /bunker
ERROR: scrub is already running.
To cancel use 'btrfs scrub cancel /bunker'.
To see the status use 'btrfs scrub status [-d] /bunker'.
root@fortknox:~# ps -A|grep btrfs
 3704 ?00:00:00 btrfs-genwork-1
 3705 ?00:00:00 btrfs-submit-1
 3706 ?00:00:00 btrfs-delalloc-
 3707 ?00:00:00 btrfs-fixup-1
 3708 ?00:00:02 btrfs-endio-1
 3709 ?00:00:00 btrfs-endio-met
 3710 ?00:00:00 btrfs-rmw-1
 3711 ?00:00:00 btrfs-endio-rai
 3712 ?00:00:00 btrfs-endio-met
 3714 ?00:00:05 btrfs-freespace
 3715 ?00:00:00 btrfs-delayed-m
 3716 ?00:00:00 btrfs-cache-1
 3717 ?00:00:00 btrfs-readahead
 3718 ?00:00:00 btrfs-flush_del
 3719 ?00:00:00 btrfs-qgroup-re
 3720 ?00:00:00 btrfs-cleaner
 3721 ?00:02:00 btrfs-transacti
 8380 ?00:00:00 btrfs-endio-wri
 8836 ?00:00:00 btrfs-worker-4
 8936 ?00:00:00 btrfs-worker-3
 8961 ?00:00:00 btrfs-worker-3
 9138 ?00:00:00 btrfs-worker-4

What can/should I do now?


Re: lost with degraded RAID1

2014-02-10 Thread Johan Kröckel
Thanks, that explains something. There was indeed a BIOS problem: the
drive that vanished had suddenly been disabled in the BIOS, and was only
usable again after reactivating it there. So it does appear to have been
a BIOS problem.

2014-02-09 Duncan <1i5t5.dun...@cox.net>:
 Johan Kröckel posted on Sat, 08 Feb 2014 12:09:46 +0100 as excerpted:

 Ok, I did nuke it now and created the fs again using 3.12 kernel. So far
 so good. Runs fine.
 Finally, I know its kind of offtopic, but can some help me interpreting
 this (I think this is the error in the smart-log which started the whole
 mess)?

 Error 1 occurred at disk power-on lifetime: 2576 hours (107 days + 8
 hours)
   When the command that caused the error occurred, the device was
 active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   04 71 00 ff ff ff 0f
  Device Fault; Error: ABRT at LBA = 0x0fffffff = 268435455

 I'm no SMART expert, but that LBA number is incredibly suspicious.  With
 standard 512-byte sectors that's the 128 GiB boundary, the old 28-bit LBA
 limit (LBA28, introduced with ATA-1 in 1994, modern drives are LBA48,
 introduced in 2003 with ATA-6 and offering an addressing capacity of 128
 PiB, according to wikipedia's article on LBA).

 It looks like something flipped back to LBA28, and when a continuing
 operation happened to write past that value... it triggered the abort you
 see in the SMART log.

 Double-check your BIOS to be sure it didn't somehow revert to the old
 LBA28 compatibility mode or some such, and the drives, to make sure they
 aren't clipped to LBA28 compatibility mode as well.

 --
 Duncan - List replies preferred.   No HTML msgs.
 Every nonfree program has a lord, a master --
 and if you use the program, he is your master.  Richard Stallman



Re: Issue with btrfs balance

2014-02-10 Thread Austin S Hemmelgarn
On 2014-02-10 08:41, Brendan Hide wrote:
 On 2014/02/10 04:33 AM, Austin S Hemmelgarn wrote:
 <snip>
 Apparently, trying to use -mconvert=dup or -sconvert=dup on a
 multi-device filesystem using one of the RAID profiles for metadata
 fails with a statement to look at the kernel log, which doesn't show
 anything at all about the failure.
 ^ If this is the case then it is definitely a bug. Can you provide some
 version info? Specifically kernel, btrfs-tools, and Distro.
In this case, btrfs-progs 3.12, kernel 3.13.2, and Gentoo.
 <snip> it appears
 that the kernel stops you from converting to a dup profile for metadata
 in this case because it thinks that such a profile doesn't work on
 multiple devices, despite the fact that you can take a single-device
 filesystem, add a device, and it will still work fine even without
 converting the metadata/system profiles.
 I believe dup used to work on multiple devices but the facility was
 removed. In the standard case it doesn't make sense to use dup with
 multiple devices: It uses the same amount of diskspace but is more
 vulnerable than the RAID1 alternative.
 <snip> Ideally, this
 should be changed to allow converting to dup so that when converting a
 multi-device filesystem to single-device, you never have to have
 metadata or system chunks use a single profile.
 This is a good use-case for having the facility. I'm thinking that, if
 it is brought back in, the only caveat is that appropriate warnings
 should be put in place to indicate when using it is a bad idea.
 
 My guess on how you'd like to migrate from raid1/raid1 to single/dup,
 assuming sda and sdb:
 btrfs balance start -dconvert=single -mconvert=dup /
 btrfs device delete /dev/sdb /
 
Ideally, yes.  The exact command I tried to use was:
btrfs balance start -dconvert=single -mconvert=dup -sconvert=dup -f -v /
Trying again without the system chunk conversion also failed.


Re: scrub crashed?

2014-02-10 Thread Shilong Wang
Hello Johan,

This should be a known problem.

The problem seems to be that the scrub log file is corrupt, so I added
an -f option, something like:

btrfs scrub start -f ..

You can update to the latest btrfs-progs from David's latest integration
branch and try it. If you don't want to do that, just removing
/var/lib/btrfs/scrub* should fix your problem.

Hopefully that helps. ^_^

Thanks,
Wang


2014-02-10 22:02 GMT+08:00 Johan Kröckel <johan.kroec...@gmail.com>:
 root@fortknox:~# uname -a
 Linux fortknox 3.12-0.bpo.1-amd64 #1 SMP Debian 3.12.9-1~bpo70+1
 (2014-02-07) x86_64 GNU/Linux
 root@fortknox:~# btrfs version
 Btrfs v3.12
 root@fortknox:~# btrfs scrub status -d /bunker
 scrub status for 11312131-3372-4637-b526-35a4ef0c31eb
 scrub device /dev/mapper/bunkerA (id 1) status
 scrub started at Sun Feb  9 19:12:22 2014, running for 11317 seconds
 total bytes scrubbed: 325.32GiB with 0 errors
 scrub device /dev/dm-1 (id 2) status
 scrub started at Sun Feb  9 19:12:22 2014, running for 11317 seconds
 total bytes scrubbed: 321.76GiB with 0 errors
 root@fortknox:~# btrfs scrub cancel /bunker
 ERROR: scrub cancel failed on /bunker: not running
 root@fortknox:~# btrfs scrub start /bunker
 ERROR: scrub is already running.
 To cancel use 'btrfs scrub cancel /bunker'.
 To see the status use 'btrfs scrub status [-d] /bunker'.
 root@fortknox:~# ps -A|grep btrfs
  3704 ?00:00:00 btrfs-genwork-1
  3705 ?00:00:00 btrfs-submit-1
  3706 ?00:00:00 btrfs-delalloc-
  3707 ?00:00:00 btrfs-fixup-1
  3708 ?00:00:02 btrfs-endio-1
  3709 ?00:00:00 btrfs-endio-met
  3710 ?00:00:00 btrfs-rmw-1
  3711 ?00:00:00 btrfs-endio-rai
  3712 ?00:00:00 btrfs-endio-met
  3714 ?00:00:05 btrfs-freespace
  3715 ?00:00:00 btrfs-delayed-m
  3716 ?00:00:00 btrfs-cache-1
  3717 ?00:00:00 btrfs-readahead
  3718 ?00:00:00 btrfs-flush_del
  3719 ?00:00:00 btrfs-qgroup-re
  3720 ?00:00:00 btrfs-cleaner
  3721 ?00:02:00 btrfs-transacti
  8380 ?00:00:00 btrfs-endio-wri
  8836 ?00:00:00 btrfs-worker-4
  8936 ?00:00:00 btrfs-worker-3
  8961 ?00:00:00 btrfs-worker-3
  9138 ?00:00:00 btrfs-worker-4

 What can/should I do now?


Re: scrub crashed?

2014-02-10 Thread Johan Kröckel
Thank you Shilong, that was the problem.

2014-02-10 Shilong Wang <wangshilong1...@gmail.com>:
 Hello Johan,

 This should be a known problem.

 The problem seemed that scrub log file is corrupt, so i added an option
 -f something like:

 btrfs scrub start -f ..

 You can update latest btrfs-progs from david's latest integration
 branch and try it. if you don't
 want to do that, just rm /var/lib/btrfs/scrub* should fix your problem.

 Hopely it can help you.^_^

 Thanks,
 Wang


 2014-02-10 22:02 GMT+08:00 Johan Kröckel johan.kroec...@gmail.com:
 root@fortknox:~# uname -a
 Linux fortknox 3.12-0.bpo.1-amd64 #1 SMP Debian 3.12.9-1~bpo70+1
 (2014-02-07) x86_64 GNU/Linux
 root@fortknox:~# btrfs version
 Btrfs v3.12
 root@fortknox:~# btrfs scrub status -d /bunker
 scrub status for 11312131-3372-4637-b526-35a4ef0c31eb
 scrub device /dev/mapper/bunkerA (id 1) status
 scrub started at Sun Feb  9 19:12:22 2014, running for 11317 seconds
 total bytes scrubbed: 325.32GiB with 0 errors
 scrub device /dev/dm-1 (id 2) status
 scrub started at Sun Feb  9 19:12:22 2014, running for 11317 seconds
 total bytes scrubbed: 321.76GiB with 0 errors
 root@fortknox:~# btrfs scrub cancel /bunker
 ERROR: scrub cancel failed on /bunker: not running
 root@fortknox:~# btrfs scrub start /bunker
 ERROR: scrub is already running.
 To cancel use 'btrfs scrub cancel /bunker'.
 To see the status use 'btrfs scrub status [-d] /bunker'.
 root@fortknox:~# ps -A|grep btrfs
  3704 ?00:00:00 btrfs-genwork-1
  3705 ?00:00:00 btrfs-submit-1
  3706 ?00:00:00 btrfs-delalloc-
  3707 ?00:00:00 btrfs-fixup-1
  3708 ?00:00:02 btrfs-endio-1
  3709 ?00:00:00 btrfs-endio-met
  3710 ?00:00:00 btrfs-rmw-1
  3711 ?00:00:00 btrfs-endio-rai
  3712 ?00:00:00 btrfs-endio-met
  3714 ?00:00:05 btrfs-freespace
  3715 ?00:00:00 btrfs-delayed-m
  3716 ?00:00:00 btrfs-cache-1
  3717 ?00:00:00 btrfs-readahead
  3718 ?00:00:00 btrfs-flush_del
  3719 ?00:00:00 btrfs-qgroup-re
  3720 ?00:00:00 btrfs-cleaner
  3721 ?00:02:00 btrfs-transacti
  8380 ?00:00:00 btrfs-endio-wri
  8836 ?00:00:00 btrfs-worker-4
  8936 ?00:00:00 btrfs-worker-3
  8961 ?00:00:00 btrfs-worker-3
  9138 ?00:00:00 btrfs-worker-4

 What can/should I do now?


Re: system stuck with flush-btrfs-4 at 100% after filesystem resize

2014-02-10 Thread John Navitsky
As a follow-up, at some point over the weekend things did finish on 
their own:


romulus:/vms/johnn-sles11sp3 # df -h /vms
Filesystem  Size  Used Avail Use% Mounted on
/dev/dm-4   2.6T  1.6T  1.1T  60% /vms
romulus:/vms/johnn-sles11sp3 #

I'd still be interested in any comments about what was going on or 
suggestions.


Thanks,

-john

On 2/8/2014 10:36 AM, John Navitsky wrote:

Hello,

I have a large file system that has been growing.  We've resized it a
couple of times with the following approach:

   lvextend -L +800G /dev/raid/virtual_machines
   btrfs filesystem resize +800G /vms

I think the FS started out at 200G, we increased it by 200GB a time or
two, then by 800GB and everything worked fine.

The filesystem hosts a number of virtual machines so the file system is
in use, although the VMs individually tend not to be overly active.

VMs tend to be in subvolumes, and some of those subvolumes have snapshots.

This time, I increased it by another 800GB, and it has hung for many
hours (overnight) with flush-btrfs-4 near 100% CPU all that time.
I'm not clear at this point that it will finish or where to go from here.

Any pointers would be much appreciated.

Thanks,

-john (newbie to BTRFS)




Re: [RFC PATCH 2/2] Revert Btrfs: remove transaction from btrfs send

2014-02-10 Thread Josef Bacik



On 02/08/2014 10:46 AM, Wang Shilong wrote:

From: Wang Shilong <wangsl.f...@cn.fujitsu.com>

This reverts commit 41ce9970a8a6a362ae8df145f7a03d789e9ef9d2.
Previously I thought we could use a read-only root's commit root
safely, but that is not true: a read-only root may still be COWed
in the following cases.

1. Sending a snapshot will COW the source root.
2. Balance and device operations will also COW a read-only send root
in order to relocate it.

So I have two ideas for making it safe to use the commit root.

--approach 1:
protect it with a transaction, end the transaction properly, and
re-search for the next item from the root node (see
btrfs_search_slot_for_read()).

--approach 2:
add another counter to the local root structure to synchronize snapshots
with send, and add a global counter to synchronize send with exclusive
device operations.

With approach 2, send can use the commit root safely, because we make
sure the send root cannot be COWed during send. Unfortunately, it makes
the code *ugly* and more complex to maintain.

Also, making snapshots and send mutually exclusive, and device
operations and send mutually exclusive, is a little confusing for
ordinary users.

So why not go back to the previous way?

Cc: Josef Bacik <jba...@fb.com>
Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>
---
Josef, if we reach agreement to adopt this approach, please revert
Filipe's patch ("Btrfs: make some tree searches in send.c more efficient")
from btrfs-next.


I agree, I'll leave Filipe's patch alone but I'll drop my search commit
root patch since we don't need it any more.  Do you want me to take this,
or do you want to resubmit without the RFC?  Thanks,


Josef


What to do about df and btrfs fi df

2014-02-10 Thread Josef Bacik

Hello,

So first of all, this is going to get a lot of responses, so straight
away: I'm only going to consider your opinion if I recognize your name
and think you are a sane person.  This basically means any big
contributors, and we'll make sanity exceptions for cwillu.


These are just broad strokes, let us not get bogged down in the details, 
I just want to come to a consensus on how things _generally_ should be 
portrayed to the user.  We can worry about implementation details once 
we agree on the direction we want to go.


We all know space is a loaded question with btrfs, so I'm just going to 
explain the reasoning of why we chose what we chose originally and then 
offer the direction we should go in.  If you agree say yay, if not 
please provide a very concise alternative suggestion with a very short 
explanation of why it is better than what I'm suggesting.  I'm not looking
to spend a whole lot of time on this problem.


Also this isn't going to address b_avail, cause frankly that is some 
fucking voodoo right there, suffice it to say we will just adjust 
b_avail based on how we should represent total and used.


= ye olde df =

I don't remember what we did originally, but IIRC we would only show 
used space from the block groups and would show the entire size of the 
fs.  So for example with two 1 tb drives in RAID1 you'd see ENOSPC and 
look at df and it would show total of 2TB and used at 1TB.  Obviously 
not good, so we switched to the mechanism we have today, which is you 
see 2TB for total, you see 2TB for used and you see 0 for available.  We 
just scaled up the used and available based on your raid multiplier.
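To make the current behaviour concrete, here is a sketch of the arithmetic (invented numbers; an illustration of the description above, not the actual statfs code):

```shell
# Today's df for two 1TB devices in RAID1, with 1TB of file data written.
raid_mult=2
disk_total_gb=2048        # raw capacity across both devices
data_used_gb=1024         # logical data the user wrote

total=$disk_total_gb                     # full raw size: 2TB
used=$(( data_used_gb * raid_mult ))     # scaled up by the raid multiplier
avail=$(( total - used ))
echo "total=${total}G used=${used}G avail=${avail}G"
# total=2048G used=2048G avail=0G
```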


= btrfs fi df =

I made this for me because of ENOSPC issues but of course it's also 
really useful for users.  It is just a dump of the block group 
information and their flags, so really just spits out bytes_used and 
total_bytes and flags.  Because at the block_group/space_info level in 
btrfs we don't care about how much actual space is taken up this number 
is not adjusted for RAID values, and these numbers are reflected in the 
tools output.  So if you have RAID1 you need to mentally multiply the 
Total and Used values by 2 because that is how much actual space is 
being used.
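The mental multiplication described above, sketched out (invented numbers; the real tool's output format differs slightly):

```shell
# "btrfs fi df" today prints the raw block-group numbers for RAID1...
raid_mult=2
bg_total_gb=500        # total_bytes from the block groups (raw ioctl value)
bg_used_gb=300         # bytes_used from the block groups

echo "Data, RAID1: total=${bg_total_gb}G, used=${bg_used_gb}G"
# ...so the actual on-disk footprint is twice that:
echo "on-disk: total=$(( bg_total_gb * raid_mult ))G, used=$(( bg_used_gb * raid_mult ))G"
```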


=  What to do moving forward =

Flip what both of these do.  Do not multiply for normal df, and multiply 
for btrfs fi df.


= New and improved df =

Since this is the lowest common denominator we should just spit out how 
much space is used based on the block groups and then divide the 
remaining space that hasn't been allocated yet by the raid multiplier.


This is going to be kind of tricky once we do per-subvolume RAID levels, 
but this falls under the b_avail voodoo which is just a guess anyway, so 
for this we will probably take the biggest multiplier and use that to 
show how much available space you have.


This way with RAID1 it shows you have 1tb of total space and you've used 
1tb of space.
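As a worked example of the proposal (invented numbers, and hand-waving the free space still sitting inside already-allocated block groups):

```shell
# Proposed df: two 1TB devices in RAID1, 1TB of data written,
# all raw space already allocated to RAID1 block groups.
raid_mult=2
disk_total_gb=2048         # raw capacity
bg_used_gb=1024            # used from the block groups, NOT multiplied
unalloc_gb=0               # raw bytes not yet allocated to any block group

total=$(( disk_total_gb / raid_mult ))
used=$bg_used_gb
avail=$(( unalloc_gb / raid_mult ))
echo "total=${total}G used=${used}G avail=${avail}G"
# total=1024G used=1024G avail=0G
```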


= New and improved btrfs fi df =

Since people using this tool are already going to be better informed and 
since we are already given the block group flags we can go ahead and do 
the raid multiplier in btrfs-progs and spit out the adjusted numbers 
rather than the raw numbers we get from the ioctl.  This will just be a 
progs thing and that way we can possibly add an option to not apply the 
multipliers and just get the raw output.


= Conclusion =

Let me know if this is acceptable to everybody.  Remember this is just 
broad strokes, keep your responses short and simple or I simply won't 
read them.  Thanks,


Josef


Re: system stuck with flush-btrfs-4 at 100% after filesystem resize

2014-02-10 Thread Josef Bacik



On 02/08/2014 01:36 PM, John Navitsky wrote:

Hello,

I have a large file system that has been growing.  We've resized it a
couple of times with the following approach:

   lvextend -L +800G /dev/raid/virtual_machines
   btrfs filesystem resize +800G /vms

I think the FS started out at 200G, we increased it by 200GB a time or
two, then by 800GB and everything worked fine.

The filesystem hosts a number of virtual machines so the file system is
in use, although the VMs individually tend not to be overly active.

VMs tend to be in subvolumes, and some of those subvolumes have snapshots.

This time, I increased it by another 800GB, and it has hung for many
hours (over night) with flush-btrfs-4 near 100% cpu all that time.

I'm not clear at this point that it will finish or where to go from here.

Any pointers would be much appreciated.

Thanks,

-john (newbie to BTRFS)


 procedure log --

romulus:/home/users/johnn # lvextend -L +800G /dev/raid/virtual_machines
romulus:/home/users/johnn #  btrfs filesystem resize +800G /vms
Resize '/vms' of '+800G'
[hangs]


top - 12:21:53 up 136 days,  2:45, 13 users,  load average: 30.39,
30.37, 30.37
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.4 us,  2.3 sy,  0.0 ni, 95.1 id,  0.1 wa,  0.0 hi,  0.0 si,
  0.0 st
MiB Mem:129147 total,   127427 used, 1720 free,  264 buffers
MiB Swap:   262143 total,  661 used,   261482 free,93666 cached

PID USER  PR  NI  VIRT  RES  SHR S  %CPU %MEM TIME+ COMMAND
  48809 root  20   0 000 R  99.3  0.0   1449:14
flush-btrfs-4

--- misc info ---

romulus:/home/users/johnn # cat /etc/SuSE-release
openSUSE 12.3 (x86_64)
VERSION = 12.3
CODENAME = Dartmouth
romulus:/home/users/johnn # uname -a
Linux romulus.us.redacted.com 3.7.10-1.16-desktop #1 SMP PREEMPT Fri May
31 20:21:23 UTC 2013 (97c14ba) x86_64 x86_64 x86_64 GNU/Linux
romulus:/home/users/johnn #


Found your problem!  Basically if you are going to run btrfs you should 
at the very least keep up with the stable kernels.  3.11.whatever is 
fine, 3.12.whatever is better.  Thanks,


Josef


Re: system stuck with flush-btrfs-4 at 100% after filesystem resize

2014-02-10 Thread John Navitsky

On 2/10/2014 8:43 AM, Josef Bacik wrote:

On 02/08/2014 01:36 PM, John Navitsky wrote:



romulus:/home/users/johnn # cat /etc/SuSE-release
openSUSE 12.3 (x86_64)
VERSION = 12.3
CODENAME = Dartmouth
romulus:/home/users/johnn # uname -a
Linux romulus.us.redacted.com 3.7.10-1.16-desktop #1 SMP PREEMPT Fri May
31 20:21:23 UTC 2013 (97c14ba) x86_64 x86_64 x86_64 GNU/Linux
romulus:/home/users/johnn #


Found your problem!  Basically if you are going to run btrfs you should
at the very least keep up with the stable kernels.  3.11.whatever is
fine, 3.12.whatever is better.  Thanks,

Josef


Thanks for the feedback.

-john



Re: [PATCH 3/4][RFC] btrfs: export global block reserve size as space_info

2014-02-10 Thread Chris Mason

On 02/07/2014 08:34 AM, David Sterba wrote:

Introduce a block group type bit for a global reserve and fill the space
info for SPACE_INFO ioctl. This should replace the newly added ioctl
(01e219e8069516cdb98594d417b8bb8d906ed30d) to get just the 'size' part
of the global reserve, while the actual usage can be now visible in the
'btrfs fi df' output during ENOSPC stress.

The unpatched userspace tools will show the blockgroup as 'unknown'.



This wasn't in my rc2 pull because I wanted to sync up with Jeff on it. 
 I like the idea of combining this into SPACE_INFO, any objections?


-chris


Re: What to do about df and btrfs fi df

2014-02-10 Thread Hugo Mills
tl;dr: Yes to proposed df changes. Keep btrfs fi df as-is.

On Mon, Feb 10, 2014 at 11:41:51AM -0500, Josef Bacik wrote:
[snip]
 =  What to do moving forward =
 
 Flip what both of these do.  Do not multiply for normal df, and
 multiply for btrfs fi df.
 
 = New and improved df =
 
 Since this is the lowest common denominator we should just spit out
 how much space is used based on the block groups and then divide the
 remaining space that hasn't been allocated yet by the raid
 multiplier.
 
 This is going to be kind of tricky once we do per-subvolume RAID
 levels, but this falls under the b_avail voodoo which is just a
 guess anyway, so for this we will probably take the biggest
 multiplier and use that to show how much available space you have.

   Biggest multiplier leads to the pessimistic estimate, which is what
I'd prefer to see here, so that's good. Agree with this.

 This way with RAID1 it shows you have 1tb of total space and you've
 used 1tb of space.
 
 = New and improved btrfs fi df =
 
 Since people using this tool are already going to be better informed
 and since we are already given the block group flags we can go ahead
 and do the raid multiplier in btrfs-progs and spit out the adjusted
 numbers rather than the raw numbers we get from the ioctl.  This
 will just be a progs thing and that way we can possibly add an
 option to not apply the multipliers and just get the raw output.

   Keep this unchanged, IMO.

(a) I quite like the non-multiplied version as it is, as it gives you
the quantities of real, actual data stored -- the value you
generally care about anyway (how much stuff do I have on here?).

(b) Using the non-multiplied version here as well as above would then
give *gasp* comparable values for btrfs fi df and Plain Old df.
Less confusion all round, I think.

(c) The difficulty with using multiplied values is the behaviour of
parity RAID on filesystems with different sized devices: there
isn't a single multiplier that will give an accurate answer at
all. (Detailed arguments available on application ;) )

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Can I offer you anything? Tea? Seedcake? ---
 Glass of Amontillado?  




Re: What to do about df and btrfs fi df

2014-02-10 Thread cwillu
I concur.

The regular df data used number should be the amount of space required
to hold a backup of that content (assuming that the backup maintains
reflinks and compression and so forth).

There's no good answer for available space; the statfs syscall isn't
rich enough to cover all the bases even in the face of dup metadata
and single data (i.e., the common case), and a truly conservative
estimate (report based on the highest-usage raid level in use) would
report space/2 on that same common case.  Highest-usage data raid
level in use is probably the best compromise, with a big warning that
large numbers of small files will not actually fit, posted in some
mythical place that users look.

I would like to see the information from btrfs fi df and btrfs fi show
summarized somewhere (ideally as a new btrfs fi df output), as both
sets of numbers are really necessary, or at least have btrfs fi df
include the amount of space not allocated to a block group.

Re regular df: are we adding space allocated to a block group (raid1,
say) but not in actual use in a file as the N/2 space available in the
block group, or the N space it takes up on disk?  This probably
matters a bit less than it used to, but if it's N/2, that leaves us
open to "empty filesystem, 100GB free, write an 80GB file and then
delete it, wtf, only 60GB free now?" reporting issues.


Re: What to do about df and btrfs fi df

2014-02-10 Thread Josef Bacik



On 02/10/2014 01:24 PM, cwillu wrote:

I concur.

The regular df data used number should be the amount of space required
to hold a backup of that content (assuming that the backup maintains
reflinks and compression and so forth).

There's no good answer for available space; the statfs syscall isn't
rich enough to cover all the bases even in the face of dup metadata
and single data (i.e., the common case), and a truly conservative
estimate (report based on the highest-usage raid level in use) would
report space/2 on that same common case.  Highest-usage data raid
level in use is probably the best compromise, with a big warning that
that many large numbers of small files will not actually fit, posted
in some mythical place that users look.

I would like to see the information from btrfs fi df and btrfs fi show
summarized somewhere (ideally as a new btrfs fi df output), as both
sets of numbers are really necessary, or at least have btrfs fi df
include the amount of space not allocated to a block group.

Re regular df: are we adding space allocated to a block group (raid1,
say) but not in actual use in a file as the N/2 space available in the
block group, or the N space it takes up on disk?  This probably
matters a bit less than it used to, but if it's N/2, that leaves us
open to empty filesystem, 100GB free, write a 80GB file and then
delete it, wtf, only 60GB free now? reporting issues.



The only case we add the actual allocated chunk space is for metadata, 
for data we only add the actual used number.  So say say you write 80gb 
file and then delete it but during the writing we allocated a 1 gig 
chunk for metadata you'll see only 99gb free, make sense?  We could 
(should?) roll this into the b_avail magic and make used really only 
reflect data usage, opinions on this?  Thanks,
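The accounting described above, as a worked example using the numbers from the paragraph:

```shell
# 100G fs, write an 80G file, delete it; a 1G metadata chunk got allocated.
fs_total_gb=100
data_used_gb=0          # the 80G file was deleted; data counts only actual use
meta_alloc_gb=1         # metadata counts the whole allocated chunk

free=$(( fs_total_gb - data_used_gb - meta_alloc_gb ))
echo "free=${free}G"
# free=99G
```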


Josef


Re: What to do about df and btrfs fi df

2014-02-10 Thread cwillu
IMO, used should definitely include metadata, especially given that we
inline small files.

I can convince myself both that this implies that we should roll it
into b_avail, and that we should go the other way and only report the
actual used number for metadata as well, so I might just plead
insanity here.

On Mon, Feb 10, 2014 at 12:28 PM, Josef Bacik jba...@fb.com wrote:


 On 02/10/2014 01:24 PM, cwillu wrote:

 I concur.

 The regular df data used number should be the amount of space required
 to hold a backup of that content (assuming that the backup maintains
 reflinks and compression and so forth).

 There's no good answer for available space; the statfs syscall isn't
 rich enough to cover all the bases even in the face of dup metadata
 and single data (i.e., the common case), and a truly conservative
 estimate (report based on the highest-usage raid level in use) would
 report space/2 on that same common case.  Highest-usage data raid
 level in use is probably the best compromise, with a big warning that
 that many large numbers of small files will not actually fit, posted
 in some mythical place that users look.

 I would like to see the information from btrfs fi df and btrfs fi show
 summarized somewhere (ideally as a new btrfs fi df output), as both
 sets of numbers are really necessary, or at least have btrfs fi df
 include the amount of space not allocated to a block group.

 Re regular df: are we adding space allocated to a block group (raid1,
 say) but not in actual use in a file as the N/2 space available in the
 block group, or the N space it takes up on disk?  This probably
 matters a bit less than it used to, but if it's N/2, that leaves us
 open to empty filesystem, 100GB free, write a 80GB file and then
 delete it, wtf, only 60GB free now? reporting issues.


 The only case we add the actual allocated chunk space is for metadata, for
 data we only add the actual used number.  So say say you write 80gb file and
 then delete it but during the writing we allocated a 1 gig chunk for
 metadata you'll see only 99gb free, make sense?  We could (should?) roll
 this into the b_avail magic and make used really only reflect data usage,
 opinions on this?  Thanks,

 Josef


Re: What to do about df and btrfs fi df

2014-02-10 Thread Josef Bacik



On 02/10/2014 01:36 PM, cwillu wrote:

IMO, used should definitely include metadata, especially given that we
inline small files.

I can convince myself both that this implies that we should roll it
into b_avail, and that we should go the other way and only report the
actual used number for metadata as well, so I might just plead
insanity here.



I could be convinced to do this.  So we have

total: (total disk bytes) / (raid multiplier)
used: (total used in data block groups) +
(total used in metadata block groups)
avail: total - (total used in data block groups +
total metadata block groups)

That seems like the simplest to code up.  Then we can argue about 
whether to use the total metadata size or just the used metadata size 
for b_avail.  Seem reasonable?
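Those formulas, coded up directly with invented sample numbers (two 1TB devices in RAID1, 600G of data, an 8G metadata allocation of which 5G is used):

```shell
raid_mult=2
disk_total_gb=2048       # total disk bytes
data_used_gb=600         # total used in data block groups
meta_alloc_gb=8          # total metadata block groups (allocated)
meta_used_gb=5           # total used in metadata block groups

total=$(( disk_total_gb / raid_mult ))
used=$(( data_used_gb + meta_used_gb ))
avail=$(( total - (data_used_gb + meta_alloc_gb) ))
echo "total=${total}G used=${used}G avail=${avail}G"
# total=1024G used=605G avail=416G
```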


Josef


Re: What to do about df and btrfs fi df

2014-02-10 Thread cwillu
 IMO, used should definitely include metadata, especially given that we
 inline small files.

 I can convince myself both that this implies that we should roll it
 into b_avail, and that we should go the other way and only report the
 actual used number for metadata as well, so I might just plead
 insanity here.

 I could be convinced to do this.  So we have

 total: (total disk bytes) / (raid multiplier)
 used: (total used in data block groups) +
 (total used in metadata block groups)
 avail: total - (total used in data block groups +
 total metadata block groups)

 That seems like the simplest to code up.  Then we can argue about whether to
 use the total metadata size or just the used metadata size for b_avail.
 Seem reasonable?

I can't think of any situations where this results in tears.


Re: What to do about df and btrfs fi df

2014-02-10 Thread Hugo Mills
On Mon, Feb 10, 2014 at 01:41:23PM -0500, Josef Bacik wrote:
 
 
 On 02/10/2014 01:36 PM, cwillu wrote:
 IMO, used should definitely include metadata, especially given that we
 inline small files.
 
 I can convince myself both that this implies that we should roll it
 into b_avail, and that we should go the other way and only report the
 actual used number for metadata as well, so I might just plead
 insanity here.
 
 
 I could be convinced to do this.  So we have
 
 total: (total disk bytes) / (raid multiplier)
 used: (total used in data block groups) +
   (total used in metadata block groups)
 avail: total - (total used in data block groups +
   total metadata block groups)
 
 That seems like the simplest to code up.  Then we can argue about
 whether to use the total metadata size or just the used metadata
 size for b_avail.  Seem reasonable?

   My vote on that bikeshed: total metadata size. But I'll accept any
other answer. :)

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Well, you don't get to be a kernel hacker simply by looking ---   
good in Speedos. -- Rusty Russell




Re: [PATCH] xfstests: btrfs/004: fix to make test really work

2014-02-10 Thread Josef Bacik



On 02/10/2014 07:10 AM, Wang Shilong wrote:

From: Wang Shilong wangsl.f...@cn.fujitsu.com

So I was wondering why test 004 could pass my previous wrong
kernel patch while it definitely should not.

By doing some debugging, I found the perl script here is wrong: we did
not filter out anything, so this unit test did not actually work and
we would never fail this test.



So now with this patch I'm failing it, is there some btrfs patch I need 
to make it not fail or is it still not supposed to fail normally and is 
this patch broken?  Thanks,


Josef


Re: Error: could not do orphan cleanup -22

2014-02-10 Thread Pavel Volkov
On Monday 10 February 2014 00:20:54 you wrote:
 There was a similar discussion about an error in January 2013 but it related
 to some kernel panic. I don't know if I encountered the same thing.
 
 These errors from system journal bother me:
 
  2月 09 22:18:53 melforce kernel: BTRFS error (device sdb3): Error removing
 orphan entry, stopping orphan cleanup
  2月 09 22:18:53 melforce kernel: BTRFS critical (device sdb3): could not
 do orphan cleanup -22
 
 I run kernel 3.12.10.

Some update.
I tested with kernel 3.13.2 and still have the problem.
Also, I don't have the errors in kernel log anymore but now I can't delete my 
snapshots!

melforce mnt # ls btr2
home  vap-snap1  vap-snap2  var

melforce mnt # btrfs sub list btr2
ID 257 gen 6294 top level 5 path home
ID 258 gen 6294 top level 5 path var
ID 939 gen 6153 top level 5 path vap-snap1
ID 940 gen 6154 top level 5 path vap-snap2

melforce mnt # btrfs sub delete btr2/var-snap2
ERROR: error accessing 'btr2/var-snap2'

My /mnt/btr2/var is also mounted on /var.

There's plenty of space left on device (only ~590 GB allocated on 2.6 TB 
volume):

# btrfs fi df btr2
Data, single: total=591.01GiB, used=590.64GiB
System, DUP: total=8.00MiB, used=96.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=1.50GiB, used=989.41MiB
Metadata, single: total=8.00MiB, used=0.00

# btrfs fi show 
Label: melforce_hdd  uuid: c3f3a649-d8c3-49e1-9962-9b3ca9f54f1d
Total devices 1 FS bytes used 591.61GiB
devid1 size 2.61TiB used 594.04GiB path /dev/sdb3


Re: [PATCH v2] xfstests: Btrfs: add test for large metadata blocks

2014-02-10 Thread Josef Bacik



On 02/08/2014 03:30 AM, Koen De Wit wrote:

Tests Btrfs filesystems with all possible metadata block sizes, by
setting large extended attributes on files.

Signed-off-by: Koen De Wit koen.de@oracle.com
---

v1-v2:
 - Fix indentation: 8 spaces instead of 4
 - Move _scratch_unmount to end of loop, add _check_scratch_fs
 - Sending failure messages of mkfs.btrfs to output instead of
   $seqres.full

diff --git a/tests/btrfs/036 b/tests/btrfs/036
new file mode 100644
index 000..b14697d
--- /dev/null
+++ b/tests/btrfs/036
@@ -0,0 +1,137 @@
+#! /bin/bash
+# FS QA Test No. 036
+#
+# Tests large metadata blocks in btrfs, which allows large extended
+# attributes.
+#
+#---
+# Copyright (c) 2014, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1   # failure is the default!
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_need_to_be_root
+
+rm -f $seqres.full
+
+pagesize=`$here/src/feature -s`
+pagesize_kb=`expr $pagesize / 1024`
+
+# Test all valid leafsizes
+for leafsize in `seq $pagesize_kb $pagesize_kb 64`; do
+_scratch_mkfs -l ${leafsize}K >/dev/null
+_scratch_mount
+# Calculate the size of the extended attribute value, leaving
+# 512 bytes for other metadata.
+xattr_size=`expr $leafsize \* 1024 - 512`
+
+touch $SCRATCH_MNT/emptyfile
+# smallfile will be inlined, bigfile not.
+$XFS_IO_PROG -f -c "pwrite 0 100" $SCRATCH_MNT/smallfile \
+>/dev/null
+$XFS_IO_PROG -f -c "pwrite 0 9000" $SCRATCH_MNT/bigfile \
+>/dev/null
+ln -s $SCRATCH_MNT/bigfile $SCRATCH_MNT/bigfile_softlink
+
+files=(emptyfile smallfile bigfile bigfile_softlink)
+chars=(a b c d)
+for i in `seq 0 1 3`; do
+char=${chars[$i]}
+file=$SCRATCH_MNT/${files[$i]}
+lnkfile=${file}_hardlink
+ln $file $lnkfile
+xattr_value=`head -c $xattr_size < /dev/zero \
+| tr '\0' $char`
+
+set_md5=`echo -n "$xattr_value" | md5sum`
+${ATTR_PROG} -Lq -s attr_$char -V "$xattr_value" $file
+get_md5=`${ATTR_PROG} -Lq -g attr_$char $file | md5sum`
+get_ln_md5=`${ATTR_PROG} -Lq -g attr_$char $lnkfile \
+| md5sum`
+
+# Using md5sums for comparison instead of the values
+# themselves because bash command lines cannot be larger
+# than 64K chars.
+if [ "$set_md5" != "$get_md5" ]; then
+echo -n "Got unexpected xattr value for "
+echo -n "attr_$char from file ${file}. "
+echo "(leafsize is ${leafsize}K)"
+fi
+if [ "$set_md5" != "$get_ln_md5" ]; then
+echo -n "Value for attr_$char differs for "
+echo -n "$file and ${lnkfile}. "
+echo "(leafsize is ${leafsize}K)"
+fi
+done
+
+# Test attributes with a size larger than the leafsize.
+# Should result in an error.
+if [ $leafsize -lt 64 ]; then
+# Bash command lines cannot be larger than 64K
+# characters, so we do not test attribute values
+# with a size >= 64KB.
+xattr_size=`expr $leafsize \* 1024 + 512`
+xattr_value=`head -c $xattr_size < /dev/zero | tr '\0' x`
+${ATTR_PROG} -q -s attr_toobig -V "$xattr_value" \
+$SCRATCH_MNT/emptyfile >> $seqres.full 2>&1
+if [ $? -eq 0 ]; then
+echo -n "Expected error, xattr_size is bigger "
+echo "than ${leafsize}K"
+fi
+fi
+
+_scratch_unmount >/dev/null 2>&1
+_check_scratch_fs
+done
+
+_scratch_mount
+
+# Illegal attribute name (more than 256 characters)
+attr_name=`head -c 260 < /dev/zero | tr 

Re: Error: could not do orphan cleanup -22

2014-02-10 Thread Pavel Volkov
Some more update.
I checked the FS with btrfsck:

checking extents
ref mismatch on [17018880 8192] extent item 1, found 2
Incorrect local backref count on 17018880 root 258 owner 826 offset 0 found 2 
wanted 1 back 0x961b268
backpointer mismatch on [17018880 8192]
ref mismatch on [17027072 8192] extent item 1, found 2
Incorrect local backref count on 17027072 root 258 owner 827 offset 0 found 2 
wanted 1 back 0x93ce988
backpointer mismatch on [17027072 8192]
Errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
checking csums
checking root refs
Checking filesystem on /dev/sdb3
UUID: c3f3a649-d8c3-49e1-9962-9b3ca9f54f1d
free space inode generation (0) did not match free space cache generation (6151)
free space inode generation (0) did not match free space cache generation (6151)
found 226798495277 bytes used err is 0
total csum bytes: 619125860
total tree bytes: 1041760256
total fs tree bytes: 270155776
total extent tree bytes: 22577152
btree space waste bytes: 164080420
file data blocks allocated: 1102635630592
 referenced 634893860864
Btrfs v3.12

Is it ok to try repairing it?


Re: Error: could not do orphan cleanup -22

2014-02-10 Thread Josef Bacik



On 02/10/2014 03:53 PM, Pavel Volkov wrote:

Some more update.
I checked the FS with btrfsck:


Build a kernel with this patch applied

http://ur1.ca/glslj

and re-run the mount and when it fails attach dmesg to this email.  Thanks,

Josef


Re: [PATCH v4] xfstests/btrfs: add a regression test for running snapshot and send concurrently

2014-02-10 Thread Josef Bacik



On 02/07/2014 09:00 AM, Wang Shilong wrote:

From: Wang Shilong wangsl.f...@cn.fujitsu.com

Btrfs would fail to send if a snapshot ran concurrently; this test is to
make sure we have fixed the bug.




Looks reasonable, ran it with and without the patch and it did as expected.

Reviewed-by: Josef Bacik jba...@fb.com

Thanks,

Josef


Re: [PATCH] xfstests: add test for btrfs data corruption when using compression

2014-02-10 Thread Josef Bacik



On 02/08/2014 10:50 AM, Filipe David Borba Manana wrote:

Test for a btrfs data corruption when using compressed files/extents.
Under certain cases, it was possible for reads to return random data
(content from a previously used page) instead of zeroes. This also
caused partial updates to those regions that were supposed to be filled
with zeroes to save random (and invalid) data into the file extents.

This is fixed by the commit for the linux kernel titled:

Btrfs: fix data corruption when reading/updating compressed extents

(https://patchwork.kernel.org/patch/3610391/)



Ran with and without the corresponding fix and all worked as expected. 
You can add


Reviewed-by: Josef Bacik jba...@fb.com

Thanks,

Josef


Re: [PATCH] xfstests: btrfs/004: fix to make test really work

2014-02-10 Thread Dave Chinner
On Mon, Feb 10, 2014 at 08:10:56PM +0800, Wang Shilong wrote:
 From: Wang Shilong wangsl.f...@cn.fujitsu.com
 
 So I was wondering why test 004 could pass my previous wrong
 kernel patch while it definitely should not.
 
 By doing some debugging, I found the perl script here is wrong: we did
 not filter out anything, so this unit test did not actually work and
 we would never fail this test.
 
 Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
 ---
  tests/btrfs/004 | 7 +++
  1 file changed, 3 insertions(+), 4 deletions(-)
  mode change 100755 = 100644 tests/btrfs/004
 
 diff --git a/tests/btrfs/004 b/tests/btrfs/004
 old mode 100755
 new mode 100644
 index 14da9f1..17a6e34
 --- a/tests/btrfs/004
 +++ b/tests/btrfs/004
 @@ -57,10 +57,9 @@ _require_command /usr/sbin/filefrag
  
  rm -f $seqres.full
  
 -FILEFRAG_FILTER='if (/, blocksize (\d+)/) {$blocksize = $1; next} ($ext, '\
 -'$logical, $physical, $expected, $length, $flags) = (/^\s*(\d+)\s+(\d+)'\
 -'\s+(\d+)\s+(?:(\d+)\s+)?(\d+)\s+(.*)/) or next; $flags =~ '\
 -'/(?:^|,)inline(?:,|$)/ and next; print $physical * $blocksize, #, '\
 +FILEFRAG_FILTER='if (/blocks of (\d+) bytes/) {$blocksize = $1; next} ($ext, 
 '\
 +'$logical, $physical, $length) = (/^\s*(\d+):\s+(\d+)..\s+\d+:'\
 +'\s+(\d+)..\s+\d+:\s+(\d+):/) or next; print $physical * $blocksize, #, '\
  '$length * $blocksize, #, $logical * $blocksize,  '

Oh, boy, who allowed that mess to pass review? Please format this in
a readable manner while you are changing it.

FILEFRAG_FILTER='
if (/blocks of (\d+) bytes/) {  \
$blocksize = $1;\
next;   \
}
.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


[PATCH v3] xfstests: Btrfs: add test for large metadata blocks

2014-02-10 Thread Koen De Wit
Tests Btrfs filesystems with all possible metadata block sizes, by
setting large extended attributes on files.

Signed-off-by: Koen De Wit koen.de@oracle.com
---

v1-v2:
- Fix indentation: 8 spaces instead of 4
- Move _scratch_unmount to end of loop, add _check_scratch_fs
- Sending failure messages of mkfs.btrfs to output instead of
  $seqres.full
v2-v3:
- Sending the md5sums of the retrieved attribute values to the
  output instead of comparing them to the md5sum of the original
  value
- Always testing attribute values of 4, 8, 12, ... up to 64 KB
  regardless of the pagesize, to make the golden output independent
  of the pagesize
- Sending the output of mkfs.btrfs with illegal leafsize to
  $seqres.full and checking the return code
- Using more uniform variable names: pagesize/pagesize_kb, leafsize/
  leafsize_kb, attrsize/attrsize_kb

diff --git a/tests/btrfs/036 b/tests/btrfs/036
new file mode 100644
index 000..fb3e987
--- /dev/null
+++ b/tests/btrfs/036
@@ -0,0 +1,125 @@
+#! /bin/bash
+# FS QA Test No. 036
+#
+# Tests large metadata blocks in btrfs, which allows large extended
+# attributes.
+#
+#---
+# Copyright (c) 2014, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1   # failure is the default!
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_math
+_need_to_be_root
+
+rm -f $seqres.full
+
+pagesize=`$here/src/feature -s`
+pagesize_kb=`expr $pagesize / 1024`
+
+# Test all valid leafsizes
+for attrsize_kb in `seq 4 4 64`; do
+        # The leafsize should be a multiple of the pagesize, equal to or
+        # greater than the attribute size.
+        leafsize_kb=$(_math "($attrsize_kb + $pagesize_kb - 1) / \
+                $pagesize_kb * $pagesize_kb")
+        echo "Testing with attrsize ${attrsize_kb}K:"
+
+        _scratch_mkfs -l ${leafsize_kb}K > /dev/null
+        _scratch_mount
+        # Calculate the size of the extended attribute value, leaving
+        # 512 bytes for other metadata.
+        attrsize=`expr $attrsize_kb \* 1024 - 512`
+
+        touch $SCRATCH_MNT/emptyfile
+        # smallfile will be inlined, bigfile not.
+        $XFS_IO_PROG -f -c "pwrite 0 100" $SCRATCH_MNT/smallfile \
+                > /dev/null
+        $XFS_IO_PROG -f -c "pwrite 0 9000" $SCRATCH_MNT/bigfile \
+                > /dev/null
+        ln -s $SCRATCH_MNT/bigfile $SCRATCH_MNT/bigfile_softlink
+
+        files=(emptyfile smallfile bigfile bigfile_softlink)
+        chars=(a b c d)
+        for i in `seq 0 1 3`; do
+                char=${chars[$i]}
+                file=$SCRATCH_MNT/${files[$i]}
+                lnkfile=${file}_hardlink
+                ln $file $lnkfile
+                xattr_value=`head -c $attrsize < /dev/zero \
+                        | tr '\0' $char`
+
+                echo -n "$xattr_value" | md5sum
+                ${ATTR_PROG} -Lq -s attr_$char -V "$xattr_value" $file
+                ${ATTR_PROG} -Lq -g attr_$char $file | md5sum
+                ${ATTR_PROG} -Lq -g attr_$char $lnkfile | md5sum
+        done
+
+        # Test attributes with a size larger than the leafsize.
+        # Should result in an error.
+        if [ $leafsize_kb -lt 64 ]; then
+                # Bash command lines cannot be larger than 64K
+                # characters, so we do not test attribute values
+                # with a size >64KB.
+                attrsize=`expr $attrsize_kb \* 1024 + 512`
+                xattr_value=`head -c $attrsize < /dev/zero | tr '\0' x`
+                ${ATTR_PROG} -q -s attr_toobig -V "$xattr_value" \
+                        $SCRATCH_MNT/emptyfile 2>&1 | _filter_scratch
+        fi
+
+        _scratch_unmount > /dev/null 2>&1
+        _check_scratch_fs
+done
+
+_scratch_mount
+
+# Illegal attribute name (more than 256 characters)
+attr_name=`head -c 260 < /dev/zero | tr '\0' n`
+${ATTR_PROG} -s $attr_name -V attribute_name_too_big \
+        $SCRATCH_MNT/emptyfile 2>&1 | head -n 1
+
+_scratch_unmount
+

Re: [PATCH] xfstests: Btrfs: add test for large metadata blocks

2014-02-10 Thread Koen De Wit


On 02/10/2014 12:02 AM, Dave Chinner wrote:

On Sat, Feb 08, 2014 at 09:30:51AM +0100, Koen De Wit wrote:

On 02/07/2014 11:49 PM, Dave Chinner wrote:

On Fri, Feb 07, 2014 at 06:14:45PM +0100, Koen De Wit wrote:
echo -n $xattr_value | md5sum
${ATTR_PROG} -Lq -s attr_$char -V $xattr_value $file
${ATTR_PROG} -Lq -g attr_$char $file | md5sum
${ATTR_PROG} -Lq -g attr_$char $lnkfile | md5sum

is all that needs to be done here.

The problem with this is that the length of the output will depend on the page 
size. The code above runs for every valid leafsize, which can be any multiple 
of the page size up to 64KB, as defined in the loop initialization:
 for leafsize in `seq $pagesize_kb $pagesize_kb 64`; do

That's only a limit on the mkfs leafsize parameter, yes? And the
limitation is that the leaf size can't be smaller than page size?

So really, the attribute sizes that are being tested are independent
of the mkfs parameters being tested. i.e:

for attrsize in `seq 4 4 64`; do
if [ $attrsize -lt $pagesize ]; then
leafsize=$pagesize
else
leafsize=$attrsize
fi
$BTRFS_MKFS_PROG -l $leafsize $SCRATCH_DEV

And now the test executes a fixed loop, testing the same attribute
sizes on all the filesystems under test. i.e. the attribute sizes
being tested are *independent* of the mkfs parameters being tested.
Always test the same attribute sizes, the mkfs parameters simply
vary by page size.


OK, thanks for the suggestion!  I implemented it like this in v3, I just 
changed the calculation of the leafsize because it must be a multiple of the 
pagesize. (A leafsize of 12KB is not valid for systems with 8KB pages.)
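(For reference, the rounding described here is just "round the attribute size up to the next multiple of the page size". A plain-shell sketch of the same arithmetic, standing in for the xfstests _math helper, with an 8KB page assumed for illustration:)

```shell
# Round each attribute size up to the next multiple of the page size.
# pagesize_kb=8 is an assumption; the test reads the real value via
# `src/feature -s`.
pagesize_kb=8
for attrsize_kb in 4 8 12 16; do
        leafsize_kb=$(( (attrsize_kb + pagesize_kb - 1) / pagesize_kb * pagesize_kb ))
        echo "attrsize ${attrsize_kb}K -> leafsize ${leafsize_kb}K"
done
```

With 8KB pages this yields leafsizes of 8K, 8K, 16K and 16K, so a 12KB attribute gets a 16KB leaf.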


+_scratch_unmount
+
+# Some illegal leafsizes
+
+_scratch_mkfs -l 0 2> $seqres.full
+echo $?

Same again - you are dumping the error output into a different
file, then detecting the error manually. pass the output of
_scratch_mkfs through a filter, and let errors cause golden
output mismatches.

I did this to make the golden output not depend on the output of
mkfs.btrfs, inspired by
http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfstests.git;a=commit;h=fd7a8e885732475c17488e28b569ac1530c8eb59
and
http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfstests.git;a=commit;h=78d86b996c9c431542fdbac11fa08764b16ceb7d
However, in my opinion the test should simply be updated if the
output of mkfs.btrfs changes, so I agree with you and I fixed this
in v2.

While I agree with the sentiment, I'm questioning the
implementation. i.e. you've done this differently to every other
test that needs to check for failures. run_check would be just
fine, as would simply filtering the output of mkfs.


run_check will make the test fail if the return code differs from 0, and Josef brought up 
an example scenario (MKFS_OPTIONS=-O skinny-metadata) where mkfs.btrfs 
produces additional output.

In v3, I implemented the failure check similar to btrfs/022:

_scratch_mkfs -l $1 >> $seqres.full 2>&1
[ $? -ne 0 ] || _fail "'$1' is an illegal value for the \
leafsize option, mkfs should have failed."

Is this the right way?
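(Outside of xfstests, the generic shape of that expect-failure check can be sketched like this; `expect_failure` is a made-up name rather than a harness helper, and `false` stands in for the failing mkfs invocation:)

```shell
# Run a command that is expected to fail; complain only if it
# unexpectedly succeeds (the inverse of run_check).
expect_failure() {
        "$@" > /dev/null 2>&1
        [ $? -ne 0 ] || { echo "'$*' should have failed"; return 1; }
}

expect_failure false && echo "mkfs stand-in failed, as expected"
```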


Thanks,
Koen.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Error: could not do orphan cleanup -22

2014-02-10 Thread Pavel Volkov
On Monday 10 February 2014 16:13:40 Josef Bacik wrote:
 Build a kernel with this patch applied
 
 http://ur1.ca/glslj
 
 and re-run the mount and when it fails attach dmesg to this email.  Thanks,

I don't see these new messages nor the previous -22 messages in dmesg now.
Only the access problem:

melforce mnt # ls btr2
home  vap-snap1  vap-snap2  var

melforce mnt # ls btr2/var-snap1
ls: cannot access btr2/var-snap1: No such file or directory

melforce mnt # ls btr2/var
cache  db  empty  lib  lock  log  mail  nmbd  run  spool  src  tmp  www


Re: [PATCH v3] xfstests: Btrfs: add test for large metadata blocks

2014-02-10 Thread Dave Chinner
On Mon, Feb 10, 2014 at 10:39:22PM +0100, Koen De Wit wrote:
 Tests Btrfs filesystems with all possible metadata block sizes, by
 setting large extended attributes on files.
 
 Signed-off-by: Koen De Wit koen.de@oracle.com

 +
 +_test_illegal_leafsize() {
 +_scratch_mkfs -l $1 >> $seqres.full 2>&1
 +[ $? -ne 0 ] || _fail "'$1' is an illegal value for the \
 +leafsize option, mkfs should have failed."
 +}

You just re-implemented run_check

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: What to do about df and btrfs fi df

2014-02-10 Thread Goffredo Baroncelli
On 02/10/2014 06:06 PM, Hugo Mills wrote:
Biggest multiplier leads to the pessimistic estimate, which is what
 I'd prefer to see here, so that's good. Agree with this.

I would prefer to use as the raid multiplier the ratio

     total data block groups + total metadata block groups
    --------------------------------------------------------
    disk space allocated for data and metadata block groups

I hope that this would work better when we have a filesystem composed
of small (inlined) files, or when we will have per-subvolume RAID levels.
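(To make the proposed multiplier concrete with invented numbers - say 2GiB of data plus 1GiB of metadata block groups, backed by 6GiB of allocated raw space because metadata in RAID1 counts twice:)

```shell
# Invented example: block group totals vs the raw space backing them.
# Real btrfs-progs code would read these from the space-info ioctl.
data_bg_gib=2
meta_bg_gib=1
allocated_gib=6
awk -v d="$data_bg_gib" -v m="$meta_bg_gib" -v a="$allocated_gib" \
        'BEGIN { printf "Data to disk ratio: %.0f %%\n", 100 * (d + m) / a }'
```

which prints a ratio of 50 % for these numbers.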

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it)
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: What to do about df and btrfs fi df

2014-02-10 Thread Goffredo Baroncelli
On 02/10/2014 05:41 PM, Josef Bacik wrote:
 = New and improved btrfs fi df =
 
 Since people using this tool are already going to be better informed
 and since we are already given the block group flags we can go ahead
 and do the raid multiplier in btrfs-progs and spit out the adjusted
 numbers rather than the raw numbers we get from the ioctl.  This will
 just be a progs thing and that way we can possibly add an option to
 not apply the multipliers and just get the raw output.

In the past [1] I proposed the following approach.

$ sudo btrfs filesystem df /mnt/btrfs1/
Disk size:             400.00GB
Disk allocated:          8.04GB
Disk unallocated:      391.97GB
Used:                   11.29MB
Free (Estimated):      250.45GB   (Max: 396.99GB, min: 201.00GB)
Data to disk ratio:        63 %

The space was given in terms of disk space and in terms of 
filesystem space. Other that there is an indication of an estimation of
the free space, with the pessimistic and optimistic values.

[1] See [PATCH V3][BTRFS-PROGS] Enhance btrfs fi df with raid5/6 support 
dated 03/10/2013 01:17 PM

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it)
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: What to do about df and btrfs fi df

2014-02-10 Thread cwillu
 In the past [1] I proposed the following approach.

 $ sudo btrfs filesystem df /mnt/btrfs1/
 Disk size:             400.00GB
 Disk allocated:          8.04GB
 Disk unallocated:      391.97GB
 Used:                   11.29MB
 Free (Estimated):      250.45GB   (Max: 396.99GB, min: 201.00GB)
 Data to disk ratio:        63 %

Note that a big chunk of the problem is what do we do with the
regular system df output.  I don't mind this as a btrfs fi df summary
though.


Re: What to do about df and btrfs fi df

2014-02-10 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 10/02/14 10:24, cwillu wrote:
 The regular df data used number should be the amount of space required 
 to hold a backup of that content (assuming that the backup maintains 
 reflinks and compression and so forth).
 
 There's no good answer for available space;

I think the flipside of the above works well.  How large a group of files
can you expect to create before you will get ENOSPC?

That, for example, is the check that code does when it looks at df: I
need to put in X GB of files - will it fit?  It is also what users do.

This is also what NTFS under Windows does with compression.  If it says
you have 5GB of space left then you will be able to put in 5GB of
uncompressible files.  Of course if they are compressible then you don't
end up consuming all the free space.
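(The "will X GB fit?" check described above is literally what scripts do against df today; a minimal sketch, with the 5GiB figure and the /tmp path made up for illustration:)

```shell
# Will 5 GiB of new files fit?  df -P prints available space in 1K
# blocks in column 4 of the second output line.
need_kb=$((5 * 1024 * 1024))
avail_kb=$(df -P /tmp | awk 'NR == 2 { print $4 }')
if [ "$avail_kb" -ge "$need_kb" ]; then
        echo "fits"
else
        echo "does not fit"
fi
```

On btrfs the number in that column is exactly the estimate being argued about in this thread.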

Roger

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)

iEYEARECAAYFAlL5dqcACgkQmOOfHg372QQBzgCgyrvj+WnZevjEDdgbAFd2nHaD
H98AoK0ZSDwZJpSMIdXpGYZGjWuPpGTh
=xJ+X
-END PGP SIGNATURE-



Re: [PATCH] xfstests: btrfs/004: fix to make test really work

2014-02-10 Thread Wang Shilong

Hi Josef,

On 02/11/2014 03:18 AM, Josef Bacik wrote:



On 02/10/2014 07:10 AM, Wang Shilong wrote:

From: Wang Shilong wangsl.f...@cn.fujitsu.com

So I was wondering why test 004 could pass my previous wrong
kernel patch when it definitely should not.

With some debugging, I found the perl script here is wrong: we did
not filter out anything, so this unit test did not actually work.
It turned out we would never fail this test.



So now with this patch I'm failing it, is there some btrfs patch I 
need to make it not fail or is it still not supposed to fail normally 
and is this patch broken?  Thanks,
You must not have updated to my previous patch (Btrfs: switch to
btrfs_previous_extent_item()) when you failed this test.
I updated to your latest btrfs-next, which includes my previous patch,
and it passes this case - did you miss that?


Thanks,
Wang


Josef
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] xfstests: btrfs/004: fix to make test really work

2014-02-10 Thread Wang Shilong

On 02/11/2014 05:39 AM, Dave Chinner wrote:

On Mon, Feb 10, 2014 at 08:10:56PM +0800, Wang Shilong wrote:

From: Wang Shilong wangsl.f...@cn.fujitsu.com

So I was wondering why test 004 could pass my previous wrong
kernel patch when it definitely should not.

With some debugging, I found the perl script here is wrong: we did
not filter out anything, so this unit test did not actually work.
It turned out we would never fail this test.

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
  tests/btrfs/004 | 7 +++
  1 file changed, 3 insertions(+), 4 deletions(-)
  mode change 100755 => 100644 tests/btrfs/004

diff --git a/tests/btrfs/004 b/tests/btrfs/004
old mode 100755
new mode 100644
index 14da9f1..17a6e34
--- a/tests/btrfs/004
+++ b/tests/btrfs/004
@@ -57,10 +57,9 @@ _require_command /usr/sbin/filefrag
  
  rm -f $seqres.full
  
-FILEFRAG_FILTER='if (/, blocksize (\d+)/) {$blocksize = $1; next} ($ext, '\
-'$logical, $physical, $expected, $length, $flags) = (/^\s*(\d+)\s+(\d+)'\
-'\s+(\d+)\s+(?:(\d+)\s+)?(\d+)\s+(.*)/) or next; $flags =~ '\
-'/(?:^|,)inline(?:,|$)/ and next; print $physical * $blocksize, "#", '\
+FILEFRAG_FILTER='if (/blocks of (\d+) bytes/) {$blocksize = $1; next} ($ext, '\
+'$logical, $physical, $length) = (/^\s*(\d+):\s+(\d+)..\s+\d+:'\
+'\s+(\d+)..\s+\d+:\s+(\d+):/) or next; print $physical * $blocksize, "#", '\
 '$length * $blocksize, "#", $logical * $blocksize, " "'

Oh, boy, who allowed that mess to pass review? Please format this in
a readable manner while you are changing it.

Yeah, I was thinking of making it more readable when I sent this out. ^_^
Thanks for your comments.

Wang


FILEFRAG_FILTER='
if (/blocks of (\d+) bytes/) {  \
$blocksize = $1;\
next;   \
}
.

Cheers,

Dave.




Re: btrfsck does not fix

2014-02-10 Thread Chris Murphy

On Feb 9, 2014, at 1:36 AM, Hendrik Friedel hend...@friedels.name wrote:
 
 Yes, but I can create that space.
 So, for me the next steps would be to:
 -generate enough room on the filesystem
 -btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/BTRFS/Video
 -btrfs device delete /dev/sdc1 /mnt/BTRFS/Video
 
 Right?

No. You said you need to recreate the file system, and only have these two 
devices and therefore must remove one device. You can't achieve that with raid1 
which requires minimum two devices.

-dconvert=single -mconvert=dup -sconvert=dup

 
 next, I'm doing the balance for the subvolume /mnt/BTRFS/backups
 
 You told us above  you deleted that subvolume. So how are you balancing it?
 
 Yes, that was my understanding from my research:
 You tell btrfs, that you want to remove one disc from the filesystem and then 
 balance it to move the data on the remaining disc. I did find this logical. I 
 was expecting that I possibly need a further command to tell btrfs that it's 
 not a raid anymore, but I thought this could also be automagical.
 I understand, that's not the way it is implemented, but it's not a crazy 
 idea, is it?

Well it's not the right way to think that devices are raid1 or raid0. It's the 
data or metadata that has that attribute. And by removing a device you are 
managing devices, not the attribute of data or metadata chunks. Since you're 
already at the minimum number of disks for raid0, that's why conversion is 
needed first.


 
 And also, balance applies to a mountpoint, and even if you mount a
  subvolume to that mountpoint, the whole file system is balanced.
  Not just the mounted subvolume.
 
 That is confusing. (I mean: I understand what you are saying, but it's 
 counterintuitive). Why is this the case?

A subvolume is a file system tree. The data created in that tree is allocated 
to chunks which can contain data from other trees. And balance reads/writes 
chunks. It's not a subvolume aware command.


 
 In parallel, I try to delete /mnt/BTRFS/rsnapshot, but it fails:
  btrfs subvolume delete  /mnt/BTRFS/rsnapshot/
  Delete subvolume '/mnt/BTRFS/rsnapshot'
  ERROR: cannot delete '/mnt/BTRFS/rsnapshot' - Inappropriate ioctl
  for  device
 
 Why's that?
 But even more: How do I free sdc1 now?!
 
 
 Well I'm pretty confused because again, I can't tell if your paths refer to
  subvolumes or if they refer to mount points.
 
 Now I am confused. These paths are the paths to which I mounted the 
 subvolumes:
 my (abbreviated) fstab:
 UUID=xy  /mnt/BTRFS/Video btrfs subvol=Video
 UUID=xy /mnt/BTRFS/rsnapshot btrfs subvol=rsnapshot
 UUID=xy /mnt/BTRFS/backups btrfs subvol=backups
 
 
  The balance and device delete commands all refer to a mount point, which is 
  the path returned by the df command.
 So this:
 /dev/sdb1   5,5T3,5T  2,0T   64% /mnt/BTRFS/Video
 /dev/sdb1   5,5T3,5T  2,0T   64% /mnt/BTRFS/backups
 /dev/sdc1   5,5T3,5T  2,0T   64% /mnt/BTRFS/rsnapshot

You can't delete a mounted subvolume. You'd have to unmount it first. And then 
you'd have to mount a parent subvolume. So if the subvolume you want to delete 
is in the ID 5 subvolume, you must mount that subvolume, for example:
 
mount /dev/sdb1 /mnt/btrfs
btrfs subvolume delete /mnt/btrfs/subvolumetodelete



Chris Murphy


Re: [PATCH] xfstests: btrfs/004: fix to make test really work

2014-02-10 Thread Josef Bacik


On 02/10/2014 08:22 PM, Wang Shilong wrote:

Hi Josef,

On 02/11/2014 03:18 AM, Josef Bacik wrote:



On 02/10/2014 07:10 AM, Wang Shilong wrote:

From: Wang Shilong wangsl.f...@cn.fujitsu.com

So I was wondering why test 004 could pass my previous wrong
kernel patch when it definitely should not.

With some debugging, I found the perl script here is wrong: we did
not filter out anything, so this unit test did not actually work.
It turned out we would never fail this test.



So now with this patch I'm failing it, is there some btrfs patch I 
need to make it not fail or is it still not supposed to fail normally 
and is this patch broken?  Thanks,
You must not have updated to my previous patch (Btrfs: switch to
btrfs_previous_extent_item()) when you failed this test.
I updated to your latest btrfs-next, which includes my previous patch,
and it passes this case - did you miss that?


Hrm I must not have insmod'ed the new module, which now means I have to 
re-run all my tests, sigh.


Josef


Re: BTRFS with RAID1 cannot boot when removing drive

2014-02-10 Thread Chris Murphy

On Feb 9, 2014, at 2:40 PM, Saint Germain saint...@gmail.com wrote:
 
 Then I added another drive for a RAID1 configuration (with btrfs
 balance) and I installed grub on the second hard drive with
 grub-install /dev/sdb.

That can't work on UEFI. UEFI firmware effectively requires a GPT partition map 
and something to serve as an EFI System partition on all bootable drives.

Second there's a difference between UEFI with and without secure boot.

With secure boot you need to copy the files your distro installer puts on the 
target drive EFI System partition to each addition drive's ESP if you want 
multibooting to work in case of disk failure. The grub on each ESP likely looks 
on only its own ESP for a grub.cfg. So that then means having to sync 
grub.cfg's among each disk used for booting. A way around this is to create a 
single grub.cfg that merely forwards to the true grub.cfg. And you can copy 
this forward-only grub.cfg to each ESP. That way the ESP's never need updating 
or syncing again.

Without secure boot, you must umount /boot/efi and mount the ESP for each
bootable disk in turn, and then merely run:

grub-install

That will cause a core.img to be created for that particular ESP, and it will 
point to the usual grub.cfg location at /boot/grub.



 
 If I boot on sdb, it takes sda1 as the root filesystem
 If I switched the cable, it always take the first hard drive as the
 root filesystem (now sdb)
 If I disconnect /dev/sda, the system doesn't boot with a message
 saying that it hasn't found the UUID:
 
 Scanning for BTRFS filesystems...
 mount: mounting /dev/disk/by-uuid/c64fca2a-5700-4cca-abac-3a61f2f7486c on 
 /root failed: Invalid argument

Well if /dev/sda is missing, and you have an unpartitioned /dev/sdb I don't 
even know how you're getting this far, and it seems like the UEFI computer 
might actually be booting in CSM-BIOS mode which presents a conventional BIOS 
to the operating system. Distinguishing such things gets messy quickly.


 
 Can you tell me what I have done incorrectly ?
 Is it because of UEFI ? If yes I haven't understood how I can correct
 it in a simple way.
 
 As extra question, I don't see also how I can configure the system to
 get the correct swap in case of disk failure. Should I force both swap 
 partition
 to have the same UUID ?

If you're really expecting to create a system that can accept a disk failure 
and continue to work, I don't see how it can depend on swap partitions. It's 
fine to create them, but just realize if they're actually being used and the 
underlying physical device dies, the kernel isn't going to like it.

A possible work around is using an md raid1 partition as swap.


Chris Murphy


Re: btrfsck does not fix

2014-02-10 Thread Chris Murphy

On Feb 10, 2014, at 6:45 PM, Chris Murphy li...@colorremedies.com wrote:

 
 On Feb 9, 2014, at 1:36 AM, Hendrik Friedel hend...@friedels.name wrote:
 
 Yes, but I can create that space.
 So, for me the next steps would be to:
 -generate enough room on the filesystem
 -btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/BTRFS/Video
 -btrfs device delete /dev/sdc1 /mnt/BTRFS/Video
 
 Right?
 
 No. You said you need to recreate the file system, and only have these two 
 devices and therefore must remove one device. You can't achieve that with 
 raid1 which requires minimum two devices.
 
 -dconvert=single -mconvert=dup -sconvert=dup

Actually, I'm reminded that with multiple devices dup might not be possible. 
Instead you might have to use single for all of them. Then remove the device 
you want removed. And then do another conversion for just -mconvert=dup 
-sconvert=dup, and do not specify -dconvert. That way the single metadata 
profile is converted to duplicate.


Chris


Re: BTRFS with RAID1 cannot boot when removing drive

2014-02-10 Thread Saint Germain
Hello Duncan,

What an amazing extensive answer you gave me !
Thank you so much for it.

See my comments below.

On Mon, 10 Feb 2014 03:34:49 + (UTC), Duncan 1i5t5.dun...@cox.net
wrote :

  I am experimenting with BTRFS and RAID1 on my Debian Wheezy (with
  backported kernel 3.12-0.bpo.1-amd64) using a a motherboard with
  UEFI.
 
 My systems don't do UEFI, but I do run GPT partitions and use grub2
 for booting, with grub2-core installed to a BIOS/reserved type
 partition (instead of as an EFI service as it would be with UEFI).
 And I have root filesystem btrfs two-device raid1 mode working fine
 here, tested bootable with only one device of the two available.
 
 So while I can't help you directly with UEFI, I know the rest of it
 can/ does work.
 
 One more thing:  I do have a (small) separate btrfs /boot, actually
 two of them as I setup a separate /boot on each of the two devices in
order to have a backup /boot, since grub can only point to
 one /boot by default, and while pointing to another in grub's rescue
 mode is possible, I didn't want to have to deal with that if the
 first /boot was corrupted, as it's easier to simply point the BIOS at
 a different drive entirely and load its (independently installed and
 configured) grub and /boot.
 

Can you explain why you choose to have a dedicated /boot partition ?
I also read on this thread that it may be better to have a
dedicated /boot partition:
https://bbs.archlinux.org/viewtopic.php?pid=1342893#p1342893


  However I haven't managed to make the system boot when the removing
  the first hard drive.
  
  I have installed Debian with the following partition on the first
  hard drive (no BTRFS subsystem):
  /dev/sda1: for / (BTRFS)
  /dev/sda2: for /home (BTRFS)
  /dev/sda3: for swap
  
  Then I added another drive for a RAID1 configuration (with btrfs
  balance) and I installed grub on the second hard drive with
  grub-install /dev/sdb.
 
 Just for clarification as you don't mention it specifically, altho
 your btrfs filesystem show information suggests you did it this way,
 are your partition layouts identical on both drives?
 
 That's what I've done here, and I definitely find that easiest to
 manage and even just to think about, tho it's definitely not a
 requirement.  But using different partition layouts does
 significantly increase management complexity, so it's useful to avoid
 if possible. =:^)

Yes, the partition layout is exactly the same on both drive (copied
with sfdisk). I also try to keep things simple ;-)

  If I boot on sdb, it takes sda1 as the root filesystem
 
  If I switched the cable, it always take the first hard drive as
  the root filesystem (now sdb)
 
 That's normal /appearance/, but that /appearance/ doesn't fully
 reflect reality.
 
 The problem is that mount output (and /proc/self/mounts), fstab, etc, 
 were designed with single-device filesystems in mind, and
 multi-device btrfs has to be made to fit the existing rules as best
 it can.
 
 So what's actually happening is that for a btrfs composed of
 multiple devices, since there's only one device slot for the kernel
 to list devices, it only displays the first one it happens to come
 across, even tho the filesystem will normally (unless degraded)
 require that all component devices be available and logically
 assembled into the filesystem before it can be mounted.
 
 When you boot on sdb, naturally, the sdb component is the first piece
 of the multi-device filesystem that the kernel finds, so it's the one
 listed, even tho the filesystem is actually composed of more devices,
 not just that one.

I am not following you: it seems to be the opposite of what you
describe. If I boot on sdb, I expect sdb1 and sdb2 to be the first
components that the kernel find. However I can see that sda1 and sda2
are used (using the 'mount' command).

 When you switch the cables, the first one is, at
 least on your system, always the first device component of the
 filesystem detected, so it's always the one occupying the single
 device slot available for display, even tho the filesystem has
 actually assembled all devices into the complete filesystem before
 mounting.
 

Normally the 2 hard drive should be exactly the same (or I didn't
understand something) except for the UUID_SUB.
That's why I don't understand: if I switch the cable, I should get
exactly the same results with 'mount'.
But that is not the case, the 'mount' command always point to the same
partition:
- without cable switch: sda1 and sda2
- with cable switch: sdb1 and sdb2
Everything happen as if the system is using the UUID_SUB to get his
'favorite' partition.

  If I disconnect /dev/sda, the system doesn't boot with a message
  saying that it hasn't found the UUID:
  
  Scanning for BTRFS filesystems...
  mount:
  mounting /dev/disk/by-uuid/c64fca2a-5700-4cca-abac-3a61f2f7486c
  on /root failed: Invalid argument
  
  Can you tell me what I have done incorrectly ?
  Is it because of UEFI ? If yes I haven't understood how I can
  correct 

Re: What to do about df and btrfs fi df

2014-02-10 Thread cwillu
On Mon, Feb 10, 2014 at 7:02 PM, Roger Binns rog...@rogerbinns.com wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 On 10/02/14 10:24, cwillu wrote:
 The regular df data used number should be the amount of space required
 to hold a backup of that content (assuming that the backup maintains
 reflinks and compression and so forth).

 There's no good answer for available space;

 I think the flipside of the above works well.  How large a group of files
 can you expect to create before you will get ENOSPC?

 That, for example, is the check that code does when it looks at df: I
 need to put in X GB of files - will it fit?  It is also what users do.

But the answer changes dramatically depending on whether it's large
numbers of small files or a small number of large files, and the
conservative worst-case choice means we report a number that is half
what is probably expected.


Re: BTRFS with RAID1 cannot boot when removing drive

2014-02-10 Thread Saint Germain
Hello !

On Mon, 10 Feb 2014 19:18:22 -0700, Chris Murphy
li...@colorremedies.com wrote :

 
 On Feb 9, 2014, at 2:40 PM, Saint Germain saint...@gmail.com wrote:
  
  Then I added another drive for a RAID1 configuration (with btrfs
  balance) and I installed grub on the second hard drive with
  grub-install /dev/sdb.
 
 That can't work on UEFI. UEFI firmware effectively requires a GPT
 partition map and something to serve as an EFI System partition on
 all bootable drives.
 
 Second there's a difference between UEFI with and without secure boot.
 
 With secure boot you need to copy the files your distro installer
 puts on the target drive EFI System partition to each addition
 drive's ESP if you want multibooting to work in case of disk failure.
 The grub on each ESP likely looks on only its own ESP for a grub.cfg.
 So that then means having to sync grub.cfg's among each disk used for
 booting. A way around this is to create a single grub.cfg that merely
 forwards to the true grub.cfg. And you can copy this forward-only
 grub.cfg to each ESP. That way the ESP's never need updating or
 syncing again.
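Chris's forward-only grub.cfg might look roughly like this (a sketch; the UUID is a placeholder for your real /boot filesystem's UUID, and the path assumes the distro-default grub.cfg location):

```
# Forward-only grub.cfg, copied identically to every disk's ESP.
# Replace the UUID below with your actual /boot filesystem UUID.
search --fs-uuid --set=root 0123abcd-0000-0000-0000-placeholder0
configfile /boot/grub/grub.cfg
```

Because every ESP carries the same two-line forwarder, only the single real grub.cfg under /boot/grub ever needs regenerating.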
 
 Without secure boot, you must umount /boot/efi and mount the ESP for
 each bootable disk in turn, and then merely run:
 
 grub-install
 
 That will cause a core.img to be created for that particular ESP, and
 it will point to the usual grub.cfg location at /boot/grub.
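That per-disk dance could be scripted roughly as follows (a sketch to be run as root; the two ESP device names are assumptions for a two-disk system and must be adjusted):

```shell
# Reinstall grub's core.img onto each disk's ESP in turn.
# /dev/sda1 and /dev/sdb1 are hypothetical ESP partitions.
for esp in /dev/sda1 /dev/sdb1; do
    umount /boot/efi 2>/dev/null || true   # ignore if nothing was mounted
    mount "$esp" /boot/efi
    grub-install                           # writes core.img for this ESP
done
```

Each ESP then carries its own core.img, all pointing at the shared grub.cfg under /boot/grub.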
 

Ok I need to really understand how my motherboard works (new Z87E-ITX).
It is written "64Mb AMI UEFI Legal BIOS", so I thought it was really
UEFI.

 
  
  If I boot on sdb, it takes sda1 as the root filesystem
  If I switched the cable, it always take the first hard drive as the
  root filesystem (now sdb)
  If I disconnect /dev/sda, the system doesn't boot with a message
  saying that it hasn't found the UUID:
  
  Scanning for BTRFS filesystems...
  mount:
  mounting /dev/disk/by-uuid/c64fca2a-5700-4cca-abac-3a61f2f7486c
  on /root failed: Invalid argument
 
 Well if /dev/sda is missing, and you have an unpartitioned /dev/sdb I
 don't even know how you're getting this far, and it seems like the
 UEFI computer might actually be booting in CSM-BIOS mode which
 presents a conventional BIOS to the operating system. Distinguishing
 such things gets messy quickly.
 

/dev/sdb has the same partition as /dev/sda.
Duncan gave me the hint with degraded mode and I managed to boot
(however I had some problem with mounting sda2).

  
  Can you tell me what I have done incorrectly ?
  Is it because of UEFI ? If yes I haven't understood how I can
  correct it in a simple way.
  
  As extra question, I don't see also how I can configure the system
  to get the correct swap in case of disk failure. Should I force
  both swap partition to have the same UUID ?
 
 If you're really expecting to create a system that can accept a disk
 failure and continue to work, I don't see how it can depend on swap
 partitions. It's fine to create them, but just realize if they're
 actually being used and the underlying physical device dies, the
 kernel isn't going to like it.
 
 A possible work around is using an md raid1 partition as swap.
 

I understand. Normally the swap will only be used for hibernating. I
don't expect to use it except perhaps in some extreme case.

Thanks for your help !


Re: What to do about df and btrfs fi df

2014-02-10 Thread ronnie sahlberg
On Mon, Feb 10, 2014 at 7:13 PM, cwillu cwi...@cwillu.com wrote:
 On Mon, Feb 10, 2014 at 7:02 PM, Roger Binns rog...@rogerbinns.com wrote:

 On 10/02/14 10:24, cwillu wrote:
 The regular df "data used" number should be the amount of space required
 to hold a backup of that content (assuming that the backup maintains
 reflinks and compression and so forth).

 There's no good answer for available space;

 I think the flipside of the above works well.  How large a group of files
 can you expect to create before you will get ENOSPC?

 That, for example, is the check that code does when it looks at df: I need to put
 in X GB of files - will it fit?  It is also what users do.

 But the answer changes dramatically depending on whether it's large
 numbers of small files or a small number of large files, and the
 conservative worst-case choice means we report a number that is half
 what is probably expected.

I don't think that is a problem, as long as the avail guesstimate is
conservative.

Scenario:
A user has 10G of files and df reports that there are 11G available.
I think the expectation is that copying these 10G into the filesystem
will not ENOSPC.
After the copy completes, whether the new avail number is ==1G or <1G
is less important IMHO.

I.e. I like to see df output as a "you can write AT LEAST this much
more data until the filesystem is full".
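That "AT LEAST this much" reading can be sketched as a toy calculation (a simplification of btrfs's real accounting, which also involves metadata profiles and chunk allocation; `conservative_avail` and its profile table are hypothetical names of mine):

```python
def conservative_avail(raw_free_bytes: int, profile: str) -> int:
    """Worst-case 'you can write AT LEAST this much' estimate.

    With raid1/dup data profiles, every byte of new file data consumes
    two raw bytes, so only half the raw free space is guaranteed
    writable.  (Toy model only.)
    """
    raw_cost_per_byte = {"single": 1, "dup": 2, "raid1": 2}
    return raw_free_bytes // raw_cost_per_byte[profile]

GiB = 1024 ** 3
# 22 GiB of raw space free on raid1: guarantee at least 11 GiB of data.
print(conservative_avail(22 * GiB, "raid1") // GiB)  # prints 11
```

Reporting this halved figure is exactly the conservative choice cwillu notes will look "half what is probably expected" to users with mostly-large files.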


That was my 5 cent.


Re: system stuck with flush-btrfs-4 at 100% after filesystem resize

2014-02-10 Thread Duncan
John Navitsky posted on Mon, 10 Feb 2014 07:35:32 -0800 as excerpted:

[I rearranged your upside-down posting so the reply comes in context 
after the quote.]

 On 2/8/2014 10:36 AM, John Navitsky wrote:

 I have a large file system that has been growing.  We've resized it a
 couple of times with the following approach:

lvextend -L +800G /dev/raid/virtual_machines
btrfs filesystem resize +800G /vms

 I think the FS started out at 200G, we increased it by 200GB a time or
 two, then by 800GB and everything worked fine.

 The filesystem hosts a number of virtual machines so the file system is
 in use, although the VMs individually tend not to be overly active.

 VMs tend to be in subvolumes, and some of those subvolumes have
 snapshots.

 This time, I increased it by another 800GB, and it has hung for many
 hours (over night) with flush-btrfs-4 near 100% cpu all that time.

 I'm not clear at this point that it will finish or where to go from
 here.

 Any pointers would be much appreciated.

 As a follow-up, at some point over the weekend things did finish on
 their own:
 
 romulus:/vms/johnn-sles11sp3 # df -h /vms
 Filesystem  Size  Used Avail Use% Mounted on
 /dev/dm-4   2.6T  1.6T  1.1T  60% /vms
 romulus:/vms/johnn-sles11sp3 #
 
 I'd still be interested in any comments about what was going on or
 suggestions.

I'm guessing you don't have the VM images set NOCOW (no-copy-on-write), 
which means over time they'll **HEAVILY** fragment since every time 
something changes in the image and is written back to the file, that 
block is written somewhere else due to COW.  We've had some reports of 
hundreds of thousands of extents in VM files of a few gigs!

It's also worth noting that while NOCOW does normally mean in-place 
writes, a change after a snapshot means unsharing the data since the 
snapshotted data has now diverged, which means mandatory single-shot COW 
in order to keep the new change from overwriting the old snapshot 
version.  That of course triggers fragmentation too, since everything 
that changes in the image between snapshots must be written elsewhere, 
although the fragmentation won't be nearly as fast as the default COW mode 
will.

So what was very likely taking the time was tracking down all those 
potentially hundreds of thousands of fragments/extents in order to 
rewrite the files as triggered by the size increase and presumably the 
physical location on-device.

I'd strongly suggest that you set all VMs NOCOW (chattr +C).  However, 
there's a wrinkle.  In order to be effective on btrfs, NOCOW must be 
set on a file while it is still zero-size, before it has data written to 
it.  The easiest way to do that is to set NOCOW on the directory, which 
doesn't really affect the directory itself, but DOES cause all new files 
(and subdirs, so it nests) created in that directory to inherit the NOCOW 
attribute.  Then copy the file in, preferably either catting it in with 
redirection to create/write the file, or copying it from another 
filesystem, such that you know it's actually copying the data and not 
simply hard-linking it, thus ensuring that the new copy is actually a new 
copy, so the NOCOW will actually take effect.
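Duncan's recipe might look like this in practice (a sketch to be run as root on a btrfs mount; the /vms/images directory and image name are hypothetical):

```shell
mkdir -p /vms/images
chattr +C /vms/images               # new files created here inherit NOCOW
# cat with redirection forces a real data copy, never a reflink:
cat /backup/guest.img > /vms/images/guest.img
lsattr /vms/images/guest.img        # the 'C' attribute should now be set
```

Note that chattr +C on an existing, non-empty file has no reliable effect, which is why the attribute is set on the directory before the image data is written.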

By organizing your VM images into dirs, all with NOCOW set, so the images 
inherit it at creation, you'll save yourself the fragmentation of the 
repeated COW writes.  However, as I mentioned, the first time a block is 
written after a snapshot it's still a COW write, unavoidably so.  Thus, 
I'd suggest keeping btrfs snapshots of your VMs to a minimum (preferably 
0), using ordinary full-copy backups to other media, instead, thus 
avoiding that first COW-after-snapshot effect, too.

Meanwhile, it's worth noting that if a file is written sequentially 
(append only) and not written into, as will typically be the case with 
the VM backups, there's nothing to trigger fragmentation.  So the backups 
don't have to be NOCOW, since they'll be written once and left alone.  
But the actively in-use and thus often written to operational VM images 
should be NOCOW, and preferably not snapshotted, to keep fragmentation to 
a minimum.

Finally, of course you can use btrfs defrag to manually deal with the 
problem.  However, do note that the snapshot aware defrag introduced with 
kernel 3.9 simply does NOT scale well once the number of snapshots 
reaches near 1000, and the snapshot-awareness has just been disabled 
again (in kernel 3.14-rc), until the code can be reworked to scale 
better.  So I'd suggest if you /are/ using snapshots and trying to work 
with defrag, you'll want a very new 3.14-rc kernel in order to avoid 
that problem, but avoiding it does come at the cost of losing space 
efficiency when defragging snapshotted btrfs, as the non-snapshot-aware 
version will tend to create separate copies of the data on each snapshot 
it is run on, thus decreasing shared data blocks and increasing space 
usage, perhaps dramatically.

So again, at least for now, and 

Re: BTRFS with RAID1 cannot boot when removing drive

2014-02-10 Thread Duncan
Saint Germain posted on Tue, 11 Feb 2014 04:15:27 +0100 as excerpted:

 Ok I need to really understand how my motherboard works (new Z87E-ITX).
 It is written "64Mb AMI UEFI Legal BIOS", so I thought it was really
 UEFI.

I expect it's truly UEFI.  But from what I've read most UEFI based 
firmware (possibly all in theory, with the caveat that there are bugs and 
some might not actually work as intended due to bugs) on x86/amd64 (as 
opposed to arm) has a legacy-BIOS mode fallback.  Provided it's not in 
secure-boot mode, if the storage devices it is presented don't have a 
valid UEFI config, it'll fall back to legacy-BIOS mode and try to detect 
and boot that.

Which may or may not be what your system is actually doing.  As I said, 
since I've not actually experimented with UEFI here, my practical 
knowledge on it is virtually nil, and I don't claim to have studied the 
theory well enough to deduce in that level of detail what your system is 
doing.  But I know that's how it's /supposed/ to be able to work. =:^)

(FWIW, what I /have/ done, deliberately, is read enough about UEFI to 
have a general feel for it, so that once I /do/ have it available and 
decide it's time, I'll be able to come up to speed relatively quickly.  
I've had the general ideas turning over in my head for quite some time 
already, so in effect I'll simply be reviewing the theory and doing the 
lab work, while making the logical connections about how it all fits 
together that only happen once one actually does that lab work.  I've 
discovered over the years that this is perhaps my most effective way to 
learn: read about the general principles while not really understanding 
them the first time through, then come back to it some months or years 
later, and I pick it up fast, because my subconscious has been working 
on the problem the whole time!  Come to think of it, that's how I 
handled btrfs too: I tried it at one point, decided it didn't fit my 
needs at the time, and left it for a while.  When I came back later, 
after my needs had changed, I already had an idea of what I was doing 
from the previous try, with the result that I really took to it fast 
the second time!  =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: BTRFS with RAID1 cannot boot when removing drive

2014-02-10 Thread Duncan
Saint Germain posted on Tue, 11 Feb 2014 04:15:27 +0100 as excerpted:

 I understand. Normally the swap will only be used for hibernating. I
 don't expect to use it except perhaps in some extreme case.

If hibernate is your main swap usage, you might consider the noauto fstab 
option as well, then specifically swapon the appropriate one in your 
hibernate script since you may well need logic in there to figure out 
which one to use in any case.  I was doing that for awhile.

(I've run my own suspend/hibernate scripts based on the documentation in 
$KERNDIR/Documentation/power/*, for years.  The kernel's docs dir really 
is a great resource for a lot of sysadmin level stuff as well as the 
expected kernel developer stuff.  I think few are aware of just how much 
really useful admin-level information it actually contains. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman
