storing metadata on a dedicated device

2012-04-09 Thread Jan Killius
Hello,
I just wanted to ask whether storing metadata on a dedicated device is
implemented at the moment?
It's listed under "Project ideas" and there is supposed to be a patch,
but I can't find it anywhere.

Greetings
Jan Killius
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?

2012-04-09 Thread David Sterba
On Tue, Apr 10, 2012 at 12:32:00AM +0300, Leho Kraav wrote:
> It is also BUG time WITH the patch. Mount succeeds, but "btrfs fi balance
> HOME" gives us:
> 
> Apr 10 00:24:18 server sudo: pam_unix(sudo:session): session opened for user root by (uid=1000)
> Apr 10 00:24:18 server kernel: [  363.839105] ------------[ cut here ]------------
> Apr 10 00:24:18 server kernel: [  363.839163] kernel BUG at fs/btrfs/volumes.c:2733!

that's

2732         if (!(bctl->flags & BTRFS_BALANCE_RESUME)) {
2733                 BUG_ON(ret == -EEXIST);
2734                 set_balance_control(bctl);
2735         } else {
2736                 BUG_ON(ret != -EEXIST);
2737                 spin_lock(&fs_info->balance_lock);
2738                 update_balance_args(bctl);
2739                 spin_unlock(&fs_info->balance_lock);
2740         }

IIRC somebody reported a similar problem recently. It basically means
there's an inconsistent balance state. Adding Ilya to CC.
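
To make the "inconsistent balance state" concrete, here is a minimal userspace sketch of the two BUG_ON() checks quoted above. It is not the kernel code: check_balance_state() and SKETCH_BALANCE_RESUME are hypothetical names, the flag value is a placeholder, and the reading of 'ret' as the result of inserting the on-disk balance item is an assumption. The sketch returns an error where the kernel would BUG.

```c
#include <errno.h>

/* Placeholder bit for this sketch only; the real BTRFS_BALANCE_RESUME
 * value lives in the kernel headers and is not reproduced here. */
#define SKETCH_BALANCE_RESUME (1u << 0)

/*
 * Userspace model of the two BUG_ON() checks.
 * 'ret' is assumed to be the result of trying to insert the on-disk
 * balance item. Returns 0 when flags and ret agree, -EINVAL in the
 * cases where the kernel code would hit BUG_ON().
 */
static int check_balance_state(unsigned int flags, int ret)
{
	if (!(flags & SKETCH_BALANCE_RESUME)) {
		/* Fresh balance: an already existing item is unexpected. */
		if (ret == -EEXIST)
			return -EINVAL;
	} else {
		/* Resumed balance: the item must already be there. */
		if (ret != -EEXIST)
			return -EINVAL;
	}
	return 0;
}
```

In this model the reported BUG at line 2733 corresponds to check_balance_state(0, -EEXIST): a new balance was requested while something already existed, which would fit a balance left over from an earlier, interrupted run.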


david


Re: [PATCH] Revert "Btrfs: increase the global block reserve estimates"

2012-04-09 Thread David Sterba
On Mon, Apr 09, 2012 at 09:37:08AM +0800, Liu Bo wrote:
> The whole thing is about our overcommit stuff, that is,
> since we are not able to get _precise_ number of reservation right now, we usually
> reserve more than what we need.
> For this, we've done overcommit dance (thanks for Josef's work!) but it's still not
> enough for our reservation when we still have some disk space.
>
> I'm ok with this revert, but since we don't use up all the reserved space in most time,
> I assume the following can be an alternative, thanks,

Unfortunately the situation does not look good for 3.3 btrfs users. I'm
looking for something that we know somehow works (i.e. like in 3.2) and
that we are able to submit to the stable tree very soon. I'll definitely test
your alternative, and if Chris applies it during the -rc phase we'll have
more time to verify it, or for you/Josef to fix it in another way.


thanks,
david


Re: Large metadata mount messages lost after reboot

2012-04-09 Thread David Sterba
On Mon, Apr 09, 2012 at 04:06:56PM -0600, Calvin Morrow wrote:
> After a reboot however, the "flagging fs with big metadata feature"
> message no longer appears on subsequent mounts.

	/*
	 * flag our filesystem as having big metadata blocks if
	 * they are bigger than the page size
	 */
	if (btrfs_super_leafsize(disk_super) > PAGE_CACHE_SIZE) {
		if (!(features & BTRFS_FEATURE_INCOMPAT_BIG_METADATA))
			printk(KERN_INFO "btrfs flagging fs with big metadata feature\n");
		features |= BTRFS_FEATURE_INCOMPAT_BIG_METADATA;
	}

Translation: the message is printed only on the first mount, when the
flag is not yet stored on disk.
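
The set-once behaviour can be modelled in userspace like this. It is only a sketch: flag_big_metadata() and SKETCH_INCOMPAT_BIG_METADATA are hypothetical names, and the bit value is a placeholder rather than the real on-disk flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit for this sketch; not the real on-disk value. */
#define SKETCH_INCOMPAT_BIG_METADATA (1ull << 0)

/*
 * Model of the mount-time check: sets the flag whenever leafsize exceeds
 * the page size, and reports whether the "flagging fs" message would be
 * printed, i.e. whether the flag was newly set on this mount.
 */
static bool flag_big_metadata(uint64_t *features, uint32_t leafsize,
			      uint32_t page_size)
{
	bool announce = false;

	if (leafsize > page_size) {
		if (!(*features & SKETCH_INCOMPAT_BIG_METADATA))
			announce = true;
		*features |= SKETCH_INCOMPAT_BIG_METADATA;
	}
	return announce;
}
```

Calling it twice on the same 'features' word with a leafsize of 32768 and a page size of 4096 announces on the first call only, which matches the behaviour seen before and after the reboot.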

> Is this expected behavior?

Yes it is.

> Is there a way I can dump filesystem
> information to determine what leaf and node size the filesystem thinks
> it has?

# mkfs.btrfs -n 32k -l 32k image

WARNING! - Btrfs v0.19-152-g1957076 IS STAB^WEXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on image
nodesize 32768 leafsize 32768 sectorsize 4096 size 300.00GB
Btrfs v0.19-152-g1957076

# file image
image: BTRFS Filesystem sectorsize 4096, nodesize 32768, leafsize 32768)

and it does that by looking at the first superblock:

/usr/share/misc/magic:
# BTRFS
0x10040     string  _BHRfS_M    BTRFS Filesystem
>0x1012b    string  >\0         (label "%s",
>0x10090    lelong  x           sectorsize %d,
>0x10094    lelong  x           nodesize %d,
>0x10098    lelong  x           leafsize %d)
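
The same lookup can be done by hand. The sketch below reads the three 32-bit little-endian fields at the offsets given in the magic entries above; struct btrfs_sizes, read_le32() and btrfs_read_sizes() are names made up for this example and are not part of btrfs-progs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct btrfs_sizes {
	uint32_t sectorsize;
	uint32_t nodesize;
	uint32_t leafsize;
};

/* Read a little-endian 32-bit value without alignment assumptions. */
static uint32_t read_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Decode the fields that the magic entries point at.
 * 'img' holds the first 'len' bytes of the device or image file.
 * Returns 0 on success, -1 if the buffer is too short or the
 * "_BHRfS_M" magic is not found at 0x10040.
 */
static int btrfs_read_sizes(const unsigned char *img, size_t len,
			    struct btrfs_sizes *out)
{
	if (len < 0x10098 + 4)
		return -1;
	if (memcmp(img + 0x10040, "_BHRfS_M", 8) != 0)
		return -1;
	out->sectorsize = read_le32(img + 0x10090);
	out->nodesize = read_le32(img + 0x10094);
	out->leafsize = read_le32(img + 0x10098);
	return 0;
}
```

To use it on a real filesystem, read at least the first 0x1009c bytes of the device or image into a buffer and pass that buffer in.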


david


Large metadata mount messages lost after reboot

2012-04-09 Thread Calvin Morrow
Greetings,

I am testing the 3.4.0(rc) btrfs patches backported to a 3.3.1 kernel.
I'm particularly interested in the large metadata features for help
with fragmentation issues.  After merging, installing, and booting the
new kernel, I created several btrfs filesystems with leaf size 32768
and node size 32768.

Mounting the filesystem gives the expected behavior with the following
messages in syslog:

Apr  9 14:56:18 app-cluster1 kernel: [ 1872.279286] device fsid
48be2eaf-f2af-444a-a440-ae0ed555f1e7 devid 1 transid 7 /dev/sdd
Apr  9 14:56:18 app-cluster1 kernel: [ 1872.281352] btrfs flagging fs
with big metadata feature

After a reboot however, the "flagging fs with big metadata feature"
message no longer appears on subsequent mounts.

Apr  9 15:03:28 app-cluster1 kernel: [   12.083827] device fsid
48be2eaf-f2af-444a-a440-ae0ed555f1e7 devid 1 transid 11 /dev/sdd
Apr  9 15:03:28 app-cluster1 kernel: [   12.118204] device fsid
078b5f99-89ba-4e46-9ed9-26895d188a3e devid 1 transid 11 /dev/sde
Apr  9 15:03:28 app-cluster1 kernel: [   12.136635] device fsid
ca2fd449-d582-4d46-ab9a-c89c17762f8d devid 1 transid 11 /dev/sdf

I've verified I'm rebooting into the correct kernel:  Linux
app-cluster1 3.3.1.app.btrfs-backports #1 SMP Sun Apr 8 19:33:29 MDT
2012 x86_64 x86_64 x86_64 GNU/Linux

Is this expected behavior?  Is there a way I can dump filesystem
information to determine what leaf and node size the filesystem thinks
it has?

Calvin


Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?

2012-04-09 Thread Leho Kraav

On 09.04.2012 23:58, Leho Kraav wrote:
> On 09.04.2012 17:54, Daniel J Blueman wrote:
> > On 9 April 2012 22:44, Leho Kraav wrote:
> > > And is the previous filesystem still hosed for good then? Or mounting
> > > the images with -discard might help?
> >
> > It seems like the kernel caught and prevented the discard after the
> > end of the partition, so the data should be fine; scrubbing will tell
> > you.
>
> Without the patch at least, it's BUG time. This is what happens when
> mounting the image.

It is also BUG time WITH the patch. Mount succeeds, but "btrfs fi balance
HOME" gives us:


Apr 10 00:24:18 server sudo: pam_unix(sudo:session): session opened for user root by (uid=1000)
Apr 10 00:24:18 server kernel: [  363.839105] ------------[ cut here ]------------
Apr 10 00:24:18 server kernel: [  363.839163] kernel BUG at fs/btrfs/volumes.c:2733!
Apr 10 00:24:18 server kernel: [  363.839220] invalid opcode:  [#1] PREEMPT SMP
Apr 10 00:24:18 server kernel: [  363.839258] Modules linked in: btrfs zlib_deflate rfcomm bnep ext4 jbd2 snd_dummy loop fuse crc32c_intel nvidia(PO) snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm dvb_usb_dib0700 dvb_usb dib0070 dib7000p dibx000_common imon dvb_core hid_logitech_dj btusb bluetooth hid_logitech rc_core skge snd_page_alloc snd_timer processor snd r8168(O) button i2c_i801 rtc_cmos usbhid sr_mod cdrom firewire_ohci firewire_core crc_itu_t uhci_hcd pata_jmicron

Apr 10 00:24:18 server kernel: [  363.839609]
Apr 10 00:24:18 server kernel: [  363.839619] Pid: 4682, comm: btrfs Tainted: P   O 3.3.1-vs2.3.3.2+pf #1 Gigabyte Technology Co., Ltd. P55M-UD2/P55M-UD2
Apr 10 00:24:18 server kernel: [  363.839677] EIP: 0060:[] EFLAGS: 00210246 CPU: 1
Apr 10 00:24:18 server kernel: [  363.839709] EIP is at btrfs_balance+0xe7f/0xed0 [btrfs]
Apr 10 00:24:18 server kernel: [  363.839732] EAX: ff00 EBX: ffef ECX: 0003 EDX: 0303
Apr 10 00:24:18 server kernel: [  363.839758] ESI: eb868e00 EDI:  EBP:  ESP: e8ebbdd8
Apr 10 00:24:18 server kernel: [  363.839785]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Apr 10 00:24:18 server kernel: [  363.839809] Process btrfs (pid: 4682, ti=e8eba000 task=eb2c8ab0 task.ti=e8eba000)

Apr 10 00:24:18 server kernel: [  363.839839] Stack:
Apr 10 00:24:18 server kernel: [  363.839850]  0040 0001    e8ebbe30 e8ece000 ec713bb4
Apr 10 00:24:18 server kernel: [  363.839914]  0097 eb945000 0097 000e  0002  f2f2d3b0
Apr 10 00:24:18 server kernel: [  363.839987]  ec0cdd34 f3153b00 e9240600 ec0cde00 c1094152 c10cbb6b eac98b00 0001

Apr 10 00:24:18 server kernel: [  363.840090] Call Trace:
Apr 10 00:24:18 server kernel: [  363.840109]  [] ? filemap_fault+0x82/0x420
Apr 10 00:24:18 server kernel: [  363.840132]  [] ? __mem_cgroup_try_charge+0x28b/0x4c0
Apr 10 00:24:18 server kernel: [  363.840160]  [] ? __do_fault+0x3c9/0x510
Apr 10 00:24:18 server kernel: [  363.840183]  [] ? kmem_cache_alloc+0x75/0x90
Apr 10 00:24:18 server kernel: [  363.840212]  [] ? btrfs_ioctl_balance.isra.52+0x379/0x390 [btrfs]
Apr 10 00:24:18 server kernel: [  363.840246]  [] ? update_ioctl_balance_args+0x2e0/0x2e0 [btrfs]
Apr 10 00:24:18 server kernel: [  363.840280]  [] ? btrfs_ioctl+0x671/0x1200 [btrfs]
Apr 10 00:24:18 server kernel: [  363.840306]  [] ? handle_mm_fault+0x124/0x260
Apr 10 00:24:18 server kernel: [  363.840334]  [] ? update_ioctl_balance_args+0x2e0/0x2e0 [btrfs]
Apr 10 00:24:18 server kernel: [  363.840363]  [] ? do_vfs_ioctl+0x7a/0x580
Apr 10 00:24:18 server kernel: [  363.840386]  [] ? vmalloc_sync_all+0x10/0x10
Apr 10 00:24:18 server kernel: [  363.840409]  [] ? do_page_fault+0x185/0x3d0
Apr 10 00:24:18 server kernel: [  363.840432]  [] ? do_sys_open+0x15f/0x1b0
Apr 10 00:24:18 server kernel: [  363.840453]  [] ? do_fcntl+0x232/0x470
Apr 10 00:24:18 server kernel: [  363.840475]  [] ? sys_ioctl+0x2e/0x60
Apr 10 00:24:18 server kernel: [  363.840497]  [] ? sysenter_do_call+0x12/0x22
Apr 10 00:24:18 server kernel: [  363.840519] Code: c7 02 e9 ee fe ff ff c6 07 00 66 ba ff 03 8b 7c 24 60 83 c7 01 e9 cf fe ff ff 31 db e9 70 fe ff ff 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b <0f> 0b 8b 74 24 7c c7 04 24 9c f7 f7 f4 89 74 24 04 e8 dc ce 46
Apr 10 00:24:18 server kernel: [  363.840933] EIP: [] btrfs_balance+0xe7f/0xed0 [btrfs] SS:ESP 0068:e8ebbdd8
Apr 10 00:24:18 server kernel: [  363.841023] ---[ end trace 8be1f61ebfe6132a ]---

~


Re: Boot speed/mount time regression with 3.4.0-rc2

2012-04-09 Thread Calvin Walton
On Mon, 2012-04-09 at 16:54 -0400, Josef Bacik wrote:
> On Mon, Apr 09, 2012 at 01:10:04PM -0400, Calvin Walton wrote:
> > On Mon, 2012-04-09 at 11:53 -0400, Calvin Walton wrote:
> > > Hi,
> > > 
> > > I have a system that's using a dracut-generated initramfs to mount a
> > > btrfs root. After upgrading to kernel 3.4.0-rc2 to test it out, I've
> > > noticed that the process of mounting the root filesystem takes much
> > > longer with 3.4.0-rc2 than it did with 3.3.1 - nearly 30 seconds slower!

> > And the bisect results are in:
> > 285ff5af6ce358e73f53b55c9efadd4335f4c2ff is the first bad commit
> > commit 285ff5af6ce358e73f53b55c9efadd4335f4c2ff
> > Author: Josef Bacik 
> > Date:   Fri Jan 13 15:27:45 2012 -0500
> > 
> > Btrfs: remove the ideal caching code
> 
> Ok can you give this a whirl?  You are going to have to boot/reboot a few times
> to let the cache get re-generated again to make sure it's taken effect, but
> hopefully this will help out.  Thanks,

Unfortunately, it doesn't seem to help. Even after 3 or 4 reboots with
this patch applied I'm still seeing the same delay.

-- 
Calvin Walton 



Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?

2012-04-09 Thread Leho Kraav

On 09.04.2012 17:54, Daniel J Blueman wrote:
> On 9 April 2012 22:44, Leho Kraav wrote:
> > And is the previous filesystem still hosed for good then? Or mounting the
> > images with -discard might help?
>
> It seems like the kernel caught and prevented the discard after the
> end of the partition, so the data should be fine; scrubbing will tell
> you.

Without the patch at least, it's BUG time. This is what happens when
mounting the image.


...
[171555.937706] device label HOME devid 1 transid 370409 /dev/loop3
[171555.956786] device label HOME devid 2 transid 370409 /dev/loop4
[171647.077501] device label HOME devid 2 transid 370409 /dev/loop4
[171647.196262] btrfs: continuing balance
[171650.826278] btrfs: relocating block group 18278776832 flags 9
[171651.218444] btrfs csum failed ino 257 off 262144 csum 3439556781 private 289331560
[171651.226455] btrfs csum failed ino 257 off 196608 csum 3957169907 private 1046207033
[171651.227070] btrfs csum failed ino 257 off 196608 csum 3957169907 private 1046207033

[171652.484666] ------------[ cut here ]------------
[171652.484669] kernel BUG at fs/btrfs/volumes.c:2487!
[171652.484671] invalid opcode:  [#1] PREEMPT SMP
[171652.484673] Modules linked in: btrfs zlib_deflate lrw gf128mul vboxnetadp(O) vboxnetflt(O) vboxdrv(O) coretemp it87 hwmon_vid hwmon nfs autofs4 nfsd lockd nfs_acl auth_rpcgss sunrpc iptable_mangle ipt_ULOG xt_recent xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables squashfs imon rfcomm bnep ext4 jbd2 snd_dummy loop fuse crc32c_intel nvidia(PO) snd_hda_codec_realtek dvb_usb_dib0700 dib7000p dib0070 dvb_usb dvb_core snd_hda_intel snd_hda_codec snd_pcm rc_core btusb bluetooth snd_timer r8168(O) processor skge snd rtc_cmos sg snd_page_alloc button dibx000_common i2c_i801 hid_logitech_dj hid_logitech usbhid sr_mod cdrom firewire_ohci firewire_core crc_itu_t pata_jmicron uhci_hcd [last unloaded: imon]

[171652.484703]
[171652.484705] Pid: 21206, comm: btrfs-balance Tainted: P   O 3.3.1-vs2.3.3.2+pf #1 Gigabyte Technology Co., Ltd. P55M-UD2/P55M-UD2

[171652.484708] EIP: 0060:[] EFLAGS: 00010282 CPU: 1
[171652.484718] EIP is at btrfs_balance+0xe79/0xed0 [btrfs]
[171652.484719] EAX: fffb EBX: d0e58e00 ECX: 80240022 EDX: 80240023
[171652.484721] ESI: 7fd0 EDI: 0002 EBP: cc046068 ESP: dc5a3ef0
[171652.484722]  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
[171652.484724] Process btrfs-balance (pid: 21206, ti=dc5a2000 task=c40ee750 task.ti=dc5a2000)

[171652.484725] Stack:
[171652.484726]  0096 c14d806c c1048ebb 0046 0046 7fe0 0002 df15d800
[171652.484729]   c8efc000 c1580e3d 0030  0246  ec85a800
[171652.484733]  00029e7f 0002fea6 dc5a3f52 0010  0003 0246 

[171652.484736] Call Trace:
[171652.484740]  [] ? up+0xb/0x40
[171652.484743]  [] ? try_to_wake_up+0x6e/0x100
[171652.484745]  [] ? default_wake_function+0x8/0x10
[171652.484752]  [] ? balance_kthread+0x5f/0xa0 [btrfs]
[171652.484759]  [] ? btrfs_balance+0xed0/0xed0 [btrfs]
[171652.484761]  [] ? kthread+0x6e/0x80
[171652.484763]  [] ? kthread_freezable_should_stop+0x50/0x50
[171652.484771]  [] ? kernel_thread_helper+0x6/0xd
[171652.484772] Code: 00 00 83 ea 02 83 c7 02 e9 ee fe ff ff c6 07 00 66 ba ff 03 8b 7c 24 60 83 c7 01 e9 cf fe ff ff 31 db e9 70 fe ff ff 0f 0b 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 8b 74 24 7c c7 04 24 5c 47 28 fa 89 74
[171652.484793] EIP: [] btrfs_balance+0xe79/0xed0 [btrfs] SS:ESP 0068:dc5a3ef0

[171652.484802] ---[ end trace 15f25988d7f952de ]---
...


Re: Boot speed/mount time regression with 3.4.0-rc2

2012-04-09 Thread Josef Bacik
On Mon, Apr 09, 2012 at 01:10:04PM -0400, Calvin Walton wrote:
> On Mon, 2012-04-09 at 11:53 -0400, Calvin Walton wrote:
> > Hi,
> > 
> > I have a system that's using a dracut-generated initramfs to mount a
> > btrfs root. After upgrading to kernel 3.4.0-rc2 to test it out, I've
> > noticed that the process of mounting the root filesystem takes much
> > longer with 3.4.0-rc2 than it did with 3.3.1 - nearly 30 seconds slower!
> > 
> > Does anyone have any ideas? I'm going to try bisecting the issue to see
> > if I can narrow down the cause. I've included excerpts from dmesg of the
> > bad and good kernels here, and attached the complete dmesg from the bad
> > kernel, in case it has anything interesting that I've trimmed out here.
> 
> And the bisect results are in:
> 285ff5af6ce358e73f53b55c9efadd4335f4c2ff is the first bad commit
> commit 285ff5af6ce358e73f53b55c9efadd4335f4c2ff
> Author: Josef Bacik 
> Date:   Fri Jan 13 15:27:45 2012 -0500
> 
> Btrfs: remove the ideal caching code
>
> This is a relic from before we had the disk space cache and it was to make
> bootup times when you had btrfs as root not be so damned slow.  Now that we have
> the disk space cache this isn't a problem anymore and really having this code
> casues uneeded fragmentation and complexity, so just remove it.  Thanks,
> 
> Signed-off-by: Josef Bacik 
> 
> The commit doesn't revert cleanly on top of 3.4.0-rc2, so I haven't
> tested that; but it looks like that caching code is in fact still useful
> to make "btrfs as root not be so damned slow."
> 

Ok can you give this a whirl?  You are going to have to boot/reboot a few times
to let the cache get re-generated again to make sure it's taken effect, but
hopefully this will help out.  Thanks,

Josef

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a844204..7a703d2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -530,8 +530,8 @@ static int cache_block_group(struct btrfs_block_group_cache *cache,
 	 * we likely hold important locks.
 	 */
 	if (trans && (!trans->transaction->in_commit) &&
-	    (root && root != root->fs_info->tree_root) &&
-	    btrfs_test_opt(root, SPACE_CACHE)) {
+	    root != fs_info->tree_root &&
+	    fs_info->mount_opt & BTRFS_MOUNT_SPACE_CACHE) {
 		ret = load_free_space_cache(fs_info, cache);
 
 		spin_lock(&cache->lock);


[PATCH] Btrfs-progs: make btrfsck aware of free space inodes

2012-04-09 Thread Josef Bacik
The new xfstests will run fsck against the volume to make sure we didn't
introduce any inconsistencies, which is nice except we will error out
immediately if we mount with inode_cache.  We need to make btrfsck skip the
special free space cache items and then just assume that we have a link for
the free space cache inode item.  This makes btrfsck pass with success on a
fs with inode cache items.  Thanks,

Signed-off-by: Josef Bacik 
---
 btrfsck.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/btrfsck.c b/btrfsck.c
index 7aac736..572dde0 100644
--- a/btrfsck.c
+++ b/btrfsck.c
@@ -274,6 +274,9 @@ static struct inode_record *get_inode_rec(struct cache_tree *inode_cache,
 		node->cache.size = 1;
 		node->data = rec;
 
+		if (ino == BTRFS_FREE_INO_OBJECTID)
+			rec->found_link = 1;
+
 		ret = insert_existing_cache_extent(inode_cache, &node->cache);
 		BUG_ON(ret);
 	}
@@ -1015,6 +1018,10 @@ static int process_one_leaf(struct btrfs_root *root, struct extent_buffer *eb,
 	nritems = btrfs_header_nritems(eb);
 	for (i = 0; i < nritems; i++) {
 		btrfs_item_key_to_cpu(eb, &key, i);
+
+		if (key.objectid == BTRFS_FREE_SPACE_OBJECTID)
+			continue;
+
 		if (active_node->current == NULL ||
 		    active_node->current->ino < key.objectid) {
 			if (active_node->current) {
-- 
1.7.5.2



Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?

2012-04-09 Thread Martin Steigerwald
Am Montag, 9. April 2012 schrieb Daniel J Blueman:
> On 9 April 2012 22:44, Leho Kraav wrote:
> > On 09.04.2012 17:35, Daniel J Blueman wrote:
> >> Leho Kraav writes:
> >> []
> >> 
> >>> Apr  8 02:46:11 s9 kernel: [  189.691778] attempt to access beyond
> >>> end of device
> >>> Apr  8 02:46:11 s9 kernel: [  189.691787] dm-3: rw=129,
> >>> want=23361976, limit=20967424
> >> 
> >> I recently bumped into this too [1]. Liu Bo posted a patch for it
> >> [2], which tests out fine here. The workaround is to not mount with
> >> 'discard' until eg ~3.4-rc3 or later.
> >> 
> >> [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16409
> >> [2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16649
> > 
> > Oh wow, thanks. This sounds exactly like what happened. I got the
> > livelock post off my search results, but the patch post doesn't seem
> > to have any of the keywords I was looking for, since I had no idea
> > it could be related to discards.
> > 
> > So can this become a problem earlier too, not only when the space
> > used is approaching limits? If not, I think I should be good until 3.4:
> 
> Looks like it affects at least 3.3 and 3.4-rc1/2 in all circumstances.

Is offline discard via fstrim also affected?

I have used fstrim a few times on my / BTRFS with the 3.3.0-trunk Debian kernel
(should be 3.3.0), and

martin@merkaba:~> zgrep "beyond" /var/log/syslog*
martin@merkaba:~#1>

Seems I am safe.

But I think I won't use fstrim on any BTRFS partition for now,
until I have some confirmation that it is safe.

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


Re: Boot speed/mount time regression with 3.4.0-rc2

2012-04-09 Thread Josef Bacik
On Mon, Apr 09, 2012 at 01:10:04PM -0400, Calvin Walton wrote:
> On Mon, 2012-04-09 at 11:53 -0400, Calvin Walton wrote:
> > Hi,
> > 
> > I have a system that's using a dracut-generated initramfs to mount a
> > btrfs root. After upgrading to kernel 3.4.0-rc2 to test it out, I've
> > noticed that the process of mounting the root filesystem takes much
> > longer with 3.4.0-rc2 than it did with 3.3.1 - nearly 30 seconds slower!
> > 
> > Does anyone have any ideas? I'm going to try bisecting the issue to see
> > if I can narrow down the cause. I've included excerpts from dmesg of the
> > bad and good kernels here, and attached the complete dmesg from the bad
> > kernel, in case it has anything interesting that I've trimmed out here.
> 
> And the bisect results are in:
> 285ff5af6ce358e73f53b55c9efadd4335f4c2ff is the first bad commit
> commit 285ff5af6ce358e73f53b55c9efadd4335f4c2ff
> Author: Josef Bacik 
> Date:   Fri Jan 13 15:27:45 2012 -0500
> 
> Btrfs: remove the ideal caching code
>
> This is a relic from before we had the disk space cache and it was to make
> bootup times when you had btrfs as root not be so damned slow.  Now that we have
> the disk space cache this isn't a problem anymore and really having this code
> casues uneeded fragmentation and complexity, so just remove it.  Thanks,
> 
> Signed-off-by: Josef Bacik 
> 
> The commit doesn't revert cleanly on top of 3.4.0-rc2, so I haven't
> tested that; but it looks like that caching code is in fact still useful
> to make "btrfs as root not be so damned slow."

Hrm well you should have disk space cache which is 10x faster, if it's falling
back to the old slow way we should probably figure out why that is happening.
Let me run some tests and see how often I'm getting no disk cache written out.
Thanks,

Josef


Re: Boot speed/mount time regression with 3.4.0-rc2

2012-04-09 Thread Calvin Walton
On Mon, 2012-04-09 at 11:53 -0400, Calvin Walton wrote:
> Hi,
> 
> I have a system that's using a dracut-generated initramfs to mount a
> btrfs root. After upgrading to kernel 3.4.0-rc2 to test it out, I've
> noticed that the process of mounting the root filesystem takes much
> longer with 3.4.0-rc2 than it did with 3.3.1 - nearly 30 seconds slower!
> 
> Does anyone have any ideas? I'm going to try bisecting the issue to see
> if I can narrow down the cause. I've included excerpts from dmesg of the
> bad and good kernels here, and attached the complete dmesg from the bad
> kernel, in case it has anything interesting that I've trimmed out here.

And the bisect results are in:
285ff5af6ce358e73f53b55c9efadd4335f4c2ff is the first bad commit
commit 285ff5af6ce358e73f53b55c9efadd4335f4c2ff
Author: Josef Bacik 
Date:   Fri Jan 13 15:27:45 2012 -0500

Btrfs: remove the ideal caching code
   
This is a relic from before we had the disk space cache and it was to make
bootup times when you had btrfs as root not be so damned slow.  Now that we have
the disk space cache this isn't a problem anymore and really having this code
casues uneeded fragmentation and complexity, so just remove it.  Thanks,

Signed-off-by: Josef Bacik 

The commit doesn't revert cleanly on top of 3.4.0-rc2, so I haven't
tested that; but it looks like that caching code is in fact still useful
to make "btrfs as root not be so damned slow."

> slow:
> [0.00] Linux version 3.4.0-rc2 (cwalton@ayu) (gcc version 4.6.3 (Exherbo gcc-4.6.3) ) #57 SMP PREEMPT Mon Apr 9 11:19:43 EDT 2012
> [0.00] Command line: root=UUID=43969cd0-4aca-4297-bfbe-952a692f7d55 rootflags=subvolid=262,compress=lzo,autodefrag,space_cache,inode_cache,noatime mtrr_chunk_size=512M quiet
> 
> [1.058257] udevd[701]: starting version 182
> [1.389606] device label Linux data devid 1 transid 611923 /dev/sda4
> [1.498659] dracut: Checking, if btrfs device complete
> [1.644808] device label Linux data devid 1 transid 611923 /dev/sda4
> [1.647993] btrfs: disk space caching is enabled
> [2.180836] device label Linux data devid 1 transid 611923 /dev/sda4
> [2.181663] btrfs: use lzo compression
> [2.181667] btrfs: enabling auto defrag
> [2.181670] btrfs: enabling inode map caching
> [2.181672] btrfs: disk space caching is enabled
> [2.697845] dracut: Checking btrfs: /dev/disk/by-uuid/43969cd0-4aca-4297-bfbe-952a692f7d55
> [2.69] dracut: trying to mount /dev/disk/by-uuid/43969cd0-4aca-4297-bfbe-952a692f7d55
> [2.702637] device label Linux data devid 1 transid 611923 /dev/sda4
> [2.704376] btrfs: disk space caching is enabled
> [3.081720] dracut: btrfs: /dev/disk/by-uuid/43969cd0-4aca-4297-bfbe-952a692f7d55 is clean
> [   29.934639] dracut: Remounting /dev/disk/by-uuid/43969cd0-4aca-4297-bfbe-952a692f7d55 with -o subvolid=262,compress=lzo,autodefrag,space_cache,inode_cache,noatime,ro
> [   29.936810] device label Linux data devid 1 transid 611926 /dev/sda4
> [   29.937720] btrfs: use lzo compression
> [   29.937726] btrfs: enabling auto defrag
> [   29.937733] btrfs: enabling inode map caching
> [   29.937735] btrfs: disk space caching is enabled
> [   30.388066] dracut: Mounted root filesystem /dev/sda4
> [   30.461884] dracut: Switching root
> [   31.241729] udevd[1322]: starting version 182
> [   31.422905] btrfs: use lzo compression
> [   31.422909] btrfs: enabling auto defrag
> [   31.422913] btrfs: enabling inode map caching
> [   31.422915] btrfs: disk space caching is enabled

-- 
Calvin Walton 



Re: Boot speed/mount time regression with 3.4.0-rc2

2012-04-09 Thread cwillu
On Mon, Apr 9, 2012 at 9:53 AM, Calvin Walton  wrote:
> Hi,
>
> I have a system that's using a dracut-generated initramfs to mount a
> btrfs root. After upgrading to kernel 3.4.0-rc2 to test it out, I've
> noticed that the process of mounting the root filesystem takes much
> longer with 3.4.0-rc2 than it did with 3.3.1 - nearly 30 seconds slower!
>
> Does anyone have any ideas? I'm going to try bisecting the issue to see
> if I can narrow down the cause. I've included excerpts from dmesg of the
> bad and good kernels here, and attached the complete dmesg from the bad
> kernel, in case it has anything interesting that I've trimmed out here.

That sounds suspiciously like a symlink from btrfsck to fsck.btrfs
was added at about the same time as the update (old initrd maybe?).


[PATCH] Btrfs: use i_version instead of our own sequence

2012-04-09 Thread Josef Bacik
We've been keeping around the inode sequence number in hopes that somebody
would use it, but nobody uses it and people actually use i_version which
serves the same purpose, so use i_version where we used the incore inode's
sequence number and that way the sequence is updated properly across the
board, and not just in file write.  Thanks,

Signed-off-by: Josef Bacik 
---
 fs/btrfs/btrfs_inode.h   |    3 ---
 fs/btrfs/delayed-inode.c |    4 ++--
 fs/btrfs/file.c          |    1 -
 fs/btrfs/inode.c         |    5 ++---
 fs/btrfs/super.c         |    2 +-
 5 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 9b9b15f..3771b85 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -83,9 +83,6 @@ struct btrfs_inode {
 */
u64 generation;
 
-   /* sequence number for NFS changes */
-   u64 sequence;
-
/*
 * transid of the trans_handle that last modified this inode
 */
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 03e3748..bcd40c7 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1706,7 +1706,7 @@ static void fill_stack_inode_item(struct btrfs_trans_handle *trans,
btrfs_set_stack_inode_nbytes(inode_item, inode_get_bytes(inode));
btrfs_set_stack_inode_generation(inode_item,
 BTRFS_I(inode)->generation);
-   btrfs_set_stack_inode_sequence(inode_item, BTRFS_I(inode)->sequence);
+   btrfs_set_stack_inode_sequence(inode_item, inode->i_version);
btrfs_set_stack_inode_transid(inode_item, trans->transid);
btrfs_set_stack_inode_rdev(inode_item, inode->i_rdev);
btrfs_set_stack_inode_flags(inode_item, BTRFS_I(inode)->flags);
@@ -1754,7 +1754,7 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
set_nlink(inode, btrfs_stack_inode_nlink(inode_item));
inode_set_bytes(inode, btrfs_stack_inode_nbytes(inode_item));
BTRFS_I(inode)->generation = btrfs_stack_inode_generation(inode_item);
-   BTRFS_I(inode)->sequence = btrfs_stack_inode_sequence(inode_item);
+   inode->i_version = btrfs_stack_inode_sequence(inode_item);
inode->i_rdev = 0;
*rdev = btrfs_stack_inode_rdev(inode_item);
BTRFS_I(inode)->flags = btrfs_stack_inode_flags(inode_item);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 431b565..f0da02b 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1404,7 +1404,6 @@ static ssize_t btrfs_file_aio_write(struct kiocb *iocb,
mutex_unlock(&inode->i_mutex);
goto out;
}
-   BTRFS_I(inode)->sequence++;
 
start_pos = round_down(pos, root->sectorsize);
if (start_pos > i_size_read(inode)) {
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 7a084fb..7d3dd2f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2510,7 +2510,7 @@ static void btrfs_read_locked_inode(struct inode *inode)
 
inode_set_bytes(inode, btrfs_inode_nbytes(leaf, inode_item));
BTRFS_I(inode)->generation = btrfs_inode_generation(leaf, inode_item);
-   BTRFS_I(inode)->sequence = btrfs_inode_sequence(leaf, inode_item);
+   inode->i_version = btrfs_inode_sequence(leaf, inode_item);
inode->i_generation = BTRFS_I(inode)->generation;
inode->i_rdev = 0;
rdev = btrfs_inode_rdev(leaf, inode_item);
@@ -2594,7 +2594,7 @@ static void fill_inode_item(struct btrfs_trans_handle *trans,
 
btrfs_set_inode_nbytes(leaf, item, inode_get_bytes(inode));
btrfs_set_inode_generation(leaf, item, BTRFS_I(inode)->generation);
-   btrfs_set_inode_sequence(leaf, item, BTRFS_I(inode)->sequence);
+   btrfs_set_inode_sequence(leaf, item, inode->i_version);
btrfs_set_inode_transid(leaf, item, trans->transid);
btrfs_set_inode_rdev(leaf, item, inode->i_rdev);
btrfs_set_inode_flags(leaf, item, BTRFS_I(inode)->flags);
@@ -6884,7 +6884,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
ei->root = NULL;
ei->space_info = NULL;
ei->generation = 0;
-   ei->sequence = 0;
ei->last_trans = 0;
ei->last_sub_trans = 0;
ei->logged_trans = 0;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 54e7ee9..ee1bb31 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -770,7 +770,7 @@ static int btrfs_fill_super(struct super_block *sb,
 #ifdef CONFIG_BTRFS_FS_POSIX_ACL
sb->s_flags |= MS_POSIXACL;
 #endif
-
+   sb->s_flags |= MS_I_VERSION;
err = open_ctree(sb, fs_devices, (char *)data);
if (err) {
printk("btrfs: open_ctree failed\n");
-- 
1.7.5.2



[PATCH] Btrfs: remove lock assert from get_restripe_target()

2012-04-09 Thread Ilya Dryomov
This fixes a regression introduced by fc67c450.  spin_is_locked() always
returns 0 on UP kernels, which caused assert in get_restripe_target() to
be fired on every call from btrfs_reduce_alloc_profile() on UP systems.
Remove it completely for now, it's not clear if it's going to be needed
in future.

Reported-by: Bobby Powers 
Reported-by: Mitch Harder 
Tested-by: Mitch Harder 
Signed-off-by: Ilya Dryomov 
---
 fs/btrfs/extent-tree.c |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a844204..8db0884 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3152,15 +3152,14 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
 /*
  * returns target flags in extended format or 0 if restripe for this
  * chunk_type is not in progress
+ *
+ * should be called with either volume_mutex or balance_lock held
  */
 static u64 get_restripe_target(struct btrfs_fs_info *fs_info, u64 flags)
 {
struct btrfs_balance_control *bctl = fs_info->balance_ctl;
u64 target = 0;
 
-   BUG_ON(!mutex_is_locked(&fs_info->volume_mutex) &&
-  !spin_is_locked(&fs_info->balance_lock));
-
if (!bctl)
return 0;
 
-- 
1.7.9.1
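
The failure described in the commit message can be illustrated with a small user-space sketch. This is not kernel code: the model_* names and the uniprocessor field are hypothetical stand-ins, and the only behaviour taken from the patch description is that spin_is_locked() reports 0 on UP kernels even while the lock is held. Under that assumption, a check of the form "mutex held or spinlock held" trips whenever the caller relies on the spinlock alone.

```c
#include <stdbool.h>

/* Hypothetical user-space model of a kernel spinlock. */
struct model_spinlock {
    bool held;          /* true between lock and unlock */
    bool uniprocessor;  /* models a UP build that tracks no lock state */
};

/* Models spin_is_locked(): per the commit message it reports 0 on UP. */
bool model_spin_is_locked(const struct model_spinlock *lock)
{
    if (lock->uniprocessor)
        return false;
    return lock->held;
}

/*
 * Models the removed check. Returns true when the old BUG_ON() condition
 * would have fired, i.e. neither lock appears to be held.
 */
bool model_old_assert_fires(bool volume_mutex_held,
                            const struct model_spinlock *balance_lock)
{
    return !volume_mutex_held && !model_spin_is_locked(balance_lock);
}
```

With balance_lock held and volume_mutex not held, the model fires only in the uniprocessor case, which matches the regression reported above.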



Re: [RFC] [PATCH 2/2] Btrfs: move over to use ->update_time

2012-04-09 Thread J. Bruce Fields
On Wed, Apr 04, 2012 at 02:16:22PM -0400, Josef Bacik wrote:
> On Wed, Apr 04, 2012 at 09:12:57PM +0300, Kasatkin, Dmitry wrote:
> > On Wed, Apr 4, 2012 at 8:47 PM, Mimi Zohar  wrote:
> > > On Wed, 2012-04-04 at 13:43 -0400, Josef Bacik wrote:
> > >> On Wed, Apr 04, 2012 at 08:24:19PM +0300, Kasatkin, Dmitry wrote:
> > >> > Hello,
> > >> >
> > >> > Mimi and I working on IMA/EVM (security/integrity) and it uses
> > >> > i_version for checking if file content has been changed.
> > >> > extX file systems support i_version updates with mounting file system
> > >> > with "iversion" option or via kernel command line parameter
> > >> > "i_version"
> > >> >
> > >> > It seems iversion option is not recognized when mounting btrfs.
> > >> > I see this patchset deals with i_version update as well..
> > >> > Can you please give an advice how to use i_version with btrfs?
> > >> >
> > >>
> > >> Oh good somebody uses this?  We actually have a ->sequence thing we use 
> > >> for
> > >> this, the grand idea was to make it smarter about telling nfs when 
> > >> something
> > >> changed, but if you guys use i_version we could probably get rid of our 
> > >> in-core
> > >> sequence and use the normal inodes i_version and then just store it in 
> > >> our
> > >> sequence field on disk.  I'll do it without a mount option tho so it 
> > >> just works,
> > >> does that sound good to you?  Thanks,
> > 
> > Hello,
> > 
> > Thank you for the answer...
> > But can you a bit clarify...
> > 
> > Looking to file_update_time() I see that it does:
> > 
> > if (IS_I_VERSION(inode))
> > sync_it |= S_VERSION;
> > 
> > Basically it should be (inode->i_sb->s_flags & MS_I_VERSION)
> > 
> > use of i_version is controlled by iversion mount flag.
> > for ext4 I see in parse_options():
> > 
> > case Opt_i_version:
> > set_opt(sb, I_VERSION);
> > sb->s_flags |= MS_I_VERSION;
> > break;
> > 
> > 
> > But who sets MS_I_VERSION in s_flags on btrfs?
> > 
> 
> Nobody yet, I'm going to send a patch shortly that will support this.  Thanks,

Great.  It would also be far preferable if it was just always on (at
least by default) rather than requiring a mount option.

--b.
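
For readers following the thread, the gating discussed above can be sketched in user-space C. This is a simplified model, not the kernel's implementation: the model_* types, the MODEL_* constants and their values, and the decision to bump i_version inside the same function are all assumptions made for illustration. The only part taken from the thread is that S_VERSION is requested when (inode->i_sb->s_flags & MS_I_VERSION) is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for this model; the kernel defines its own. */
#define MODEL_MS_I_VERSION (1UL << 23)
#define MODEL_S_VERSION    8

struct model_super_block {
    unsigned long s_flags;
};

struct model_inode {
    struct model_super_block *i_sb;
    uint64_t i_version;
};

/* Mirrors the check quoted in the thread: s_flags & MS_I_VERSION. */
bool model_is_i_version(const struct model_inode *inode)
{
    return (inode->i_sb->s_flags & MODEL_MS_I_VERSION) != 0;
}

/*
 * Models the quoted part of file_update_time(): S_VERSION is requested
 * only when the superblock flag is set. As a simplification, the version
 * counter is bumped here when S_VERSION was requested.
 */
int model_file_update_time(struct model_inode *inode)
{
    int sync_it = 0;

    if (model_is_i_version(inode))
        sync_it |= MODEL_S_VERSION;
    if (sync_it & MODEL_S_VERSION)
        inode->i_version++;
    return sync_it;
}
```

In this model, a filesystem that sets the flag unconditionally when filling its superblock gets version updates without any mount option, which is the behaviour asked for above.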


umount vs delayed allocation potential deadlock...

2012-04-09 Thread Daniel J Blueman
When testing btrfs on 3.4-rc2 with unmount directly after a test
workload, I see a potential deadlock:

[ INFO: possible circular locking dependency detected ]
3.4.0-rc2-debug+ #2 Not tainted
---
fio/2365 is trying to acquire lock:
 (&type->s_umount_key#19){+.}, at: []
writeback_inodes_sb_nr_if_idle+0x38/0x60

but task is already holding lock:
 (&ei->delalloc_mutex){+.+...}, at: []
btrfs_delalloc_reserve_metadata+0x7b/0x240 [btrfs]

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&ei->delalloc_mutex){+.+...}:
   [] check_prevs_add+0xda/0x140
   [] validate_chain.isra.33+0x3d9/0x510
   [] __lock_acquire+0x388/0x900
   [] lock_acquire+0x55/0x70
   [] mutex_lock_nested+0x6b/0x340
   [] btrfs_delalloc_reserve_metadata+0x7b/0x240 [btrfs]
   [] btrfs_delalloc_reserve_space+0x3b/0x60 [btrfs]
   [] btrfs_save_ino_cache+0x23f/0x310 [btrfs]
   [] commit_fs_roots.isra.22+0xc0/0x190 [btrfs]
   [] btrfs_commit_transaction+0x4cc/0x8e0 [btrfs]
   [] btrfs_sync_file+0x116/0x1a0 [btrfs]
   [] do_fsync+0x51/0x80
   [] sys_fsync+0xb/0x10
   [] system_call_fastpath+0x16/0x1b

-> #1 (&fs_info->tree_log_mutex){+.+...}:
   [] check_prevs_add+0xda/0x140
   [] validate_chain.isra.33+0x3d9/0x510
   [] __lock_acquire+0x388/0x900
   [] lock_release_non_nested+0x10d/0x310
   [] lock_release_nested+0x2a/0xa0
   [] __lock_release+0xad/0xd0
   [] lock_release+0x36/0x50
   [] __mutex_unlock_slowpath+0x86/0x150
   [] mutex_unlock+0x9/0x10
   [] btrfs_commit_transaction+0x717/0x8e0 [btrfs]
   [] btrfs_sync_fs+0x4b/0x80 [btrfs]
   [] __sync_filesystem+0x5e/0x90
   [] sync_one_sb+0x17/0x20
   [] iterate_supers+0xe9/0xf0
   [] sys_sync+0x42/0x60
   [] system_call_fastpath+0x16/0x1b

-> #0 (&type->s_umount_key#19){+.}:
   [] check_prev_add+0x719/0x730
   [] check_prevs_add+0xda/0x140
   [] validate_chain.isra.33+0x3d9/0x510
   [] __lock_acquire+0x388/0x900
   [] lock_acquire+0x55/0x70
   [] down_read+0x47/0x5c
   [] writeback_inodes_sb_nr_if_idle+0x38/0x60
   [] shrink_delalloc+0x13a/0x200 [btrfs]
   [] reserve_metadata_bytes.isra.70+0x1c2/0x430 [btrfs]
   [] btrfs_delalloc_reserve_metadata+0x12f/0x240 [btrfs]
   [] btrfs_delalloc_reserve_space+0x3b/0x60 [btrfs]
   [] btrfs_direct_IO+0x14a/0x410 [btrfs]
   [] generic_file_direct_write+0xcc/0x190
   [] __btrfs_direct_write+0x40/0x146 [btrfs]
   [] btrfs_file_aio_write+0x33f/0x350 [btrfs]
   [] aio_rw_vect_retry+0xb9/0x160
   [] aio_run_iocb+0x5e/0x150
   [] io_submit_one+0x175/0x220
   [] do_io_submit+0x129/0x1c0
   [] sys_io_submit+0xb/0x10
   [] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

Chain exists of:
  &type->s_umount_key#19 --> &fs_info->tree_log_mutex --> &ei->delalloc_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&ei->delalloc_mutex);
                               lock(&fs_info->tree_log_mutex);
                               lock(&ei->delalloc_mutex);
  lock(&type->s_umount_key#19);

 *** DEADLOCK ***

2 locks held by fio/2365:
 #0:  (&sb->s_type->i_mutex_key#12){+.+.+.}, at: []
btrfs_file_aio_write+0xcf/0x350 [btrfs]
 #1:  (&ei->delalloc_mutex){+.+...}, at: []
btrfs_delalloc_reserve_metadata+0x7b/0x240 [btrfs]

stack backtrace:
Pid: 2365, comm: fio Not tainted 3.4.0-rc2-debug+ #2
Call Trace:
 [] print_circular_bug+0xda/0xeb
 [] check_prev_add+0x719/0x730
 [] ? __lock_acquire+0x388/0x900
 [] check_prevs_add+0xda/0x140
 [] validate_chain.isra.33+0x3d9/0x510
 [] __lock_acquire+0x388/0x900
 [] lock_acquire+0x55/0x70
 [] ? writeback_inodes_sb_nr_if_idle+0x38/0x60
 [] down_read+0x47/0x5c
 [] ? writeback_inodes_sb_nr_if_idle+0x38/0x60
 [] ? __lock_release+0xad/0xd0
 [] writeback_inodes_sb_nr_if_idle+0x38/0x60
 [] shrink_delalloc+0x13a/0x200 [btrfs]
 [] reserve_metadata_bytes.isra.70+0x1c2/0x430 [btrfs]
 [] ? __lock_release+0x21/0xd0
 [] btrfs_delalloc_reserve_metadata+0x12f/0x240 [btrfs]
 [] btrfs_delalloc_reserve_space+0x3b/0x60 [btrfs]
 [] btrfs_direct_IO+0x14a/0x410 [btrfs]
 [] ? do_writepages+0x1f/0x40
 [] generic_file_direct_write+0xcc/0x190
 [] __btrfs_direct_write+0x40/0x146 [btrfs]
 [] ? btrfs_update_time+0x5f/0x160 [btrfs]
 [] btrfs_file_aio_write+0x33f/0x350 [btrfs]
 [] ? _raw_spin_unlock_irq+0x2b/0x50
 [] ? _raw_spin_unlock_irq+0x2b/0x50
 [] ? __btrfs_buffered_write+0x340/0x340 [btrfs]
 [] aio_rw_vect_retry+0xb9/0x160
 [] ? aio_advance_iovec+0x90/0x90
 [] aio_run_iocb+0x5e/0x150
 [] io_submit_one+0x175/0x220
 [] do_io_submit+0x129/0x1c0
 [] sys_io_submit+0xb/0x10
 [] system_call_fastpath+0x16/0x1b
-- 
Daniel J Blueman
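
The reasoning behind the report can be sketched as a cycle check on a lock dependency graph. This is a simplified illustration, not lockdep's actual algorithm or data structures; the enum names, dep[][] matrix and helper functions are hypothetical. The recorded edges come from the chain in the report (s_umount, then tree_log_mutex, then delalloc_mutex), and the question is whether taking s_umount while holding delalloc_mutex closes a loop.

```c
#include <stdbool.h>

enum { LOCK_S_UMOUNT, LOCK_TREE_LOG, LOCK_DELALLOC, NR_LOCKS };

/* dep[a][b] is true when lock b has been acquired while holding lock a. */
bool dep[NR_LOCKS][NR_LOCKS];

/* Remember that 'wanted' was taken while 'held' was already held. */
void record_dependency(int held, int wanted)
{
    dep[held][wanted] = true;
}

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
bool reaches(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NR_LOCKS; next++)
        if (dep[from][next] && !seen[next] && reaches(next, to, seen))
            return true;
    return false;
}

/*
 * Returns true if taking 'wanted' while holding 'held' would close a
 * cycle, i.e. 'held' is already reachable from 'wanted'.
 */
bool would_deadlock(int held, int wanted)
{
    bool seen[NR_LOCKS] = { false };

    return reaches(wanted, held, seen);
}
```

After recording the two existing edges, would_deadlock(LOCK_DELALLOC, LOCK_S_UMOUNT) is true in this model, which corresponds to the circular dependency flagged in the trace.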

Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?

2012-04-09 Thread Daniel J Blueman
On 9 April 2012 22:44, Leho Kraav  wrote:
> On 09.04.2012 17:35, Daniel J Blueman wrote:
>>
>> Leho Kraav  kraav.com>  writes:
>> []
>>>
>>> Apr  8 02:46:11 s9 kernel: [  189.691778] attempt to access beyond end
>>> of device
>>> Apr  8 02:46:11 s9 kernel: [  189.691787] dm-3: rw=129, want=23361976,
>>> limit=20967424
>>
>>
>> I recently bumped into this too [1]. Liu Bo posted a patch for it [2],
>> which tests out fine here. The workaround is to not mount with
>> 'discard' until eg ~3.4-rc3 or later.
>>
>> [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16409
>> [2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16649
>
> Oh wow, thanks. This sounds exactly like what happened. I got the livelock
> post off my search results, but the patch post doesn't seem to have any of
> the keywords I was looking for, since I had no idea it could be related to
> discards.
>
> So can this become a problem earlier too, not only when the space used is
> approaching limits? If not, I think I should be good until 3.4:

Looks like it affects at least 3.3 and 3.4-rc1/2 in all circumstances.

> $ sudo btrfs fi show
> Label: 'S9-HOME'  uuid: 1ed06dbc-e1b7-433f-8d1b-19cf1f7756f1
>        Total devices 1 FS bytes used 12.93GB
>        devid    1 size 60.00GB used 20.04GB path /dev/dm-0
>
> Label: 'S9-ROOT'  uuid: 6206dfce-afcf-4afe-9047-b1c88a7889fd
>        Total devices 1 FS bytes used 8.75GB
>        devid    1 size 30.00GB used 18.29GB path /dev/dm-1
>
> I think I'd like to keep using "discard" for SSD still, unless a smart
> person says it's not particularly useful anyway.

If your SSD has background garbage collection and there are disk idle
periods, the synchronous discards will have little benefit.

> So while I'm on 3.3, is the patch from gmane:16649 good enough to eliminate
> immediate dangers?

Yes.

> And is the previous filesystem still hosed for good then? Or mounting the
> images with -discard might help?

It seems like the kernel caught and prevented the discard after the
end of the partition, so the data should be fine; scrubbing will tell
you.

Daniel
-- 
Daniel J Blueman


Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?

2012-04-09 Thread Leho Kraav

On 09.04.2012 17:35, Daniel J Blueman wrote:
> Leho Kraav  kraav.com>  writes:
> []
>> Apr  8 02:46:11 s9 kernel: [  189.691778] attempt to access beyond end
>> of device
>> Apr  8 02:46:11 s9 kernel: [  189.691787] dm-3: rw=129, want=23361976,
>> limit=20967424
>
> I recently bumped into this too [1]. Liu Bo posted a patch for it [2],
> which tests out fine here. The workaround is to not mount with
> 'discard' until eg ~3.4-rc3 or later.
>
> [1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16409
> [2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16649

Oh wow, thanks. This sounds exactly like what happened. I got the 
livelock post off my search results, but the patch post doesn't seem to 
have any of the keywords I was looking for, since I had no idea it could 
be related to discards.


So can this become a problem earlier too, not only when the space used 
is approaching limits? If not, I think I should be good until 3.4:


$ sudo btrfs fi show
Label: 'S9-HOME'  uuid: 1ed06dbc-e1b7-433f-8d1b-19cf1f7756f1
       Total devices 1 FS bytes used 12.93GB
       devid    1 size 60.00GB used 20.04GB path /dev/dm-0

Label: 'S9-ROOT'  uuid: 6206dfce-afcf-4afe-9047-b1c88a7889fd
       Total devices 1 FS bytes used 8.75GB
       devid    1 size 30.00GB used 18.29GB path /dev/dm-1

I think I'd like to keep using "discard" for SSD still, unless a smart 
person says it's not particularly useful anyway.


So while I'm on 3.3, is the patch from gmane:16649 good enough to 
eliminate immediate dangers?


And is the previous filesystem still hosed for good then? Or mounting 
the images with -discard might help?



Re: [3.4-rc1] attempt to access beyond end of device and livelock

2012-04-09 Thread Daniel J Blueman
On 8 April 2012 16:49, Liu Bo  wrote:
> On 04/06/2012 07:36 PM, Daniel J Blueman wrote:
>> Hi Josef, Chris,
>>
>> When testing BTRFS with RAID 0 metadata on linux-3.4-rc1, we see
>> discard ranges exceeding the end of the block device [1], potentially
>> causing dataloss; when this occurs, filesystem writeback becomes
>> catatonic due to continual resubmission.
[]
> Thanks for the report, this bug shows we've miscalculated the length of 
> discard extents.
>
> I'll send a patch for this soon.

The patch tests out well here on 3.4-rc2. Thanks Bo!

Daniel
-- 
Daniel J Blueman


Re: btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?

2012-04-09 Thread Daniel J Blueman
Leho Kraav  kraav.com> writes:
[]
> Apr  8 02:46:11 s9 kernel: [  189.691778] attempt to access beyond end
> of device
> Apr  8 02:46:11 s9 kernel: [  189.691787] dm-3: rw=129, want=23361976,
> limit=20967424

I recently bumped into this too [1]. Liu Bo posted a patch for it [2],
which tests out fine here. The workaround is to not mount with
'discard' until eg ~3.4-rc3 or later.

Thanks,
  Daniel

[1] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16409
[2] http://permalink.gmane.org/gmane.comp.file-systems.btrfs/16649
-- 
Daniel J Blueman


btrfs 3.2.2 -> 3.3.1 upgrade finally ate babies, some advice?

2012-04-09 Thread Leho Kraav

Hi all

$ uname -a
Gentoo Linux s9 3.3.1-pf #2 SMP PREEMPT Mon Apr 9 00:35:28 EEST 2012 
i686 Intel(R) Core(TM) i5-2467M CPU @ 1.60GHz GenuineIntel GNU/Linux


I was running stuff for the past year or so on 4 partitions:

/dev/sda1 -> dm-crypt -> btrfs raid 0 ROOT 10.0GB
/dev/sda2 -> dm-crypt -> btrfs raid 0 ROOT 10.0GB
/dev/sda3 -> dm-crypt -> btrfs raid 0 HOME 10.0GB
/dev/sda4 -> dm-crypt -> btrfs raid 0 HOME 10.0GB

Both filesystems mounted with "noatime,nodiratime,ssd,discard,compress=lzo"

I set that multi-partition monster up back in the 2.6.36ish days, when 
dm-crypt either was not capable of utilizing multicores on a single 
partition or I possibly didn't know that it already could. At one point 
it definitely couldn't.


So over time HOME started filling up and at the point of last night's 
baby eating "df -hT" showed 1.7G free. Yes I know free space is 
complicated in btrfs. Space had not been an issue so I didn't think to 
use any better tools regularly to check, such as "btrfs fi show" I guess.


I upgraded my 3.2.2-pf to 3.3.1-pf* and proceeded to launch my 
regular apps: Firefox, TB, office, etc. Except they all hung. Checking my 
/var/log/message window revealed what was happening:


* pf-sources => http://pf.natalenko.name/

...
Apr  8 02:45:52 s9 sudo: leho : TTY=pts/0 ; PWD=/home/leho ; USER=root ; COMMAND=/bin/tail -f /home/leho/.tail/awesome-leho /home/leho/.tail/messages /home/leho/.tail/openvpn.log
Apr  8 02:45:52 s9 sudo: pam_unix(sudo:session): session opened for user 
root by (uid=0)
Apr  8 02:46:11 s9 kernel: [  189.691778] attempt to access beyond end 
of device
Apr  8 02:46:11 s9 kernel: [  189.691787] dm-3: rw=129, want=23361976, 
limit=20967424
Apr  8 02:46:11 s9 kernel: [  189.691792] attempt to access beyond end 
of device
Apr  8 02:46:11 s9 kernel: [  189.691795] dm-3: rw=129, want=27556216, 
limit=20967424
Apr  8 02:46:11 s9 kernel: [  189.691799] attempt to access beyond end 
of device

...
Apr  8 02:46:11 s9 kernel: [  189.691869] attempt to access beyond end 
of device
Apr  8 02:46:11 s9 kernel: [  189.691874] dm-3: rw=129, want=69498616, 
limit=20967424

...
Apr  8 02:46:11 s9 kernel: [  189.692233] attempt to access beyond end 
of device
Apr  8 02:46:11 s9 kernel: [  189.692237] dm-3: rw=129, want=228879736, 
limit=20967424

(thousands of lines of this, as we can see "want" gets bigger all the time)
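
For context, the check behind these messages can be sketched as follows. This is a user-space illustration under stated assumptions, not the block layer's code: it reads "want" as the sector just past the end of the request and "limit" as the device size, both in 512-byte sectors, and the model_* names are hypothetical. Under that reading, limit=20967424 sectors is 10,735,321,088 bytes, which is consistent with the 10.0GB partitions listed above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_SECTOR_SIZE 512ULL /* assumed sector size in bytes */

/*
 * Returns true if a request covering [start, start + nr_sectors) runs
 * past a device of 'limit' sectors. The comparison is arranged so that
 * start + nr_sectors is never computed and cannot overflow.
 */
bool model_beyond_end(uint64_t start, uint64_t nr_sectors, uint64_t limit)
{
    if (nr_sectors == 0)
        return false;
    return nr_sectors > limit || start > limit - nr_sectors;
}

/* Converts a sector count to bytes under the assumed sector size. */
uint64_t model_sectors_to_bytes(uint64_t sectors)
{
    return sectors * MODEL_SECTOR_SIZE;
}
```

Any request whose end sector equals one of the logged "want" values is rejected by this model for the logged limit, while a request that ends exactly at the limit is accepted.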

And it was all downhill from there. The result is a badly corrupted 
filesystem that seems to be beyond repair. After a hard reboot I started 
getting csum errors in various spots, and any modification to the 
filesystem, even deleting files, would start another flood of "attempt 
to access beyond end of device", totally messing up syslog-ng. With the 
blazing speeds of an SSD that probably isn't a surprise.


So searching around, I found out about the ENOSPC thing which is 
possibly still an issue in 3.3. Is there any useful info I could provide 
for this? I now have some bigger partitions and probably won't run out 
of space again for a while.


I also discovered the btrfs "restore" binary, although possibly it was 
too late, since I had already hard rebooted a few times and done some 
more damage to HOME. This thing returned a whole bunch of "ret is -3" 
messages and 0-byte files. Occasionally files were good as well, but 
the majority of the files seem to be corrupt. When running out of space 
happens, is this a reasonable result to expect?


"btrfs scrub" reported uncorrectable errors count in the millions. At 
least thousands of csum mismatch errors visible in dmesg.


"btrfs balance" would bomb the machine with the same "access beyond end 
of device".


I made images of the two btrfs partitions on sda3 and sda4 for future 
diagnosis. I do think they are pretty corrupt though. Or could there be 
some magic poke or offset that would make more stuff magically 
"restore"-able :>


So in conclusion:

 * is filesystem-wide corruption like this made more likely by running on 
top of dm-crypt or btrfs multi device? dm-crypt is definitely staying for 
me, but I did consolidate partitions now to just 2.
 * what exactly should happen when an out of space scenario like the 
above happens?
 * I guess I should keep an eye on "btrfs fi show" on the regular?


Re: "Invalid argument" when mounting a btrfs raid1 filesystem

2012-04-09 Thread Martin Steigerwald
On Monday, 26 March 2012, Calvin Walton wrote:
> On Mon, 2012-03-26 at 10:51 +0200, Karel Zak wrote:
> > On Sat, Mar 24, 2012 at 06:21:05PM +, Hugo Mills wrote:
> > >    As Sadner says, you have to run "btrfs dev scan" before you try to
> > > mount the FS. If you have root on btrfs, this will have to go in an
> > > initrd; otherwise, it can go in your initscripts anywhere before
> > > the non-root filesystem mounts.
> > > 
> > >    Basically, the kernel needs to know which devices hold which btrfs
> > > filesystems (organised by UUID) before it tries to mount them. So,
> > > there's an ioctl that is used for sending that data to the kernel,
> > > and a userspace tool (btrfs dev scan) that enumerates all of the
> > > block devices it can see, looks for a btrfs superblock on them,
> > > and tells the kernel.
> >  
> >  Please, move all this logic to udev rules where we already scans all
> >  devices. It's really bad to scan all device more than once. We spent
> >  years to fix this problem for LVM, I don't think that btrfs has to
> >  repeat the same mistakes.
> 
> Oh, this is already possible to do with udev rules, quite easily. In
> fact, dracut ships with the appropriate udev rules, which it uses to
> initialize btrfs filesystems in the initramfs:
> 
> http://git.kernel.org/?p=boot/dracut/dracut.git;a=blob;f=modules.d/90btrfs/80-btrfs.rules;hb=HEAD
> 
> which would be suitable with minor modifications for use in a system
> udev installation as well.

I reported this for Debian a long time ago as:

please support raid configurations automatically
http://bugs.debian.org/bug=634658

I will forward your mail to the bug report to suggest this udev-based 
solution.

It has also been reported as:

btrfs-tools: add initramfs boot and hook scripts
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=559710

please provide non-initramfs-tools integration
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=585568

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7