Re: BTRFS hangs - possibly NFS related?
kim-btrfs posted on Tue, 01 Apr 2014 13:56:06 +0100 as excerpted:

> Apologies if this is known, but I've been lurking a while on the list and not seen anything similar - and I'm running out of ideas on what to do next to debug it.
>
> Small HP microserver box, running Debian, EXT4 system disk plus 4 disk BTRFS array shared over NFS (nfs-kernel-server) and SMB - the disks recently moved from a different box where they've been running faultlessly for months, although that didn't use NFS.

First off I have absolutely zero experience with NFS or SMB, so if it has anything at all to do with that, I'd be clueless. That said, I do know a few other things to look at, and have some idea of how to look at them. The below is what I'd be looking at were it me.

> Under reasonable combined NFS and SMB load with only a couple of clients, the shares lock up, load average on server and clients goes high (10-12) and stays there. Apparently not actually CPU and there's little if any disk activity on the server.

First thing, high load, but little CPU and little I/O. That's very strange, but there are a few things besides that to check to see if you can run down where all that load is going.

With the right tools CPU/load can be categorized into several areas: low-priority/niced, normal, kernel, IRQ, soft-IRQ, IO-wait, steal, guest. Steal and guest are VM related (steal is CPU taken by the hypervisor or another guest if measured from within a guest, and thus not available to it; guest is of course guests, when measured from the hypervisor) and will be zero if you're not running them, and irq and soft-irq won't show much either in the normal case. And of course niced doesn't show either unless you're running something niced.

What I'm wondering here is if it's all going to IO-wait as I suspect... or something else.

If you don't have a tool that shows all that, one available tool that does is htop. It's a better top, ncurses/semi-gui-based so run it in a terminal window or text-login VT. Of course, while you're at it, you can also see which threads account for all that load that isn't actually CPU time.

Also check out iotop, to see what processes are actually doing IO and the total IO speed.

Both these tools have manpages...

What could be interesting is what happens when you do that sync. Does a thread or several threads spring to life momentarily (say in iotop) and then idle again, or... ?

> Killing NFS and/or Samba sometimes helps, but it's always back when the load comes back on.
>
> Chased round NFS and Samba options, then find that when the clients hang it's unresponsive on the server directly to the disk.
>
> Notice a btrfs-transacti process hung in D. As are all the NFS processes:
>
>  3779 ?        S      0:00 [nfsd4]
>  3780 ?        S      0:00 [nfsd4_callbacks]
>  3782 ?        D      0:27 [nfsd]
>  3783 ?        D      0:27 [nfsd]
>  3784 ?        D      0:28 [nfsd]
>  3785 ?        D      0:26 [nfsd]
>
> sync instantly unsticks everything and it all works again for another couple of minutes, when it locks up again, same symptoms.
>
> Nothing apparently written to kern.log or dmesg, which has been the frustration all through - I don't know where to find the culprit!
>
> As a band-aid I've put
>
>     btrfs filesystem sync /mnt/btrfs
>
> in the crontab once a minute, which is actually working just fine and has been all morning - every 5 minutes was not enough.
>
> Any recommendations on where I can look next, or any known holes I've fallen in? Do I need to force NFS clients to sync in their mount options?
>
> Background:
>
> Kernel - 3.13-1-amd64 #1 SMP Debian 3.13.7-1 (2014-03-25)
> AMD N54L with 10GB RAM.
>
> ##
> Total devices 4 FS bytes used 848.88GiB
> devid    2 size 465.76GiB used 319.03GiB path /dev/sdc
> devid    4 size 465.76GiB used 319.00GiB path /dev/sda
> devid    5 size 455.76GiB used 309.03GiB path /dev/sdb2
> devid    6 size 931.51GiB used 785.00GiB path /dev/sdd
> ##

OK, so you're not at full allocation. No problem there.

> Data, RAID1: total=864.00GiB, used=847.86GiB
> System, RAID1: total=32.00MiB, used=128.00KiB
> Metadata, RAID1: total=2.00GiB, used=1009.93MiB

That looks healthy.

> A scrub passes without finding any errors.
>
> There are a couple of VM images with light traffic which do fragment a little but I manually defrag those every day so often and I haven't had any problems there - it certainly isn't thrashing.

If you've been following the list, I'm surprised you didn't mention whether you're doing snapshotting at all. I'll assume that means no, or only very light/manual snapshotting (as I have here).

My guess is that it might be fragmentation of something other than the VMs. You're not mounting with autodefrag, I take it? What about compress?

Do you have any other large actively written files, perhaps databases or pre-allocated-file
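
A minimal C sketch of where the CPU-time categories discussed in this message come from, assuming a Linux /proc/stat whose first line ("cpu ...") lists cumulative ticks in the order user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice. The struct and function names are made up for illustration; htop and top do their own, more careful accounting. Note also that the Linux load average counts tasks in uninterruptible (D) sleep as well as runnable ones, so a load of 10-12 with idle CPUs is consistent with threads stuck waiting rather than computing.

```c
#include <stdio.h>
#include <string.h>

/* Fields of the aggregate "cpu" line in /proc/stat, in kernel order.
 * All values are cumulative ticks since boot. */
struct cpu_ticks {
    unsigned long long user, nice, system, idle, iowait;
    unsigned long long irq, softirq, steal, guest, guest_nice;
};

/* Parse the aggregate "cpu ..." line. Returns 0 on success, -1 otherwise.
 * Per-CPU lines ("cpu0 ...") are rejected. Older kernels print fewer
 * fields, so any trailing fields that are missing stay 0. */
int parse_cpu_line(const char *line, struct cpu_ticks *t)
{
    struct cpu_ticks z = {0};
    int n;

    if (strncmp(line, "cpu ", 4) != 0)
        return -1;
    n = sscanf(line, "cpu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
               &z.user, &z.nice, &z.system, &z.idle, &z.iowait,
               &z.irq, &z.softirq, &z.steal, &z.guest, &z.guest_nice);
    if (n < 4)
        return -1;
    *t = z;
    return 0;
}

/* Sum of all categories. Guest time is already included in user/nice,
 * so it is left out here to avoid counting it twice. */
unsigned long long total_ticks(const struct cpu_ticks *t)
{
    return t->user + t->nice + t->system + t->idle + t->iowait +
           t->irq + t->softirq + t->steal;
}

/* Percentage of the ticks elapsed between two samples spent in IO-wait. */
double iowait_percent(const struct cpu_ticks *before,
                      const struct cpu_ticks *after)
{
    unsigned long long dt = total_ticks(after) - total_ticks(before);

    if (dt == 0)
        return 0.0;
    return 100.0 * (double)(after->iowait - before->iowait) / (double)dt;
}
```

Reading the first line of /proc/stat twice, a few seconds apart, and passing both samples to iowait_percent() gives the IO-wait share for that interval, which is the number the question above is really after.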
Re: [Help] Errors found in extent allocation tree or chunk allocation
Michael Witten posted on Tue, 01 Apr 2014 19:05:16 + as excerpted:

> The `btrfs balance' completed successfully without error, and DID solve my issues; it relocated every chunk, after which `btrfsck' ran smoothly.
>
> Thanks for the advice! You've put me at ease, and you've saved me a lot of time and energy.

Very good! =:^)

Thanks for the fix-report.

--
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master. Richard Stallman

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs: fix reversed warning condition in btrfs_delayed_inode_reserve_metadata
On Thu, Apr 03, 2014 at 01:34:23PM +0800, Liu Bo wrote:
> On Wed, Apr 02, 2014 at 07:13:00PM +0200, David Sterba wrote:
> > Commit fae7f21cece9a4c181 (btrfs: Use WARN_ON()'s return value in place of WARN_ON(1)) cleaned up WARN_ON usage and in one place reversed the condition that led to loads of warnings that were not supposed to occur.
> >
> > WARN_ON will trigger because it sees 'ret' though in the previous code did not reach the WARN_ON below. The correct pattern is
> >
> > - if (condition)
> > + if (WARN_ON(condition))
> >
> > CC: Dulshani Gunawardhana dulshani.gunawardhan...@gmail.com
> > CC: sta...@vger.kernel.org # 3.13
> > Reported-by: Liu Bo bo.li@oracle.com
> > Signed-off-by: David Sterba dste...@suse.cz
> > ---
> >  fs/btrfs/delayed-inode.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> > index 451b00c86f6c..098af20abd88 100644
> > --- a/fs/btrfs/delayed-inode.c
> > +++ b/fs/btrfs/delayed-inode.c
> > @@ -649,7 +649,7 @@ static int btrfs_delayed_inode_reserve_metadata(
> >                         goto out;
> >
> >                 ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes);
> > -               if (!WARN_ON(ret))
> > +               if (WARN_ON(!ret))
> >                         goto out;
>
> Oh sorry, I'd have to get my Reviewed-by back and give a NACK instead.
>
> With this patch, (ret = 0) triggers the WARNING, which is not right.

Thanks for catching this, you're right, my patch was wrong.

I must say the patch (fae7f21ce) made the code harder to read in some places; I don't see much benefit in removing plain WARN_ON(1) at this cost.

Back to the warning flood you observed, the comment under the warning says:

        /*
         * Ok this is a problem, let's just steal from the global rsv
         * since this really shouldn't happen that often.
         */
        ret = btrfs_block_rsv_migrate(&root->fs_info->global_block_rsv,
                                      dst_rsv, num_bytes);

so the question is why it does happen so often. A WARN_ON_ONCE hides the severity of the problem, so I'd rather suggest putting it under the enospc_debug option, so that we can debug it and it does not bother users.

As this is closer to the way you were going to fix that, I'm not sending a patch; take this as a review comment.
Re: [PATCH 0/6 EARLY RFC] Btrfs: Get rid of whole page I/O.
On Thu, Mar 20, 2014 at 08:50:27AM +0530, Aneesh Kumar K.V wrote:
> > On Tue, Mar 18, 2014 at 01:48:00PM +0630, chandan wrote:
> > > The earlier patchset posted by Chandra Seethraman was to get 4k blocksize to work with ppc64's 64k PAGE_SIZE.
> >
> > Are we talking about metadata block sizes or data block sizes?
> >
> > > The root node of tree root tree has 1957 bytes being written by make_btrfs() (in btrfs-progs). Hence I chose to do 2k blocksize for the initial subpagesize-blocksize work. So with this patchset the supported blocksizes would be in the range 2k-64k.
> >
> > So it's metadata blocks, and in this case 2k looks like the only allowed size that's smaller than 4k, and thus can demonstrate sub-page size allocations. I'm not sure if this is limiting for potential future extensions of metadata structures that could be larger. 2k is ok for testing purposes, but I think a 4k-page machine will hardly use a smaller page size. All the more so as 16k metadata blocks are now the default.
>
> The goal is to remove the assumption that the supported block size is >= page size. The primary reason to do that is to support migration of disk devices across different architectures. If we have a btrfs disk created on an x86 box with data blocksize 4K and metadata block size 16K, we should make sure that the disk can be read/written from a ppc64 box (which has a page size of 64K).
>
> To enable easy testing and community development we are now focusing on achieving 2K data blocksize and 2K metadata block size on x86. As you said, this will never be used in production.
>
> To achieve that we did the below:
>
> *) Add offset and len to btrfs_io_bio. These are file offsets and len. This is later used to unlock the extent io tree.
>
> *) Now we also need to make sure that submit_extent_page only submits a contiguous range in the file offset range, i.e. if we have holes in between, we split them into two submit_extent_page calls. This ensures that the btrfs_io_bio offset and len represent a contiguous range.
>
> Please let us know whether the above approach is acceptable.

I don't see any apparent problem with this approach.
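
The splitting rule described in this message (one submission per contiguous file range, never spanning a hole) can be sketched in plain C. This is only an illustration under stated assumptions: `struct file_range`, `split_at_holes()` and the per-block `mapped` array are hypothetical names invented here, and nothing below is taken from the btrfs patchset or reflects how submit_extent_page and btrfs_io_bio are actually implemented.

```c
#include <stddef.h>

/* One contiguous file range in bytes (hypothetical type, for illustration). */
struct file_range {
    unsigned long long offset;
    unsigned long long len;
};

/*
 * Walk nblocks block-sized slots starting at file offset `start`.
 * mapped[i] is nonzero when block i holds data and zero when it is a hole.
 * Every run of consecutive mapped blocks becomes one range, so no range
 * ever spans a hole. Returns the number of ranges written to `out`,
 * which is never more than max_out.
 */
size_t split_at_holes(unsigned long long start, unsigned int blocksize,
                      const unsigned char *mapped, size_t nblocks,
                      struct file_range *out, size_t max_out)
{
    size_t n = 0;
    size_t i = 0;

    while (i < nblocks && n < max_out) {
        size_t run_start;

        while (i < nblocks && !mapped[i])   /* skip over a hole */
            i++;
        if (i == nblocks)
            break;

        run_start = i;
        while (i < nblocks && mapped[i])    /* extend the current run */
            i++;

        out[n].offset = start + (unsigned long long)run_start * blocksize;
        out[n].len = (unsigned long long)(i - run_start) * blocksize;
        n++;
    }
    return n;
}
```

With a 64K page and 4K blocks, for example, one page covers 16 such slots, so a single page with two holes in it would yield up to three separate ranges, each with its own offset and len.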
Re: [PATCH] btrfs: allow mounting btrfs subvolumes with different ro/rw options
On Tue, Nov 19, 2013 at 09:35:32AM -0500, Chris Mason wrote:
> Quoting har...@redhat.com (2013-11-19 05:36:05)
> > From: Harald Hoyer har...@redhat.com
> >
> > [ create new vfsmounts with different states ]
>
> Thanks for resending Harald. I'll give this a shot and see if I can find any problems with it.

Is this patch queued for 3.15?

Thanks.
Re: [PATCH 00/27] Replace the old man page with asciidoc and man page for each btrfs subcommand.
On Wed, Apr 02, 2014 at 09:24:10AM -0400, Chris Mason wrote:
> On 04/02/2014 04:29 AM, Qu Wenruo wrote:
> > Convert the old btrfs man pages to new asciidoc and split the huge btrfs man page into subcommand man pages.
> >
> > The asciidoc style and Makefile things are mostly simplified from git Documentation; they only support man page output and drop html output, since html output is somewhat overkill for btrfs.
>
> Thanks for doing this. I've never liked roff, but I'll give people on the list a chance to complain before taking this one.

Moving to asciidoc is definitely the right thing to do. Dave's also been moving the xfs docs in this direction.

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/xfs-documentation.git;a=summary

- z
Hard restart required
Trying out BTRFS today for the first time after a few false starts with three drives configured raid1. I'm using Oracle Linux 6/64 with all updates applied on unremarkable hardware (cast-off Dell Core2 desktop, 2 GB RAM, no known problems).

Here's what it looks like:

[root@oracle ~]# btrfs filesystem show
Label: none  uuid: bdaf3d87-f992-4a89-9e2b-41de0b5ff909
        Total devices 3 FS bytes used 448.82MB
        devid    2 size 1.36TB used 167.01GB path /dev/sdc
        devid    3 size 1.82TB used 167.01GB path /dev/sdd
        devid    1 size 931.51GB used 2.02GB path /dev/sdb

It hard locked and required a power-off system restart. Is this atypical? Here's what did it; this should be OK?

# mount -U bdaf3d87-f992-4a89-9e2b-41de0b5ff909 /media/btrfs;

UUID should work, right? Why else have a UUID if not?

Currently, I'm using the line below in /etc/fstab, but this is not preferred since I'm expecting hard disks to come and go in my eventual use case - I'm rather certain that /dev/sdc will not be the correct drive to mount at some point:

/dev/sdc /backups/spfs btrfs noatime,subvol=spfs,compress 0 0

Thanks,

-Ben
Re: Hard restart required
On 4 Apr 2014, at 11:26 am, Lists li...@benjamindsmith.com wrote:
> UUID should work, right? Why else have a UUID if not?

UUID should work fine on OL6. Can you confirm that you have the UEK3 (3.8.18) kernel running? If you've installed from OL6U5 media, it should be enabled by default, but older OL6 ISOs only had UEK2 on the media and the UEK3 yum channel would need to be manually enabled.

Cheers,
Avi

--
Oracle http://www.oracle.com
Avi Miller | Product Management Director | +61 (3) 8616 3496
Oracle Linux and Virtualization
417 St Kilda Road, Melbourne, Victoria 3004 Australia
Re: Hard restart required
On 04/04/2014 08:26, Lists wrote:
> Trying out BTRFS today for the first time after a few false starts with three drives configured raid1. I'm using Oracle Linux 6/64 with all updates applied on unremarkable hardware (cast-off Dell Core2 desktop, 2 GB RAM, no known problems).
>
> Here's what it looks like:
>
> [root@oracle ~]# btrfs filesystem show
> Label: none  uuid: bdaf3d87-f992-4a89-9e2b-41de0b5ff909
>         Total devices 3 FS bytes used 448.82MB
>         devid    2 size 1.36TB used 167.01GB path /dev/sdc
>         devid    3 size 1.82TB used 167.01GB path /dev/sdd
>         devid    1 size 931.51GB used 2.02GB path /dev/sdb
>
> It hard locked and required a power-off system restart. Is this atypical? Here's what did it; this should be OK?
>
> # mount -U bdaf3d87-f992-4a89-9e2b-41de0b5ff909 /media/btrfs;

Does messages around the time of the problem say anything ?

Thanks, Anand

> UUID should work, right? Why else have a UUID if not?
>
> Currently, I'm using the line below in /etc/fstab, but this is not preferred since I'm expecting hard disks to come and go in my eventual use case - I'm rather certain that /dev/sdc will not be the correct drive to mount at some point:
>
> /dev/sdc /backups/spfs btrfs noatime,subvol=spfs,compress 0 0
>
> Thanks,
>
> -Ben
Re: [PATCH] btrfs: fix reversed warning condition in btrfs_delayed_inode_reserve_metadata
On Thu, Apr 03, 2014 at 06:18:40PM +0200, David Sterba wrote:
> On Thu, Apr 03, 2014 at 01:34:23PM +0800, Liu Bo wrote:
> > On Wed, Apr 02, 2014 at 07:13:00PM +0200, David Sterba wrote:
> > > Commit fae7f21cece9a4c181 (btrfs: Use WARN_ON()'s return value in place of WARN_ON(1)) cleaned up WARN_ON usage and in one place reversed the condition that led to loads of warnings that were not supposed to occur.
> > >
> > > WARN_ON will trigger because it sees 'ret' though in the previous code did not reach the WARN_ON below. The correct pattern is
> > >
> > > - if (condition)
> > > + if (WARN_ON(condition))
> > >
> > > CC: Dulshani Gunawardhana dulshani.gunawardhan...@gmail.com
> > > CC: sta...@vger.kernel.org # 3.13
> > > Reported-by: Liu Bo bo.li@oracle.com
> > > Signed-off-by: David Sterba dste...@suse.cz
> > > ---
> > >  fs/btrfs/delayed-inode.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> > > index 451b00c86f6c..098af20abd88 100644
> > > --- a/fs/btrfs/delayed-inode.c
> > > +++ b/fs/btrfs/delayed-inode.c
> > > @@ -649,7 +649,7 @@ static int btrfs_delayed_inode_reserve_metadata(
> > >                         goto out;
> > >
> > >                 ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes);
> > > -               if (!WARN_ON(ret))
> > > +               if (WARN_ON(!ret))
> > >                         goto out;
> >
> > Oh sorry, I'd have to get my Reviewed-by back and give a NACK instead.
> >
> > With this patch, (ret = 0) triggers the WARNING, which is not right.
>
> Thanks for catching this, you're right, my patch was wrong.
>
> I must say the patch (fae7f21ce) made the code harder to read in some places; I don't see much benefit in removing plain WARN_ON(1) at this cost.

I agree, I prefer the original code which is easier to understand,

        if (!ret)
                goto out;
        WARN_ON(1);

> Back to the warning flood you observed, the comment under the warning says:
>
>         /*
>          * Ok this is a problem, let's just steal from the global rsv
>          * since this really shouldn't happen that often.
>          */
>         ret = btrfs_block_rsv_migrate(&root->fs_info->global_block_rsv,
>                                       dst_rsv, num_bytes);
>
> so the question is why it does happen so often. A WARN_ON_ONCE hides the severity of the problem, so I'd rather suggest putting it under the enospc_debug option, so that we can debug it and it does not bother users.
>
> As this is closer to the way you were going to fix that, I'm not sending a patch; take this as a review comment.

The comment was based on some assumptions which could be wrong according to my observation.

-liubo
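
A userspace sketch of the three forms discussed in this thread, assuming only that WARN_ON(cond) warns when cond is true and evaluates to the truth value of cond. The names `warn_on()`, `warn_count`, `original()`, `after_cleanup()` and `nacked_patch()` are stand-ins invented here, not kernel code. Each function returns 1 where the kernel code would take `goto out` and 0 where it would fall through to the global-rsv fallback.

```c
#include <stdio.h>

int warn_count; /* number of warnings emitted so far */

/* Stand-in for the kernel's WARN_ON(): complain when the condition is
 * true, and hand the truth value of the condition back to the caller. */
int warn_on(int cond)
{
    if (cond) {
        warn_count++;
        fprintf(stderr, "WARNING: unexpected condition\n");
    }
    return !!cond;
}

/* The code as quoted by Liu Bo: warn only after a failed migrate. */
int original(int ret)
{
    if (!ret)
        return 1;       /* goto out */
    warn_on(1);
    return 0;           /* fall through to the fallback */
}

/* The form on the "-" line of the diff: if (!WARN_ON(ret)) goto out; */
int after_cleanup(int ret)
{
    if (!warn_on(ret))
        return 1;       /* goto out */
    return 0;
}

/* The form on the "+" line of the diff: if (WARN_ON(!ret)) goto out; */
int nacked_patch(int ret)
{
    if (warn_on(!ret))
        return 1;       /* goto out, but only after warning on success */
    return 0;
}
```

Under that assumption, original() and after_cleanup() take the same branch and warn only when ret is nonzero, while nacked_patch() warns when ret is 0 and stays silent on failure, which is the behaviour the NACK in this thread objects to.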