Re: BTRFS hangs - possibly NFS related?

2014-04-03 Thread Duncan
kim-btrfs posted on Tue, 01 Apr 2014 13:56:06 +0100 as excerpted:

 Apologies if this is known, but I've been lurking a while on the list
 and not seen anything similar - and I'm running out of ideas on what to
 do next to debug it.
 
 Small HP microserver box, running Debian, EXT4 system disk plus 4 disk
 BTRFS array shared over NFS (nfs-kernel-server) and SMB - the disks
 recently moved from a different box where they've been running
 faultlessly for months, although that didn't use NFS.

First off I have absolutely zero experience with NFS or SMB, so if it has 
anything at all to do with that, I'd be clueless.  That said, I do know a 
few other things to look at, and some idea of how to look at them.  The 
below is what I'd be looking at were it me.

 Under reasonable combined NFS and SMB load with only a couple of
 clients, the shares lock up, and load average on server and clients goes
 high (10-12) and stays there. It's apparently not actual CPU load, and
 there's little if any disk activity on the server.

First thing, high load, but little CPU and little I/O.  That's very
strange, but there are a few things besides that to check to see if you can
run down where all that load is going.

With the right tools, CPU/load can be categorized into several areas:
low-priority/niced, normal, kernel, IRQ, soft-IRQ, IO-wait, steal and
guest.  Steal and guest are VM related (steal is CPU time taken by the
hypervisor or another guest when measured from within a guest, and thus
not available to it; guest is of course CPU time given to guests, when
measured from the hypervisor) and will be zero if you're not running VMs.
IRQ and soft-IRQ won't show much either in the normal case, and of course
niced doesn't show anything unless you're running something niced.

What I'm wondering here is if it's all going to IO-wait as I suspect... 
or something else.
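
As a rough illustration of those categories, the aggregate "cpu" line in
/proc/stat carries one cumulative counter per category.  The sketch below
assumes the field order documented in proc(5); cpu_breakdown is a
hypothetical helper name, not an existing tool.  A large iowait share next
to small user/system shares would point at IO-wait.

```shell
# Print each CPU time category of a /proc/stat "cpu" line as a share of
# the total.  Field order per proc(5):
#   user nice system idle iowait irq softirq steal guest guest_nice
# guest and guest_nice are left out of the total because the kernel
# already counts them in user and nice.
cpu_breakdown() {
    awk '$1 == "cpu" {
        n = split("user nice system idle iowait irq softirq steal", name, " ")
        total = 0
        for (i = 1; i <= n; i++) total += $(i + 1)
        if (total == 0) exit
        for (i = 1; i <= n; i++)
            printf "%s=%.1f%%\n", name[i], 100 * $(i + 1) / total
    }'
}

# On a live Linux system; the counters are cumulative since boot, so
# this shows the average since then rather than the current moment.
if [ -r /proc/stat ]; then
    cpu_breakdown < /proc/stat
fi
```

To judge a hang in progress, it would be more telling to take two samples
a few seconds apart and compare the differences.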

If you don't have a tool that shows all that, one available tool that 
does is htop.  It's a better top, ncurses/semi-gui-based so run it in a 
terminal window or text-login VT.

While you're at it, you can of course also see which threads account
for all that load that isn't CPU time.

Also check out iotop, to see what processes are actually doing IO and the 
total IO speed.  Both these tools have manpages...

What could be interesting is what happens when you do that sync.  Does a 
thread or several threads spring to life momentarily (say in iotop) and 
then idle again, or... ?
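
Tasks in uninterruptible sleep (state D) count toward the Linux load
average without using any CPU, so listing them is a quick way to see what
the load consists of.  A minimal sketch, assuming the procps version of
ps; filter_dstate is a hypothetical helper name:

```shell
# Keep the header line plus every task whose STAT column starts with D
# (uninterruptible sleep, typically waiting on I/O or a lock).
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage on a live system (wchan shows the kernel function the task
# is sleeping in, when the kernel exposes it):
#   ps -eo pid,stat,wchan:32,comm | filter_dstate
#
# If nothing reaches dmesg on its own, the blocked tasks can be dumped
# to the kernel log as root, provided sysrq is enabled:
#   echo w > /proc/sysrq-trigger
```

Running the ps line before and after the sync would show whether the same
threads are the ones that get unstuck.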

 Killing NFS and/or Samba sometimes helps, but it's always back when the
 load comes back on. Chased round NFS and Samba options, then find that
 when the clients hang it's unresponsive on the server directly to the
 disk.
 
 Notice a btrfs-transacti process hung in D state, as are all the NFS
 processes:
 
 3779 ?        S      0:00 [nfsd4]
 3780 ?        S      0:00 [nfsd4_callbacks]
 3782 ?        D      0:27 [nfsd]
 3783 ?        D      0:27 [nfsd]
 3784 ?        D      0:28 [nfsd]
 3785 ?        D      0:26 [nfsd]
 
 sync instantly unsticks everything and it all works again for another
 couple of minutes, when it locks up again, same symptoms. Nothing
 apparently written to kern.log or dmesg, which has been the frustration
 all through - I don't know where to find the culprit!
 
 As a band-aid I've put
 
   btrfs filesystem sync /mnt/btrfs
 
 in the crontab once a minute, which is actually working just fine and
 has been all morning - every 5 minutes was not enough.
 
 Any recommendations on where I can look next, or any known holes I've
 fallen into?  Do I need to force NFS clients to sync in their mount
 options?
 
 
 Background:
 Kernel - 3.13-1-amd64 #1 SMP Debian 3.13.7-1 (2014-03-25)
 AMD N54L with 10GB RAM.
 
 ##
   Total devices 4 FS bytes used 848.88GiB
   devid    2 size 465.76GiB used 319.03GiB path /dev/sdc
   devid    4 size 465.76GiB used 319.00GiB path /dev/sda
   devid    5 size 455.76GiB used 309.03GiB path /dev/sdb2
   devid    6 size 931.51GiB used 785.00GiB path /dev/sdd
 
 ##

OK, so you're not fully allocated.  No problem there.
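
The per-device headroom can be read off the listing above by subtracting
used (space allocated to chunks) from size.  A small sketch of that
arithmetic, assuming every value carries the same unit; unallocated is a
hypothetical helper name:

```shell
# For every devid line of "btrfs filesystem show" output, print the
# device path and size minus used.  awk's numeric conversion keeps the
# leading number of "465.76GiB" and ignores the unit suffix, so all
# values must be in the same unit for the result to make sense.
unallocated() {
    awk '/devid/ {
        for (i = 1; i < NF; i++) {
            if ($i == "size") size = $(i + 1)
            if ($i == "used") used = $(i + 1)
            if ($i == "path") path = $(i + 1)
        }
        printf "%s %.2f\n", path, size - used
    }'
}

# Usage on a live system:
#   btrfs filesystem show | unallocated
```

With the figures quoted above this gives between roughly 146.5 and 146.8
GiB still unallocated on each of the four devices.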

 Data, RAID1: total=864.00GiB, used=847.86GiB
 System, RAID1: total=32.00MiB, used=128.00KiB
 Metadata, RAID1: total=2.00GiB, used=1009.93MiB

That looks healthy. 
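
As a cross-check on those figures: with RAID1 every chunk is stored twice,
so the chunk totals doubled should match the sum of the per-device used
values.  A quick sketch of the arithmetic with awk, using only numbers
quoted in this thread:

```shell
# Sum of the per-device "used" values from btrfs filesystem show (GiB)
dev_used=$(awk 'BEGIN { printf "%.2f", 319.03 + 319.00 + 309.03 + 785.00 }')

# RAID1 keeps two copies of each chunk: data 864 GiB, metadata 2 GiB,
# system 32 MiB (= 32/1024 GiB)
raid1_alloc=$(awk 'BEGIN { printf "%.2f", 2 * (864.00 + 2.00 + 32 / 1024) }')

echo "sum of device used: $dev_used GiB"
echo "2 x chunk totals:   $raid1_alloc GiB"
```

Both come to 1732.06 GiB, so the two listings agree with each other.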

 A scrub passes without finding any errors.
 
 There are a couple of VM images with light traffic which do fragment a
 little but I manually defrag those every day so often and I haven't had
 any problems there - it certainly isn't thrashing.

If you've been following the list, I'm surprised you didn't mention 
whether you're doing snapshotting at all.  I'll assume that means no, or 
only very light/manual snapshotting (as I have here).


My guess is that it might be fragmentation of something other than the 
VMs.  You're not mounting with autodefrag, I take it?  What about 
compress?  Do you have any other large actively written files, perhaps 
databases or pre-allocated-file 

Re: [Help] Errors found in extent allocation tree or chunk allocation

2014-04-03 Thread Duncan
Michael Witten posted on Tue, 01 Apr 2014 19:05:16 + as excerpted:

 The `btrfs balance' completed successfully without error, and DID solve
 my issues; it relocated every chunk, after which `btrfsck'
 ran smoothly.
 
 Thanks for the advice! You've put me at ease, and you've saved me a lot
 of time and energy.

Very good! =:^)  Thanks for the fix-report.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: fix reversed warning condition in btrfs_delayed_inode_reserve_metadata

2014-04-03 Thread David Sterba
On Thu, Apr 03, 2014 at 01:34:23PM +0800, Liu Bo wrote:
 On Wed, Apr 02, 2014 at 07:13:00PM +0200, David Sterba wrote:
  Commit fae7f21cece9a4c181 (btrfs: Use WARN_ON()'s return value in place of
  WARN_ON(1)) cleaned up WARN_ON usage and in one place reversed the condition
  that led to loads of warnings that were not supposed to occur.
  
  WARN_ON will trigger because it sees 'ret' though in the previous code
  did not reach the WARN_ON below. The correct pattern is
  
  -   if (condition)
  +   if (WARN_ON(condition))
  
  CC: Dulshani Gunawardhana dulshani.gunawardhan...@gmail.com
  CC: sta...@vger.kernel.org # 3.13
  Reported-by: Liu Bo bo.li@oracle.com
  Signed-off-by: David Sterba dste...@suse.cz
  ---
   fs/btrfs/delayed-inode.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
  
  diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
  index 451b00c86f6c..098af20abd88 100644
  --- a/fs/btrfs/delayed-inode.c
  +++ b/fs/btrfs/delayed-inode.c
  @@ -649,7 +649,7 @@ static int btrfs_delayed_inode_reserve_metadata(
                           goto out;
   
                   ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes);
  -                if (!WARN_ON(ret))
  +                if (WARN_ON(!ret))
                           goto out;
 
 Oh sorry, I'd have to get my Reviewed-by back and give a NACK instead.
 
 With this patch, (ret = 0) triggers the WARNING, which is not right.

Thanks for catching this, you're right, my patch was wrong. I must say
the patch (fae7f21ce) made the code harder to read at some places, I
don't see much help in removing plain WARN_ON(1) at this cost.
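
To keep the three forms apart, here is a small emulation in shell
arithmetic (not kernel code).  It assumes that WARN_ON(cond) warns when
cond is non-zero and evaluates to the truth value of cond, and that
ret == 0 means btrfs_block_rsv_migrate() succeeded.  The function names
and the sample error value -28 are made up for illustration.

```shell
# Emulated WARN_ON(cond): sets w to the truth value of cond (1 or 0)
# and records in "warned" whether a warning would have been printed.
warn_on() {
    w=$(( $1 != 0 ))
    if [ "$w" -eq 1 ]; then warned=1; fi
}

# Before fae7f21ce:     if (!ret) goto out;  WARN_ON(1);
original() {
    r=$1 warned=0 out=0
    if [ "$r" -eq 0 ]; then out=1; else warn_on 1; fi
    echo "warn=$warned goto_out=$out"
}

# After fae7f21ce:      if (!WARN_ON(ret)) goto out;
cleanup() {
    r=$1 warned=0 out=0
    warn_on "$r"
    if [ "$w" -eq 0 ]; then out=1; fi
    echo "warn=$warned goto_out=$out"
}

# The withdrawn patch:  if (WARN_ON(!ret)) goto out;
withdrawn() {
    r=$1 warned=0 out=0
    warn_on $(( ! r ))
    if [ "$w" -ne 0 ]; then out=1; fi
    echo "warn=$warned goto_out=$out"
}

# Compare success (0) with a sample failure (-28)
for ret in 0 -28; do
    echo "ret=$ret original:  $(original "$ret")"
    echo "ret=$ret cleanup:   $(cleanup "$ret")"
    echo "ret=$ret withdrawn: $(withdrawn "$ret")"
done
```

Under these assumptions the fae7f21ce form warns in the same cases as the
original (only when the migrate fails), while the withdrawn patch warns on
success and stays silent on failure.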

Back to the warning flood you observed, the comment under the warning
says:

        /*
         * Ok this is a problem, let's just steal from the global rsv
         * since this really shouldn't happen that often.
         */
        ret = btrfs_block_rsv_migrate(&root->fs_info->global_block_rsv,
                                      dst_rsv, num_bytes);

so the question is why it does happen so often.

A WARN_ON_ONCE hides the severity of the problem, so I'd rather suggest
to put it under enospc_debug option so we can debug it and it does not
bother users. As this is closer to the way you were going to fix that,
I'm not sending a patch, take this as a review comment.


Re: [PATCH 0/6 EARLY RFC] Btrfs: Get rid of whole page I/O.

2014-04-03 Thread David Sterba
On Thu, Mar 20, 2014 at 08:50:27AM +0530, Aneesh Kumar K.V wrote:
  On Tue, Mar 18, 2014 at 01:48:00PM +0630, chandan wrote:
  The earlier patchset posted by Chandra Seethraman was to get 4k
  blocksize to work with ppc64's 64k PAGE_SIZE.
 
  Are we talking about metadata block sizes or data block sizes?
 
  The root node of tree root tree has 1957 bytes being written by
  make_btrfs() (in btrfs-progs).  Hence I chose to do 2k blocksize for
  the initial subpagesize-blocksize work. So with this patchset the
  supported blocksizes would be in the range 2k-64k.
 
  So it's metadata blocks, and in this case 2k looks like the only
  allowed size that's smaller than 4k, and thus can demonstrate sub-page
  size allocations. I'm not sure if this is limiting for potential future
  extensions of metadata structures that could be larger.
 
  2k is ok for testing purposes, but I think a 4k-page machine will hardly
  use a smaller page size, all the more so since 16k metadata blocks are
  now the default.
 
 The goal is to remove the assumption that the supported block size is >=
 page size. The primary reason to do that is to support migration of disk
 devices across different architectures. If we have a btrfs disk created
 on an x86 box with data blocksize 4K and metadata blocksize 16K, we should
 make sure that the disk can be read/written from a ppc64 box (which has a
 page size of 64K). To enable easy testing and community development we are
 now focusing on achieving 2K data blocksize and 2K metadata blocksize
 on x86. As you said this will never be used in production.
 
 To achieve that we did the below
 *) Add offset and len to btrfs_io_bio. These are file offsets and
 len. This is later used to unlock extent io tree.
 
 *) Now we also need to make sure that submit_extent_page only submits a
  contiguous range in the file offset range, i.e. if we have holes in
  between we split them into two submit_extent_page calls.  This ensures that
  btrfs_io_bio offset and len represent a contiguous range.
 
 Please let us know whether the above approach is acceptable.

I don't see any apparent problem with this approach.
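
The splitting rule in the second point can be sketched outside the kernel.
Given the sorted file offsets of the blocks that need I/O, the
hypothetical helper below groups them into (offset, len) pairs so that no
pair spans a hole.  This only illustrates the idea; it is not how
submit_extent_page is implemented.

```shell
# $1    = block size in bytes
# stdin = sorted file offsets of the blocks to submit, one per line
# Prints one "offset len" pair per contiguous run of blocks.
contiguous_ranges() {
    awk -v bs="$1" '
        NR == 1           { start = $1; len = bs; next }
        # the next block directly follows the current range: extend it
        $1 == start + len { len += bs; next }
        # otherwise there is a hole: emit the range and start a new one
                          { print start, len; start = $1; len = bs }
        END               { if (NR > 0) print start, len }
    '
}

# 2K blocks at 0, 2K and 4K, a hole at 6K, then blocks at 8K and 10K
printf '%s\n' 0 2048 4096 8192 10240 | contiguous_ranges 2048
```

For that input the sketch prints "0 6144" and "8192 4096", i.e. two
submissions, each with an offset and len describing one contiguous range.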


Re: [PATCH] btrfs: allow mounting btrfs subvolumes with different ro/rw options

2014-04-03 Thread David Sterba
On Tue, Nov 19, 2013 at 09:35:32AM -0500, Chris Mason wrote:
 Quoting har...@redhat.com (2013-11-19 05:36:05)
  From: Harald Hoyer har...@redhat.com
  
 [ create new vfsmounts with different states ]
 
 Thanks for resending Harald.  I'll give this a shot and see if I can
 find any problems with it.

Is this patch queued for 3.15? Thanks.


Re: [PATCH 00/27] Replace the old man page with asciidoc and man page for each btrfs subcommand.

2014-04-03 Thread Zach Brown
On Wed, Apr 02, 2014 at 09:24:10AM -0400, Chris Mason wrote:
 On 04/02/2014 04:29 AM, Qu Wenruo wrote:
 Convert the old btrfs man pages to new asciidoc and split the huge
 btrfs man page into subcommand man page.
 
 The asciidoc style and Makefile things are mostly simplified from git's
 Documentation, keeping only man page output and removing html output,
 since html output is somewhat overkill for btrfs.
 
 Thanks for doing this.  I've never liked roff, but I'll give people
 on the list a chance to complain before taking this one.

Moving to asciidoc is definitely the right thing to do.  Dave's also
been moving the xfs docs in this direction.

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/xfs-documentation.git;a=summary

- z


Hard restart required

2014-04-03 Thread Lists
Trying out BTRFS today for the first time after a few false starts with 
three drives configured raid1. I'm using Oracle Linux 6/64 with all 
updates applied on unremarkable hardware. (cast off Dell Core2 desktop, 
2 GB RAM, no known problems) Here's what it looks like:


[root@oracle ~]# btrfs filesystem show
Label: none  uuid: bdaf3d87-f992-4a89-9e2b-41de0b5ff909
        Total devices 3 FS bytes used 448.82MB
        devid    2 size 1.36TB used 167.01GB path /dev/sdc
        devid    3 size 1.82TB used 167.01GB path /dev/sdd
        devid    1 size 931.51GB used 2.02GB path /dev/sdb

It hard locked and required a power-off system restart. Is this 
atypical? Here's what did it, this should be OK?


# mount -U bdaf3d87-f992-4a89-9e2b-41de0b5ff909 /media/btrfs;

UUID should work, right? Why else have a UUID if not? Currently, I'm 
using below in /etc/fstab, but this is not preferred since I'm expecting 
hard disks to come and go in my eventual use case - I'm rather certain 
that /dev/sdc will not be the correct drive to mount at some point:


/dev/sdc /backups/spfs btrfs noatime,subvol=spfs,compress 0 0
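
For reference, fstab(5) also accepts a UUID= source in place of a device
node, so the entry need not depend on which name a disk gets.  A sketch
that builds such a line from the values above; fstab_line is a
hypothetical helper name, and on a multi-device btrfs the member devices
generally still have to be scanned (for example with btrfs device scan)
before the mount can succeed:

```shell
# Build an fstab entry keyed on the filesystem UUID.
# Arguments: <uuid> <mount point> <mount options>
fstab_line() {
    printf 'UUID=%s %s btrfs %s 0 0\n' "$1" "$2" "$3"
}

fstab_line bdaf3d87-f992-4a89-9e2b-41de0b5ff909 /backups/spfs \
    noatime,subvol=spfs,compress
```

The printed line can be compared against the existing entry before
anything in /etc/fstab is changed.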


Thanks,

-Ben


Re: Hard restart required

2014-04-03 Thread Avi Miller

On 4 Apr 2014, at 11:26 am, Lists li...@benjamindsmith.com wrote:

 UUID should work, right? Why else have a UUID if not? 

UUID should work fine on OL6. Can you confirm that you have the UEK3 (3.8.13) 
kernel running? If you’ve installed from OL6U5 media, it should be enabled by 
default, but older OL6 ISOs only had UEK2 on the media and the UEK3 yum channel 
would need to be manually enabled.

Cheers,
Avi

--
Oracle http://www.oracle.com
Avi Miller | Product Management Director | +61 (3) 8616 3496
Oracle Linux and Virtualization
417 St Kilda Road, Melbourne, Victoria 3004 Australia



Re: Hard restart required

2014-04-03 Thread Anand Jain



On 04/04/2014 08:26, Lists wrote:

Trying out BTRFS today for the first time after a few false starts with
three drives configured raid1. I'm using Oracle Linux 6/64 with all
updates applied on unremarkable hardware. (cast off Dell Core2 desktop,
2 GB RAM, no known problems) Here's what it looks like:

[root@oracle ~]# btrfs filesystem show
Label: none  uuid: bdaf3d87-f992-4a89-9e2b-41de0b5ff909
        Total devices 3 FS bytes used 448.82MB
        devid    2 size 1.36TB used 167.01GB path /dev/sdc
        devid    3 size 1.82TB used 167.01GB path /dev/sdd
        devid    1 size 931.51GB used 2.02GB path /dev/sdb

It hard locked and required a power-off system restart. Is this
atypical? Here's what did it, this should be OK?

# mount -U bdaf3d87-f992-4a89-9e2b-41de0b5ff909 /media/btrfs;


 Do the logs (e.g. /var/log/messages) from around the time of the problem say anything?

Thanks, Anand



UUID should work, right? Why else have a UUID if not? Currently, I'm
using below in /etc/fstab, but this is not preferred since I'm expecting
hard disks to come and go in my eventual use case - I'm rather certain
that /dev/sdc will not be the correct drive to mount at some point:

/dev/sdc /backups/spfs btrfs noatime,subvol=spfs,compress 0 0


Thanks,

-Ben


Re: [PATCH] btrfs: fix reversed warning condition in btrfs_delayed_inode_reserve_metadata

2014-04-03 Thread Liu Bo
On Thu, Apr 03, 2014 at 06:18:40PM +0200, David Sterba wrote:
 On Thu, Apr 03, 2014 at 01:34:23PM +0800, Liu Bo wrote:
  On Wed, Apr 02, 2014 at 07:13:00PM +0200, David Sterba wrote:
   Commit fae7f21cece9a4c181 (btrfs: Use WARN_ON()'s return value in place of
   WARN_ON(1)) cleaned up WARN_ON usage and in one place reversed the condition
   that led to loads of warnings that were not supposed to occur.
   
   WARN_ON will trigger because it sees 'ret' though in the previous code
   did not reach the WARN_ON below. The correct pattern is
   
   -   if (condition)
   +   if (WARN_ON(condition))
   
   CC: Dulshani Gunawardhana dulshani.gunawardhan...@gmail.com
   CC: sta...@vger.kernel.org # 3.13
   Reported-by: Liu Bo bo.li@oracle.com
   Signed-off-by: David Sterba dste...@suse.cz
   ---
    fs/btrfs/delayed-inode.c | 2 +-
    1 file changed, 1 insertion(+), 1 deletion(-)
   
   diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
   index 451b00c86f6c..098af20abd88 100644
   --- a/fs/btrfs/delayed-inode.c
   +++ b/fs/btrfs/delayed-inode.c
   @@ -649,7 +649,7 @@ static int btrfs_delayed_inode_reserve_metadata(
                            goto out;
    
                    ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes);
   -                if (!WARN_ON(ret))
   +                if (WARN_ON(!ret))
                            goto out;
  
  Oh sorry, I'd have to get my Reviewed-by back and give a NACK instead.
  
  With this patch, (ret = 0) triggers the WARNING, which is not right.
 
 Thanks for catching this, you're right, my patch was wrong. I must say
 the patch (fae7f21ce) made the code harder to read at some places, I
 don't see much help in removing plain WARN_ON(1) at this cost.

I agree, I prefer the original code which is easier to understand,

        if (!ret)
                goto out;
        WARN_ON(1);

 
 Back to the warning flood you observed, the comment under the warning
 says:
 
         /*
          * Ok this is a problem, let's just steal from the global rsv
          * since this really shouldn't happen that often.
          */
         ret = btrfs_block_rsv_migrate(&root->fs_info->global_block_rsv,
                                       dst_rsv, num_bytes);
 
 so the question is why it does happen so often.
 
 A WARN_ON_ONCE hides the severity of the problem, so I'd rather suggest
 to put it under enospc_debug option so we can debug it and it does not
 bother users. As this is closer to the way you were going to fix that,
 I'm not sending a patch, take this as a review comment.

The comment was based on some assumptions which could be wrong according to
my observation.

-liubo