Re: How btrfs-find-root knows that the block is actually a root?

2014-12-23 Thread Alex Lyakas
Hi Qu,

On Tue, Dec 23, 2014 at 7:27 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote:

  Original Message 
 Subject: How btrfs-find-root knows that the block is actually a root?
 From: Alex Lyakas alex.bt...@zadarastorage.com
 To: linux-btrfs linux-btrfs@vger.kernel.org
 Date: 2014年12月22日 22:57

 Greetings,

 I am looking at the code of search_iobuf() in
 btrfs-find-root.c.(3.17.3)  I see that we probe nodesize blocks one by
 one, and for each block we check:
 - its owner is what we are looking for
 - its header-bytenr is what we are looking at currently
 - its level is not too small
 - it has valid checksum
 - it has the desired generation

 If all those conditions are true, we declare this block as a root and
 end the program.

 How do we actually know that it's a root and not a leaf or an
 intermediate node? What if we are searching for a root of the root
 tree, which has one node and two leafs (all have the same highest
 transid), and one of the leafs has logical lower than the actual
 root, i.e., it comes first in our scan. Then we will declare this leaf
 as a root, won't we? Or somehow the root always has the lowest
 logical?

 You can refer to this patch:
 https://patchwork.kernel.org/patch/5285521/
I see that this has not been applied to any of David's branches. Do
you have a repo to look at this code in its entirety?


 Your questions are mostly right.
 The best method should be search through all the metadata, and only the
 highest level header for
 a given generation may be the root for that generation.

 But that method still has some problems.
 1) Overwritten old node/leaf
 As btrfs metadata cow happens, old node/leaf may be overwritten and become
 incompletely,
 so above method won't always work as expected.

 2) Corrupted fs
 That will makes everything not work as expected.
 But sadly, when someone needs to use btrfs-find-root, there is a high
 possibility the fs is already corrupted.

 3) Slow speed
 It needs to scan over all the sectors of metadata chunks, it may var from
 megabytese to tegabytes,
 which makes the complete scan impractical.
 So current find-root uses a trade-off, if find a header at the position
 superblock points to, and generation
 matches, then just consider it as the desired root and exit.
I think this is a bit optimistic. What if the root tree has several
leaves having the same generation as the root? Then we might declare a
leaf as a root and exit. But further recovery based on that output
will get us into trouble.



 Also, I am confused by this line:
 level = h_level;
 This means that if we encounter a block that seems good, we will
 skip all other blocks that have lower level. Is this intended?

 This is intended, for case user already know the root's level, so it will
 skip any header whose level is below it.
But this line is performed before the generation check. Let's say that
user did not specify any level (so search_level==0). Then assume we
encounter a block, which has lower generation than what we need, but
higher level. At this point, we do
level = h_level;
and we will skip any blocks lower than this level from now on. What if
the root tree got shirnked (due to subvolume deletion, for example),
and the good root has lower level? We will skip it then, and will
not find the root.

Thanks for your comments,
Alex.



 Thanks,
 Qu


 Thanks,
 Alex.
 --
 To unsubscribe from this list: send the line unsubscribe linux-btrfs in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 06/17] btrfs: make use of immutable biovecs

2014-12-23 Thread Christoph Hellwig
This seems like it could be applied without the rest of the series,
right?  Might be worth to get it into the btrfs tree ASAP?

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 06/17] btrfs: make use of immutable biovecs

2014-12-23 Thread Dongsu Park
On 23.12.2014 02:35, Christoph Hellwig wrote:
 This seems like it could be applied without the rest of the series,
 right?  Might be worth to get it into the btrfs tree ASAP?

Ah, you're right.
While patch #5 must be in this series, patch #6 does not necessarily
have to be included. This conversion should belong to the multipage
bvecs series. So I'll skip #6 in the next round.

Thanks,
Dongsu
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Standards Problems [Was: [PATCH v2 1/3] Btrfs: get more accurate output in df command.]

2014-12-23 Thread Dongsheng Yang

On 12/18/2014 12:07 PM, Robert White wrote:
I don't disagree with the _ideal_ of your patch. I just think that 
it's impossible to implement it without lying to the user or making 
things just as bad in a different way. I would _like_ you to be right. 
But my thing is finding and quantifying failure cases and the entire 
question is full of fail.



...


See you keep giving me these examples where the history of the 
filesystem is uniform. It was made a certain way and stayed that way. 
But in real life this sort of thing is going to happen and your patch 
simply report's a _different_ _wrong_ number. A _friendlier_ wrong 
number, I'll grant you that, but still wrong.




Hi Robert, sorry for the late. (It's busy to deal with some other work.)

Yes, you are providing some examples here to show that in some cases the
numbers from df is not helpful to understand the real space state in 
disk. But

I'm afraid we can not blame df reporting a wrong number. You could say
Hey df, you are stupid, we can store the small file in metadata to exceed
your @size.. But he just reports the information from what he is seeing 
from FS level honestly.
He is reporting what he can see, and do not care about the other things 
out of

his business.

The all special cases you provided in this thread, actually lead to one 
result that
Btrfs is special, df command will report an improper in some case.. It 
means
we need some other methods to tell user about the space info. And yes, 
we had. It's

btrfs fi df.

You are trying to make df showing information as more as possible, even 
change
the behaviour of df in btrfs to show the numbers of different meaning 
with what it
is in other filesystems. And my answer is keeping df command doing what 
he want,

and solve the problems in our private df , btrfs fi df.

Thanx
Yang

. xaE



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Can BTRFS handle XATTRs larger than 4K?

2014-12-23 Thread Austin S Hemmelgarn

On 2014-12-22 19:08, Robert White wrote:

On 12/22/2014 02:55 PM, Richard Sharpe wrote:

On Mon, Dec 22, 2014 at 2:52 PM, Robert White rwh...@pobox.com wrote:

So skipping the full ADS, what's the current demand/payoff for large
XATTR space?


Windows Security Descriptors (sometimes incorrectly called ACLs)
stored by Samba.


Ah.

I know that Linux ACLs are fairly small per entry, I take it Windows'
can be much bigger?

Having just gone off an read a lot about the many ADS possible in
windows, they've sort of treated ever file as if it were the name of a
phantom directory limited depth... That is you seem to be able to create
any name as a stream name but you can't create any pathname as same.

The system-level API -- that is the complete retooling of SYS_open et al
-- and the requsite departure from POSIX -- seems unlikely.

On the antipode, it seems like being able to put an inode reference key
type (e.g. a name,inode pair as one of the metadata entries) could
relieve the space constraint for a limited number of entires. The
contents of that inode's data region would become the value of the
single attribute.

Does that relieve Security Descriptor burden? Is each descriptor a
separate attribute or are all the descriptors held in one attribute as a
list-of?

Going full phantom directory to match Windows just seems like we get
into the business of replacing whole kernel tidbits a la the
inner-system effect.



Personally, I'm thinking something more like the OS X Forks API than the 
Windows ADS API.  Forks are actually the original reason that windows 
added ADS to NTFS, and their API is much cleaner than the windows one. 
The standard there is that forks are accessible as 
'filename/..namedfork/forkname'  ..namedfork doesn't get listed by 
ls -a, but you can explicitly list 'filename/..namedfork/' to get a 
list of forks other than the data fork.  Such an API needs almost zero 
modification to existing programs (just enough to get tar and friends to 
check if there are any named forks for each file).


The significant advantages I see to having forks are:
 1. Windows Security Descriptors
 2. Macintosh Resource Forks
 3. BeOS (and others) style associated metadata (ie, associated file 
icons/thumbnails, per-file program associations, in other words, the 
same sort of stuff that OSX uses resource forks for)

 4. Better flexibility for stuff like IMA and EVM




smime.p7s
Description: S/MIME Cryptographic Signature


Re: [RFC PATCH 05/17] btrfs: remove bio splitting and merge_bvec_fn() calls

2014-12-23 Thread Chris Mason
On Mon, Dec 22, 2014 at 6:48 AM, Dongsu Park 
dongsu.p...@profitbricks.com wrote:

From: Kent Overstreet k...@daterainc.com

Btrfs has been doing bio splitting from btrfs_map_bio(), by checking
device limits as well as calling -merge_bvec_fn() etc. That is not
necessary any more, because generic_make_request() is now able to
handle arbitrarily sized bios. So clean up unnecessary code paths.

Signed-off-by: Kent Overstreet k...@daterainc.com
[dpark: add more description in commit message]
Signed-off-by: Dongsu Park dongsu.p...@profitbricks.com
Cc: Chris Mason c...@fb.com
Cc: Josef Bacik jba...@fb.com
Cc: linux-btrfs@vger.kernel.org
---


Looks good, I'll test it here too.  Thanks!

Signed-off-by: Chris Mason c...@fb.com



--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: rebuild missing block group during chunk recovery if possible

2014-12-23 Thread Alex Lyakas
Hi Qu,

On Thu, Oct 30, 2014 at 4:54 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote:
 Before the patch, chunk will be considered bad if the corresponding
 block group is missing, even the only uncertain data is the 'used'
 member of the block group.

 This patch will try to recalculate the 'used' value of the block group
 and rebuild it.
 So even only chunk item and dev extent item is found, the chunk can be
 recovered.
 Although if extent tree is damanged and needed extent item can't be
 read, the block group's 'used' value will be the block group length, to
 prevent any later write/block reserve damaging the block group.
 In that case, we will prompt user and recommend them to use
 '--init-extent-tree' to rebuild extent tree if possible.

 Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
 ---
  btrfsck.h   |   3 +-
  chunk-recover.c | 242 
 +---
  cmds-check.c|  29 ---
  3 files changed, 234 insertions(+), 40 deletions(-)

 diff --git a/btrfsck.h b/btrfsck.h
 index 356c767..7a50648 100644
 --- a/btrfsck.h
 +++ b/btrfsck.h
 @@ -179,5 +179,6 @@ btrfs_new_device_extent_record(struct extent_buffer *leaf,
  int check_chunks(struct cache_tree *chunk_cache,
  struct block_group_tree *block_group_cache,
  struct device_extent_tree *dev_extent_cache,
 -struct list_head *good, struct list_head *bad, int silent);
 +struct list_head *good, struct list_head *bad,
 +struct list_head *rebuild, int silent);
  #endif
 diff --git a/chunk-recover.c b/chunk-recover.c
 index 6f43066..dbf98b5 100644
 --- a/chunk-recover.c
 +++ b/chunk-recover.c
 @@ -61,6 +61,7 @@ struct recover_control {

 struct list_head good_chunks;
 struct list_head bad_chunks;
 +   struct list_head rebuild_chunks;
 struct list_head unrepaired_chunks;
 pthread_mutex_t rc_lock;
  };
 @@ -203,6 +204,7 @@ static void init_recover_control(struct recover_control 
 *rc, int verbose,

 INIT_LIST_HEAD(rc-good_chunks);
 INIT_LIST_HEAD(rc-bad_chunks);
 +   INIT_LIST_HEAD(rc-rebuild_chunks);
 INIT_LIST_HEAD(rc-unrepaired_chunks);

 rc-verbose = verbose;
 @@ -529,22 +531,32 @@ static void print_check_result(struct recover_control 
 *rc)
 return;

 printf(CHECK RESULT:\n);
 -   printf(Healthy Chunks:\n);
 +   printf(Recoverable Chunks:\n);
 list_for_each_entry(chunk, rc-good_chunks, list) {
 print_chunk_info(chunk,   );
 good++;
 total++;
 }
 -   printf(Bad Chunks:\n);
 +   list_for_each_entry(chunk, rc-rebuild_chunks, list) {
 +   print_chunk_info(chunk,   );
 +   good++;
 +   total++;
 +   }
 +   list_for_each_entry(chunk, rc-unrepaired_chunks, list) {
 +   print_chunk_info(chunk,   );
 +   good++;
 +   total++;
 +   }
 +   printf(Unrecoverable Chunks:\n);
 list_for_each_entry(chunk, rc-bad_chunks, list) {
 print_chunk_info(chunk,   );
 bad++;
 total++;
 }
 printf(\n);
 -   printf(Total Chunks:\t%d\n, total);
 -   printf(  Heathy:\t%d\n, good);
 -   printf(  Bad:\t%d\n, bad);
 +   printf(Total Chunks:\t\t%d\n, total);
 +   printf(  Recoverable:\t\t%d\n, good);
 +   printf(  Unrecoverable:\t%d\n, bad);

 printf(\n);
 printf(Orphan Block Groups:\n);
 @@ -555,6 +567,7 @@ static void print_check_result(struct recover_control *rc)
 printf(Orphan Device Extents:\n);
 list_for_each_entry(devext, rc-devext.no_chunk_orphans, chunk_list)
 print_device_extent_info(devext,   );
 +   printf(\n);
  }

  static int check_chunk_by_metadata(struct recover_control *rc,
 @@ -938,6 +951,11 @@ static int build_device_maps_by_chunk_records(struct 
 recover_control *rc,
 if (ret)
 return ret;
 }
 +   list_for_each_entry(chunk, rc-rebuild_chunks, list) {
 +   ret = build_device_map_by_chunk_record(root, chunk);
 +   if (ret)
 +   return ret;
 +   }
 return ret;
  }

 @@ -1168,12 +1186,31 @@ static int __rebuild_device_items(struct 
 btrfs_trans_handle *trans,
 return ret;
  }

 +static int __insert_chunk_item(struct btrfs_trans_handle *trans,
 +   struct chunk_record *chunk_rec,
 +   struct btrfs_root *chunk_root)
 +{
 +   struct btrfs_key key;
 +   struct btrfs_chunk *chunk = NULL;
 +   int ret = 0;
 +
 +   chunk = create_chunk_item(chunk_rec);
 +   if (!chunk)
 +   return -ENOMEM;
 +   key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
 +   key.type = BTRFS_CHUNK_ITEM_KEY;
 +   key.offset = chunk_rec-offset;
 +
 +   ret = 

[PATCH] btrfs-progs: Fix btrfs fi show by uuid and label

2014-12-23 Thread Justin Maggard
Commit 8be2fff (btrfs-progs: apply realpath for btrfs fi
show when mount point is given) changed the behavior of
btrfs fi show to return an error if the call to realpath()
failed.  This broke the ability to specify a filesystem by
uuid or label.

So let's not consider a failed call to realpath() as an
error.  If the user really specified a bad device, just
return nothing like we did before.
---
 cmds-filesystem.c |9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 8f037dd..a654e6f 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -901,13 +901,8 @@ static int cmd_show(int argc, char **argv)
 * realpath do  /mnt/btrfs/  = /mnt/btrfs
 * which shall be recognized by btrfs_scan_kernel()
 */
-   if (!realpath(search, path)) {
-   fprintf(stderr, ERROR: Could not show %s: %s\n,
-   search, strerror(errno));
-   return 1;
-   }
-
-   search = path;
+   if (realpath(search, path))
+   search = path;
 
/*
 * Needs special handling if input arg is block dev And if
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Quota limit question

2014-12-23 Thread Christian Robottom Reis
On Tue, Dec 16, 2014 at 11:15:37PM -0200, Christian Robottom Reis wrote:
 # btrfs qgroup limit 2000m 0/261 .  touch x
 touch: cannot touch ‘x’: Disk quota exceeded
 
 The strange thing is that it doesn't seem to be actually out of space:
 
 # btrfs qgroup show -p -r -e /var | grep 261
 0/261810048  391114752   2097152000  0  ---   

Replying to myself as I had not yet been subscribed in time to receive a
reply; I just upgraded to 3.18.1 and am seeing the same issue on the
same subvolume (and on no others).

root@riff:/etc# uname -a
Linux riff 3.18.1-031801-generic #201412170637 SMP Wed Dec 17 11:38:50
UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

It's quite odd that this specific subvolume acts up, given that there
are quite a few others that are closer to the quota:

subvol  grouptotal   unshared
-
(unknown)   0/5  1.37G / none  1.16G /   none
lxc-template1/rootfs0/2590.68G / none  0.10G /  2.00G
machine-2/rootfs0/2611.07G / none  0.40G /  2.00G
machine-3/rootfs0/2651.17G / none  0.41G /  2.00G
lxc-template2/rootfs0/2710.77G / none  0.31G /  2.00G
lxc-template3/rootfs0/2740.46G / none  0.02G /  2.00G
machine-4/rootfs0/2837.12G / none  6.21G / 10.00G
machine-5/rootfs0/2881.05G / none  0.34G /  2.00G
machine-6/rootfs0/289   11.33G / none 10.74G / 15.00G
machine-7/rootfs0/2901.30G / none  0.68G /  2.00G
machine-8/rootfs0/2921.00G / none  0.33G /  2.00G
machine-9/rootfs0/2931.17G / none  0.38G /  2.00G
machine-10/rootfs   0/3061.34G / none  0.62G /  2.00G
machine-11/rootfs   0/3189.49G / none  8.75G / 15.00G
lxc-template4/rootfs0/3200.79G / none  0.78G /  2.00G
machine-14/rootfs   0/3231.10G / none  0.45G /  2.00G

The LWN article suggests that btrfs is quite conservative with quotas,
but shouldn't 265, 290, 306, 320 and 323 all be out of quota as well? Or
is there a lot else that goes into the calculation beyond the numbers
reported by btrfs qgroup show?

What could I do to help investigate further?
-- 
Christian Robottom Reis | [+55 16] 3376 0125   | http://async.com.br/~kiko
CEO, Async Open Source  | [+55 16] 9 9112 6430 | http://launchpad.net/~kiko
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: add regression test for remount with thread_pool resized

2014-12-23 Thread Dave Chinner
On Tue, Dec 23, 2014 at 03:07:32PM +0800, Qu Wenruo wrote:
 
  Original Message 
 Subject: Re: [PATCH] btrfs: add regression test for remount with
 thread_pool resized
 From: gux.fnst gux.f...@cn.fujitsu.com
 To: fste...@vger.kernel.org
 Date: 2014年09月17日 18:02
 Hi,
 
  Could anyone help to review this patch? Thanks!
 
 Regards,
 Xing Gu
 
 On 05/30/2014 04:52 PM, Xing Gu wrote:
 Regression test for resizing 'thread_pool' when remount the fs.
 
 Signed-off-by: Xing Gu gux.f...@cn.fujitsu.com
 ---
tests/btrfs/055 | 55 
  +
tests/btrfs/055.out |  1 +
tests/btrfs/group   |  1 +
3 files changed, 57 insertions(+)
create mode 100755 tests/btrfs/055
create mode 100644 tests/btrfs/055.out
 
 diff --git a/tests/btrfs/055 b/tests/btrfs/055
 new file mode 100755
 index 000..0a0dd34
 --- /dev/null
 +++ b/tests/btrfs/055
 @@ -0,0 +1,55 @@
 +#!/bin/bash
 +# FS QA Test No. btrfs/055
 +#
 +# Regression test for resizing 'thread_pool' when remount the fs.
 +#
 IMHO, it would be better to add the commit number for the regression
 and the fix commit number
 (Filipe Manana always does the great job).
 
 For reference:
 Regresssion commit: 08a9ff3264181986d1d692a4e6fce3669700c9f8 (All my fault)
 Fix commit: 800ee2247f483b6d05ed47ef3bbc90b56451746c

That belongs in the commit message, not the test description.

-Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] btrfs: add regression test for remount with thread_pool resized

2014-12-23 Thread Dave Chinner
On Tue, Dec 23, 2014 at 03:23:10PM +0800, Eryu Guan wrote:
 On Tue, Dec 23, 2014 at 01:41:02PM +0800, Xing Gu wrote:
  Regression test for a btrfs issue of resizing 'thread_pool'
  when remount the fs.
  
  Signed-off-by: Xing Gu gux.f...@cn.fujitsu.com
  ---
   tests/btrfs/017 | 64 
  +
   tests/btrfs/017.out |  1 +
   tests/btrfs/group   |  1 +
   3 files changed, 66 insertions(+)
   create mode 100755 tests/btrfs/017
   create mode 100644 tests/btrfs/017.out
  
  diff --git a/tests/btrfs/017 b/tests/btrfs/017
  new file mode 100755
  index 000..81f5af6
  --- /dev/null
  +++ b/tests/btrfs/017
  @@ -0,0 +1,64 @@
  +#!/bin/bash
  +# FS QA Test No. btrfs/017
  +#
  +# Regression test for a btrfs issue of resizing 'thread_pool' when
  +# remount the fs.
  +# This was fixed in the following linux kernel patch:
  +# 800ee22 btrfs: fix crash in remount(thread_pool=) case
  +#
  +#---
  +# Copyright (c) 2014 Fujitsu.  All Rights Reserved.
  +#
  +# This program is free software; you can redistribute it and/or
  +# modify it under the terms of the GNU General Public License as
  +# published by the Free Software Foundation.
  +#
  +# This program is distributed in the hope that it would be useful,
  +# but WITHOUT ANY WARRANTY; without even the implied warranty of
  +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  +# GNU General Public License for more details.
  +#
  +# You should have received a copy of the GNU General Public License
  +# along with this program; if not, write the Free Software Foundation,
  +# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
  +#
  +#---
  +#
  +
  +seq=`basename $0`
  +seqres=$RESULT_DIR/$seq
  +echo QA output created by $seq
  +
  +here=`pwd`
  +tmp=/tmp/$$
  +status=1   # failure is the default!
  +
  +_cleanup()
  +{
  +rm -f $tmp.*
  +}
  +
  +trap _cleanup ; exit \$status 0 1 2 3 15
  +
  +# get standard environment, filters and checks
  +. ./common/rc
  +. ./common/filter
  +
  +# real QA test starts here
  +_supported_fs btrfs
  +_supported_os Linux
  +_require_scratch
  +
  +_scratch_mkfs  /dev/null
  +
  +dmesg -c  /dev/null
  +nr_null=`dmesg | grep -c kernel NULL pointer dereference`
 
 dmesg -c cleared dmesg buffer and nr_null is zero, I think this is
 duplicated work, either dmesg -c or check nr_null before remount.

Please don't clear the dmesg buffer - it means that people can't
review the kernel error messages across the entire test run for
errors unless they are capturing that information elsewhere.

And yes, I don't use syslog for capturing dmesg output on my test
boxes because that leads to root filesystem ENOSPC whenever I'm
dumping a lot of debug to the console

 Actually I'm thinking about checking dmesg log in check after each test,
 which could detect kernel BUG, WARNING and NULL pointer dereference etc.
 so test doesn't need to check dmesg itself. I'll send it out for review
 soon.

Some tests actually generate errors/warnings on purpose. IOWs, the
presence of messages in dmesg is not always indicative of an
error

  +_scratch_mount -o remount,thread_pool=10
  +new_null=`dmesg | grep -c kernel NULL pointer dereference`
  +if [ $new_null -ne $nr_null ]; then
  +   _fatal kernel bug detected, check dmesg for more infomation.
 
 _fail is better than _fatal, and don't forget to remove $seqres.full before
 test if you use _fail, cause it dumps errors to $seqres.full
 
 But simply echo the error message could fail the test too, I'm fine with
 either way..

Echo the error message, set status=1 and exit.

Oh, wait, that's what _fail does. ;)

  +exit
  diff --git a/tests/btrfs/017.out b/tests/btrfs/017.out
  new file mode 100644
  index 000..90c4b69
  --- /dev/null
  +++ b/tests/btrfs/017.out
  @@ -0,0 +1 @@
  +QA output created by 017

Empty output with no indication that it should be empty is bad form
as well. The test should echo something to indicate that an empty
output file is expected. Silence is golden is the usual string
used to indicate that the golden output expects no output from the
test.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Uncorrectable errors on RAID-1?

2014-12-23 Thread Zygo Blaxell
On Sun, Dec 21, 2014 at 05:25:47PM -0700, Chris Murphy wrote:
 For the kernel to automatically fix
 bad sectors by overwriting them, the drive needs to explicitly report
 read errors. If the SCSI command timer value is shorter than the
 drive's error recovery, the SATA link might get reset before the drive
 reports the read error and then uncorrected errors will persist
 instead of being automatically fixed.

Is there a way to tell the kernel to go ahead and assume that all timeouts
are effectively read errors?  For a simple non-removable hard disk (i.e.
not removable and not optical), that seems like a reasonable workaround
for an assortment of firmware brokenness.

I just did a quick survey of random drives here and found less than 10%
support smartctl -l scterc.  A lot of server drives (or at least the
drives that shipped in servers) don't have it, but laptop drives do.
Drives with firmware that has horrifying known bugs do also have this
feature.  :-P



signature.asc
Description: Digital signature


Re: btrfs is using 25% more disk than it should

2014-12-23 Thread Zygo Blaxell
On Sat, Dec 20, 2014 at 06:28:22AM -0500, Josef Bacik wrote:
 We now have two extents
 with the same bytenr but with different lengths.  
[...]
 Then there is the problem of actually returning the free space.  Now
 if we drop all of the refs for an extent we know the space is free
 and we return it to the allocator.  With the above example we can't
 do that anymore, we have to check the extent tree for any area that
 is left overlapping the area we just freed.  This add's another
 search to every btrfs_free_extent operation, which slows the whole
 system down and again leaves us with weird corner cases and pain for
 the users.  Plus this would be an incompatible format change so
 would require setting a feature flag in the fs and rolled to
 voluntarily.

Ouchie.

 Now I have another solution, but I'm not convinced it's awesome
 either.  Take the same example above, but instead we split the
 original extent in the extent tree so we avoid all the mess of
 having overlapping ranges

Would this work for a read-only snapshot?  For a read-write snapshot
it would be as if we had modified both (or all, if there are multiple
snapshots) versions of the tree with split extents.

 This wouldn't require a format change so everybody would get
 this behaviour as soon as we turned it on

It could be a mount option, like autodefrag, off by default until the
bugs were worked out.

Arguably there could be a 'garbage-collection tool' similar to 'btrfs
fi defrag', that could be used to clean out any large partially-obscured
extents from specific files.  This might be important for deduplication
as well (although the extent-same code looks like it does split extents?).

Definitely something to think about.  Thanks for the detailed
explanations.



signature.asc
Description: Digital signature


Re: Uncorrectable errors on RAID-1?

2014-12-23 Thread Chris Murphy
On Tue, Dec 23, 2014 at 2:16 PM, Zygo Blaxell zblax...@furryterror.org wrote:
 On Sun, Dec 21, 2014 at 05:25:47PM -0700, Chris Murphy wrote:
 For the kernel to automatically fix
 bad sectors by overwriting them, the drive needs to explicitly report
 read errors. If the SCSI command timer value is shorter than the
 drive's error recovery, the SATA link might get reset before the drive
 reports the read error and then uncorrected errors will persist
 instead of being automatically fixed.

 Is there a way to tell the kernel to go ahead and assume that all timeouts
 are effectively read errors?

The timer in /sys is a kernel command timer, it's not a device timer
even though it's pointed at a block device. You need to change that
from 30 to something higher to get the behavior you want. It doesn't
really make sense to say, timeout in 30 seconds, but instead of
reporting a timeout, report it as a read error. They're completely
different things.

There are all sorts of errors listed in libata so for all of them to
get dumped into a read error doesn't make sense. A lot of those errors
don't report back a sector, and the key part of the read error is what
sector(s) have the problem so that they can be fixed. Without that
information, the ability to fix it is lost. And it's the drive that
needs to report this.


 For a simple non-removable hard disk (i.e.
 not removable and not optical), that seems like a reasonable workaround
 for an assortment of firmware brokenness.

Oven doesn't work, so lets spray gasoline on it and light it and the
kitchen on fire so that we can cook this damn pizza! That's what I
just read. Sorry. It doesn't seem like a good idea to me to map all
errors as read errors.


 I just did a quick survey of random drives here and found less than 10%
 support smartctl -l scterc.  A lot of server drives (or at least the
 drives that shipped in servers) don't have it, but laptop drives do.
 Drives with firmware that has horrifying known bugs do also have this
 feature.  :-P

Any decent server SATA drive should support SCT ERC. The inexpensive
WDC Red drives for NAS's all have it and by default are a reasonable
70 deciseconds last time I checked.

It might be that you're using SAS drives? In that case they may have
something different than SCT ERC that serves the same purpose, but I
don't have any SAS drives here to check this. I'd expect any SAS drive
already has short error recoveries by default, but that expectation
might be flawed.

Chris Murphy
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Uncorrectable errors on RAID-1?

2014-12-23 Thread Chris Murphy
The other thing to note, is that the scsi command timer timeout is a
maximum. So at 30 seconds if a command to the drive hasn't completed,
then consider the drive hung up and do a link reset. And whatever
error recovery is in the drive, is also a maximum. If the sector is
really immediately bad, the drive will produce a read error
immediately. The case where you get these long recoveries where the
drive keeps retrying beyond the 30 second scsi command timer value, is
when the drive firmware ECC thinks it can recover (or reconstruct) the
data instead of producing a read error.

A gotcha with changing the scsi command timer to a much larger value
is that it possibly gives the drive enough time to recover the data,
report it back to the kernel, and then everything goes on normally.
The slow sector doesn't get fixed. Even a scrub wouldn't fix that
unless the drive reported wrongly recovered data and Btrfs checksums
catch it.

So what you want to do with a drive that has, or is suspected of
having such slow sectors, is to balance it. Rewrite everything. That
should cause the drive firmware to map out those sectors if they
result in persistent write errors.

What ought to happen is the data from slow sectors, once recovered,
should get written to a reserve sector and the old sector removed from
use (remapping, i.e. the LBA is the same but the physical sector is
different) but every drive firmware handles this differently. I
definitely have had drives where this doesn't happen automatically.
Also, I've had drives that when ATA Secure Erased, did not test for
persistent write errors and therefore bad sectors weren't removed from
use, they'd remain persistently bad when doing smartctl -t long tests.
In those cases, using badblocks -w fixed the problem but of course
that's destructive.


-- 
Chris Murphy
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + corrupt script fsck test case.

2014-12-23 Thread Dave Chinner

[ Sorry to take some time to get to this, it got caught by a spam
filter and I only just noticed. ]

On Tue, Dec 16, 2014 at 02:08:53PM +, Filipe David Manana wrote:
 On Tue, Dec 16, 2014 at 1:35 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote:
 
   Original Message 
  Subject: Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + corrupt
  script fsck test case.
  From: David Sterba dste...@suse.cz
  To: Filipe David Manana fdman...@gmail.com
  Date: 2014年12月16日 02:19
 
  On Mon, Dec 15, 2014 at 10:13:45AM +, Filipe David Manana wrote:
 
  So another thing I would like to see is doing a more comprehensive
  verification that the repair code worked as expected. Currently we
  only check that a readonly fsck, after running fsck --repair, returns
  0.
 
  For the improvements you've been doing, it's equally important to
  verify that --repair recovered the inodes, links, etc to the
  lost+found directory (or whatever is the directory's name).
 
  So perhaps adding a verify.sh script to the tarball for example?
 
  Or, forgot before, it might be better to do such verification/test in
  xfstests since we can create the fs and use the new btrfs-progs
  programs to corrupt leafs/nodes. xfstests has a lot of infrastructure
  already and probably run by a lot more people (compared to the fsck
  tests of btrfs-progs).
 
  I'm thinking about the best way how to integrate that, but it seems that
  there will be always some level of code or infrastructure duplication
  (or other hassle).
 
  btrfs-corrupt-block is not installed by default (make install) and it's
  not a type of utility I'd consider for default installations. The tests
  would be skipped in absence of the utility, so there will be test
  environments where install xfstests, install btrfspprogs will not add
  the desired test coverage. Solvable by packaging the extra progs.
 
  Adding corrupt-block into xfsprogs is infeasible (IMO too much code from
  btrfs-progs to be added).
 
  I don't know how much infrastructure code we'd have to either write or
  copy from fstests, but I think it would not be that much. Ideally we
  could write the tests within btrfs-progs and then submit them to fstests
  once they're considered reliable. If we keep the same syntax of the
  tests, provide stubs where applicable, the code duplication in test
  itself would be zero. We'd only have to write the stubs in btrfs-progs
  and probably extend fstests to provide helpers for preparing/unpacking
  the images.
 
  In my wildest idea, if we have a good enough btrfs debugger(maybe even
  stronger than debugfs), which can
  do almost everything from read key/item to corrupt given structure, then we
  can resolve them all.
  No binary image since corruption can be done by it and verify can also done
  by it.
  (OK, it's just a daydream)
 
  But IMHO, isn't xfstests designed to mainly detect kernel defeats?
  I don't see any fsck tool test case in it.
 
 I don't think xfstests is specific to test the kernel implementation
 of filesystems. I believe it includes user space code too, but I might
 be wrong so I'm CCing fstests and Dave to get an authoritative answer.

We use fstests to test everything we ship for XFS - kernel and
userspace. i.e. we have tests that corrupt filesystems with xfs_db
and then test that xfs_repair can fix them, and once fixed the
filesystem can be mounted and used by the kernel...

i.e. fstests is for testing both the kernel code and the utilities
used to manipulate filesystems.

 And I don't see a big problem with btrfs-corrupt not being built by
 default when running make on btrfs-progs. We can make the xfstest
 not run and print an informative message if the btrfs-corrupt program
 isn't available - several xfstests do this, they require some
 executable which isn't either in the xfstests nor xfsprogs
 repositories - for example generic/241 which requires 'dbench' and
 generic/299 which requires 'fio'.

_require_btrfs_corrupt_prog()

Just like we do with lots of other optional userspace tools that are
required for various tests to run.

 I also have a slight preference to get all
 tests in the same place (xfstests) rather than in multiple
 repositories (btrfs-progs, xfstests).

Definitely my preference as well.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: How btrfs-find-root knows that the block is actually a root?

2014-12-23 Thread Qu Wenruo


 Original Message 
Subject: Re: How btrfs-find-root knows that the block is actually a root?
From: Alex Lyakas alex.bt...@zadarastorage.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2014年12月23日 18:08

Hi Qu,

On Tue, Dec 23, 2014 at 7:27 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote:

 Original Message 
Subject: How btrfs-find-root knows that the block is actually a root?
From: Alex Lyakas alex.bt...@zadarastorage.com
To: linux-btrfs linux-btrfs@vger.kernel.org
Date: 2014年12月22日 22:57

Greetings,

I am looking at the code of search_iobuf() in
btrfs-find-root.c.(3.17.3)  I see that we probe nodesize blocks one by
one, and for each block we check:
- its owner is what we are looking for
- its header-bytenr is what we are looking at currently
- its level is not too small
- it has valid checksum
- it has the desired generation

If all those conditions are true, we declare this block as a root and
end the program.

How do we actually know that it's a root and not a leaf or an
intermediate node? What if we are searching for a root of the root
tree, which has one node and two leafs (all have the same highest
transid), and one of the leafs has logical lower than the actual
root, i.e., it comes first in our scan. Then we will declare this leaf
as a root, won't we? Or somehow the root always has the lowest
logical?

You can refer to this patch:
https://patchwork.kernel.org/patch/5285521/

I see that this has not been applied to any of David's branches. Do
you have a repo to look at this code in its entirety?

Saddly, no personal repo...



Your questions are mostly right.
The best method should be search through all the metadata, and only the
highest level header for
a given generation may be the root for that generation.

But that method still has some problems.
1) Overwritten old node/leaf
As btrfs metadata cow happens, old node/leaf may be overwritten and become
incompletely,
so above method won't always work as expected.

2) Corrupted fs
That will makes everything not work as expected.
But sadly, when someone needs to use btrfs-find-root, there is a high
possibility the fs is already corrupted.

3) Slow speed
It needs to scan over all the sectors of metadata chunks, it may var from
megabytese to tegabytes,
which makes the complete scan impractical.
So current find-root uses a trade-off, if find a header at the position
superblock points to, and generation
matches, then just consider it as the desired root and exit.

I think this is a bit optimistic. What if the root tree has several
leaves having the same generation as the root? Then we might declare a
leaf as a root and exit. But further recovery based on that output
will get us into trouble.
That's right, but if btrfs-find-root takes several hours each time to do 
full scan, most user will be crazy...

Especially when it is executed on a clean btrfs.




Also, I am confused by this line:
level = h_level;
This means that if we encounter a block that seems good, we will
skip all other blocks that have lower level. Is this intended?

This is intended, for case user already know the root's level, so it will
skip any header whose level is below it.

But this line is performed before the generation check. Let's say that
user did not specify any level (so search_level==0). Then assume we
encounter a block, which has lower generation than what we need, but
higher level. At this point, we do
level = h_level;
and we will skip any blocks lower than this level from now on. What if
the root tree got shirnked (due to subvolume deletion, for example),
and the good root has lower level? We will skip it then, and will
not find the root.

Thanks for your comments,
Alex.
Oh, I misunderstand your meaning, you're completely right, the original 
codes will skip any level

lower than current found.
That's deadly for root with higher generation but with lower level, as 
you mentioned, item delete.


So the best method would be only skip leaf/node whose level is lower 
than current found highest

level *among the same generation*.
Oh, that's what I do in my patch. :-)

Thanks,
Qu




Thanks,
Qu


Thanks,
Alex.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html




--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Debian/Jessie 3.16.7-ckt2-1 kernel error

2014-12-23 Thread Russell Coker
I've attached the kernel message log that I get after booting kernel 3.16.7 
from Debian/Unstable.  This is the kernel branch that will go into 
Debian/Jessie so it's important to get it fixed.

Below has the start of the errors, the attached file has everything from boot.  
I've got similar issues on another system, would it help if I collect the logs 
from multiple systems?

[6.618809] Btrfs loaded
[6.670878] BTRFS: device fsid a2d7cbbc-23be-4d97-b2bb-de99b0c58c7d devid 1 
t
ransid 548614 /dev/mapper/root-crypt
[6.706907] BTRFS info (device dm-0): disk space caching is enabled
[6.798762] BTRFS: detected SSD devices, enabling SSD mode
[6.881272] [ cut here ]
[6.881358] WARNING: CPU: 3 PID: 198 at /build/linux-CMiYW9/linux-3.16.7-
ckt2
/fs/btrfs/delayed-inode.c:1410 btrfs_commit_transaction+0x38a/0x9c0 [btrfs]()

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Bloghttp://doc.coker.com.au/


kern-err.gz
Description: GNU Zip compressed data


Re: [PATCH] btrfs-progs: rebuild missing block group during chunk recovery if possible

2014-12-23 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH] btrfs-progs: rebuild missing block group during 
chunk recovery if possible

From: Alex Lyakas alex.bt...@zadarastorage.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2014年12月24日 00:49

Hi Qu,

On Thu, Oct 30, 2014 at 4:54 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote:

[snipped]
+
+static int __insert_block_group(struct btrfs_trans_handle *trans,
+   struct chunk_record *chunk_rec,
+   struct btrfs_root *extent_root,
+   u64 used)
+{
+   struct btrfs_block_group_item bg_item;
+   struct btrfs_key key;
+   int ret = 0;
+
+   btrfs_set_block_group_used(bg_item, used);
+   btrfs_set_block_group_chunk_objectid(bg_item, used);

This looks like a bug. Instead of used, I think it should be
BTRFS_FIRST_CHUNK_TREE_OBJECTID.

Oh, my mistake, BTRFS_FIRST_CHUNK_TREE_OBJECTID is right.
Thanks for pointing out this.



[snipped]
--
2.1.2

Couple of questions:
# In remove_chunk_extent_item, should we also consider rebuild
chunks now? It can happen that a rebuild chunks is a SYSTEM chunk.
Should we try to handle it as well?

Not quite sure about the meaning of rebuild here.
The chunk-recovery has the rebuild_chunk_tree() function to rebuild the 
whole chunk tree with

the good/repaired chunks we found.

# Same question for rebuild_sys_array. Should we also consider
rebuild chunks?

The chunk-recovery has rebuild_sys_array() to handle SYSTEM chunk too.

Thanks,
Qu


Thanks,
Alex.




--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: Fix btrfs fi show by uuid and label

2014-12-23 Thread Gui Hecheng
On Tue, 2014-12-23 at 11:34 -0800, Justin Maggard wrote:
 Commit 8be2fff (btrfs-progs: apply realpath for btrfs fi
 show when mount point is given) changed the behavior of
 btrfs fi show to return an error if the call to realpath()
 failed.  This broke the ability to specify a filesystem by
 uuid or label.

Oh, that's my fault, thanks for correcting this.

 So let's not consider a failed call to realpath() as an
 error.  If the user really specified a bad device, just
 return nothing like we did before.
 ---
  cmds-filesystem.c |9 ++---
  1 file changed, 2 insertions(+), 7 deletions(-)
 
 diff --git a/cmds-filesystem.c b/cmds-filesystem.c
 index 8f037dd..a654e6f 100644
 --- a/cmds-filesystem.c
 +++ b/cmds-filesystem.c
 @@ -901,13 +901,8 @@ static int cmd_show(int argc, char **argv)
* realpath do  /mnt/btrfs/  = /mnt/btrfs
* which shall be recognized by btrfs_scan_kernel()
*/
 - if (!realpath(search, path)) {
 - fprintf(stderr, ERROR: Could not show %s: %s\n,
 - search, strerror(errno));
 - return 1;
 - }
 -
 - search = path;
 + if (realpath(search, path))
 + search = path;

This looks nice and stay consistent with the behavior before my faulty
commit.

Reviewed-by: Gui Hecheng guihc.f...@cn.fujitsu.com

Thanks,
Gui

   /*
* Needs special handling if input arg is block dev And if


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/2] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation.

2014-12-23 Thread Qu Wenruo
When btrfs allocate a chunk, it will try to alloc up to 1G for data and
256M for metadata, or 10% of all the writeable space if there is enough
space for the stripe on device.

However, when we run out of space, this allocation may cause unbalanced
chunk allocation.
For example, there are only 1G unallocated space, and request for
allocate DATA chunk is sent, and all the space will be allocated as data
chunk, making later metadata chunk alloc request unable to handle, which
will cause ENOSPC.
This is the one of the common complains from end users about why ENOSPC
happens but there is still available space.

This patch will try not to alloc chunk which is more than half of the
unallocated space, making the last space more balanced at a small cost
of more fragmented chunk at the last 1G.

Some easy example:
Preallocate 17.5G on a 20G empty btrfs fs:
[Before]
 # btrfs fi show /mnt/test
Label: none  uuid: da8741b1-5d47-4245-9e94-bfccea34e91e
Total devices 1 FS bytes used 17.50GiB
devid1 size 20.00GiB used 20.00GiB path /dev/sdb
All space is allocated. No space for later metadata allocation.

[After]
 # btrfs fi show /mnt/test
Label: none  uuid: e6935aeb-a232-4140-84f9-80aab1f23d56
Total devices 1 FS bytes used 17.50GiB
devid1 size 20.00GiB used 19.77GiB path /dev/sdb
About 230M is still available for later metadata allocation.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
Changelog:
v2:
   Remove false dead zone judgement since it won't happen
---
 fs/btrfs/volumes.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8e74b34..20b3eea 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4237,6 +4237,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle 
*trans,
u64 max_stripe_size;
u64 max_logical_size;   /* Up limit on chunk's logical size */
u64 max_physical_size;  /* Up limit on a chunk's on-disk size */
+   u64 total_physical_avail = 0;
u64 stripe_size;
u64 num_bytes;
u64 raid_stripe_len = BTRFS_STRIPE_LEN;
@@ -4349,6 +4350,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle 
*trans,
devices_info[ndevs].max_avail = max_avail;
devices_info[ndevs].total_avail = total_avail;
devices_info[ndevs].dev = device;
+   total_physical_avail += total_avail;
++ndevs;
}
 
@@ -4398,6 +4400,23 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle 
*trans,
do_div(stripe_size, num_stripes);
need_bump = 1;
}
+
+   /*
+* Don't alloc chunk whose physical size is larger than half
+* of the rest physical space.
+* This will reduce the possibility of ENOSPC when comes to
+* last unallocated space
+*
+* For the last 16~32M (e.g. 20M), it will first alloc 16M
+* (bumped to 16M) and the next time will be the rest size
+* (bumped to 16M and reduced to 4M).
+* So no dead zone.
+*/
+   if (stripe_size * num_stripes  total_physical_avail / 2) {
+   stripe_size = total_physical_avail / 2;
+   need_bump = 1;
+
+   }
/* restrict logical chunk size  */
if (stripe_size * data_stripes  max_logical_size) {
stripe_size = max_logical_size;
-- 
2.2.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] btrfs: Make the chunk size limit on on-disk/logical more clean.

2014-12-23 Thread Qu Wenruo
Original __btrfs_alloc_chunk() use max_chunk_size to limit chunk size,
however it mixed the on-disk space with logical space.
When comes to 10% of writable space, max_chunk_size is used with on-disk
size, but it is also used as logical space size limit, so it is very
confusing and causing inconsistence in different profile.

For example:
on M single, D single 5G btrfs single device,
data chunk is limited to 512M due to 10% limit.

on M RAID1, D RAID1 10Gx2 btrfs 2 devices,
data chunk is limited to 2G due to 10% limit is mixed with on-disk
space, causing the logical chunk space to 1G, twice than single device.

This patch will make the logical and on-disk space limit independent and
clear and solve the above inconsistence.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
changelog:
v2:
   Newly introduced.
---
 fs/btrfs/volumes.c | 40 ++--
 1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0144790..8e74b34 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4235,10 +4235,12 @@ static int __btrfs_alloc_chunk(struct 
btrfs_trans_handle *trans,
int ncopies;/* how many copies to data has */
int ret;
u64 max_stripe_size;
-   u64 max_chunk_size;
+   u64 max_logical_size;   /* Up limit on chunk's logical size */
+   u64 max_physical_size;  /* Up limit on a chunk's on-disk size */
u64 stripe_size;
u64 num_bytes;
u64 raid_stripe_len = BTRFS_STRIPE_LEN;
+   int need_bump = 0;
int ndevs;
int i;
int j;
@@ -4260,7 +4262,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle 
*trans,
 
if (type  BTRFS_BLOCK_GROUP_DATA) {
max_stripe_size = 1024 * 1024 * 1024;
-   max_chunk_size = 10 * max_stripe_size;
+   max_logical_size = 10 * max_stripe_size;
if (!devs_max)
devs_max = BTRFS_MAX_DEVS(info-chunk_root);
} else if (type  BTRFS_BLOCK_GROUP_METADATA) {
@@ -4269,12 +4271,12 @@ static int __btrfs_alloc_chunk(struct 
btrfs_trans_handle *trans,
max_stripe_size = 1024 * 1024 * 1024;
else
max_stripe_size = 256 * 1024 * 1024;
-   max_chunk_size = max_stripe_size;
+   max_logical_size = max_stripe_size;
if (!devs_max)
devs_max = BTRFS_MAX_DEVS(info-chunk_root);
} else if (type  BTRFS_BLOCK_GROUP_SYSTEM) {
max_stripe_size = 32 * 1024 * 1024;
-   max_chunk_size = 2 * max_stripe_size;
+   max_logical_size = 2 * max_stripe_size;
if (!devs_max)
devs_max = BTRFS_MAX_DEVS_SYS_CHUNK;
} else {
@@ -4284,8 +4286,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle 
*trans,
}
 
/* we don't want a chunk larger than 10% of writeable space */
-   max_chunk_size = min(div_factor(fs_devices-total_rw_bytes, 1),
-max_chunk_size);
+   max_physical_size = div_factor(fs_devices-total_rw_bytes, 1);
 
devices_info = kzalloc(sizeof(*devices_info) * fs_devices-rw_devices,
   GFP_NOFS);
@@ -4391,15 +4392,21 @@ static int __btrfs_alloc_chunk(struct 
btrfs_trans_handle *trans,
data_stripes = num_stripes - 2;
}
 
-   /*
-* Use the number of data stripes to figure out how big this chunk
-* is really going to be in terms of logical address space,
-* and compare that answer with the max chunk size
-*/
-   if (stripe_size * data_stripes  max_chunk_size) {
-   u64 mask = (1ULL  24) - 1;
-   stripe_size = max_chunk_size;
+   /* Restrict on-disk chunk size */
+   if (stripe_size * num_stripes  max_physical_size) {
+   stripe_size = max_physical_size;
+   do_div(stripe_size, num_stripes);
+   need_bump = 1;
+   }
+   /* restrict logical chunk size  */
+   if (stripe_size * data_stripes  max_logical_size) {
+   stripe_size = max_logical_size;
do_div(stripe_size, data_stripes);
+   need_bump = 1;
+   }
+
+   if (need_bump) {
+   u64 mask = (1ULL  24) - 1;
 
/* bump the answer up to a 16MB boundary */
stripe_size = (stripe_size + mask)  ~mask;
@@ -4411,6 +4418,11 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle 
*trans,
stripe_size = devices_info[ndevs-1].max_avail;
}
 
+   /*
+* Special handle for DUP, since stripe_size is the largest free extent
+* we found, DUP can only use half of it. Other profile's dev_stripes
+* is always 1.
+*/
do_div(stripe_size, dev_stripes);
 
/* align to BTRFS_STRIPE_LEN */
-- 
2.2.1

--
To 

Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + corrupt script fsck test case.

2014-12-23 Thread Qu Wenruo


 Original Message 
Subject: Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + 
corrupt script fsck test case.

From: Dave Chinner da...@fromorbit.com
To: Filipe David Manana fdman...@gmail.com
Date: 2014年12月24日 08:03

[ Sorry to take some time to get to this, it got caught by a spam
filter and I only just noticed. ]

On Tue, Dec 16, 2014 at 02:08:53PM +, Filipe David Manana wrote:

On Tue, Dec 16, 2014 at 1:35 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote:

 Original Message 
Subject: Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + corrupt
script fsck test case.
From: David Sterba dste...@suse.cz
To: Filipe David Manana fdman...@gmail.com
Date: 2014年12月16日 02:19

On Mon, Dec 15, 2014 at 10:13:45AM +, Filipe David Manana wrote:

So another thing I would like to see is doing a more comprehensive
verification that the repair code worked as expected. Currently we
only check that a readonly fsck, after running fsck --repair, returns
0.

For the improvements you've been doing, it's equally important to
verify that --repair recovered the inodes, links, etc to the
lost+found directory (or whatever is the directory's name).

So perhaps adding a verify.sh script to the tarball for example?

Or, forgot before, it might be better to do such verification/test in
xfstests since we can create the fs and use the new btrfs-progs
programs to corrupt leafs/nodes. xfstests has a lot of infrastructure
already and probably run by a lot more people (compared to the fsck
tests of btrfs-progs).

I'm thinking about the best way how to integrate that, but it seems that
there will be always some level of code or infrastructure duplication
(or other hassle).

btrfs-corrupt-block is not installed by default (make install) and it's
not a type of utility I'd consider for default installations. The tests
would be skipped in absence of the utility, so there will be test
environments where install xfstests, install btrfspprogs will not add
the desired test coverage. Solvable by packaging the extra progs.

Adding corrupt-block into xfsprogs is infeasible (IMO too much code from
btrfs-progs to be added).

I don't know how much infrastructure code we'd have to either write or
copy from fstests, but I think it would not be that much. Ideally we
could write the tests within btrfs-progs and then submit them to fstests
once they're considered reliable. If we keep the same syntax of the
tests, provide stubs where applicable, the code duplication in test
itself would be zero. We'd only have to write the stubs in btrfs-progs
and probably extend fstests to provide helpers for preparing/unpacking
the images.

In my wildest idea, if we have a good enough btrfs debugger(maybe even
stronger than debugfs), which can
do almost everything from read key/item to corrupt given structure, then we
can resolve them all.
No binary image since corruption can be done by it and verify can also done
by it.
(OK, it's just a daydream)

But IMHO, isn't xfstests designed to mainly detect kernel defeats?
I don't see any fsck tool test case in it.

I don't think xfstests is specific to test the kernel implementation
of filesystems. I believe it includes user space code too, but I might
be wrong so I'm CCing fstests and Dave to get an authoritative answer.

We use fstests to test everything we ship for XFS - kernel and
userspace. i.e. we have tests that corrupt filesystems with xfs_db
and then test that xfs_repair can fix them, and once fixed the
filesystem can be mounted and used by the kernel...

i.e. fstests is for testing both the kernel code and the utilities
used to manipulate filesystems.

That's great.

But what will happen if some btrfs cases need binary(still huge even 
compressed) or

btrfs-image dump(some existing dumps are already several MB)?
Will it be OK for fstests?

Or should we wait until btrfs-progs has a good debug tools like xfs_db 
or debugfs and use

them to generate the corrupted image like xfs testcases do?

Thanks,
Qu




And I don't see a big problem with btrfs-corrupt not being built by
default when running make on btrfs-progs. We can make the xfstest
not run and print an informative message if the btrfs-corrupt program
isn't available - several xfstests do this, they require some
executable which isn't either in the xfstests nor xfsprogs
repositories - for example generic/241 which requires 'dbench' and
generic/299 which requires 'fio'.

_require_btrfs_corrupt_prog()

Just like we do with lots of other optional userspace tools that are
required for various tests to run.


I also have a slight preference to get all
tests in the same place (xfstests) rather than in multiple
repositories (btrfs-progs, xfstests).

Definitely my preference as well.

Cheers,

Dave.


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][BTRFS-PROGS][CLEANUP] Remove gotos

2014-12-23 Thread Satoru Takeuchi
Hi Goffredo,

On 2014/12/23 4:07, Goffredo Baroncelli wrote:
 On 12/22/2014 12:34 PM, Satoru Takeuchi wrote:
 On 2014/12/22 3:07, Goffredo Baroncelli wrote:
 Change a spagetti-style code (there are some interlaced gotos) to
 a more modern style...

 This patch removes also some #define from utils.h, which define
 constants used only in cmds-filesystems.c . Instead an enum
 is used locally in cmds-filesystems.c .

 I'm happy if you have a regression test for this patch
 since such kind of change often cause regression.
 
 I am impressed, this is the first time that I received this kind
 of request on btrfs ml

It's because I'd like to contribute to Btrfs not only
by submitting patches, but also by reviewing/testing
other guy's patches to improve its quality better.

 
 In which form you want this regression test ? I see a
 directory tests/ under btrfs-progs; but also I saw
 reference to xfstest...

Any format is OK, for example a simple shell script.

Thanks,
Satoru

 
 
 

 Thanks,
 Satoru

 ---
cmds-filesystem.c | 106 
 --
utils.h   |   3 --
2 files changed, 55 insertions(+), 54 deletions(-)

 diff --git a/cmds-filesystem.c b/cmds-filesystem.c
 index 7eaccb9..08ddb5d 100644
 --- a/cmds-filesystem.c
 +++ b/cmds-filesystem.c
 @@ -40,6 +40,11 @@
#include list_sort.h
#include disk-io.h

 +enum filesystem_show_scan_method {
 +   BTRFS_SCAN_ANY,
 +   BTRFS_SCAN_MOUNTED,
 +   BTRFS_SCAN_LBLKID
 +};

/*
 * for btrfs fi show, we maintain a hash of fsids we've already printed.
 @@ -853,7 +858,7 @@ static int cmd_show(int argc, char **argv)
 char *search = NULL;
 int ret;
 /* default, search both kernel and udev */
 -   int where = -1;
 +   int where = BTRFS_SCAN_ANY;
 int type = 0;
 char mp[BTRFS_PATH_NAME_MAX + 1];
 char path[PATH_MAX];
 @@ -930,61 +935,60 @@ static int cmd_show(int argc, char **argv)
 uuid_unparse(fsid, uuid_buf);
 search = uuid_buf;
 type = BTRFS_ARG_UUID;
 -   goto devs_only;
 +   where = BTRFS_SCAN_LBLKID;
 }
 }
 }

 -   if (where == BTRFS_SCAN_LBLKID)
 -   goto devs_only;
 -
 -   /* show mounted btrfs */
 -   ret = btrfs_scan_kernel(search);
 -   if (search  !ret) {
 -   /* since search is found we are done */
 -   goto out;
 -   }
 -
 -   /* shows mounted only */
 -   if (where == BTRFS_SCAN_MOUNTED)
 -   goto out;
 -
 -devs_only:
 -   ret = btrfs_scan_lblkid();
 -
 -   if (ret) {
 -   fprintf(stderr, ERROR: %d while scanning\n, ret);
 -   return 1;
 -   }
 -
 -   found = search_umounted_fs_uuids(all_uuids, search);
 -   if (found  0) {
 -   fprintf(stderr,
 -   ERROR: %d while searching target device\n, ret);
 -   return 1;
 -   }
 -
 -   /*
 -* The seed/sprout mapping are not detected yet,
 -* do mapping build for all umounted fs
 -*/
 -   ret = map_seed_devices(all_uuids);
 -   if (ret) {
 -   fprintf(stderr,
 -   ERROR: %d while mapping seed devices\n, ret);
 -   return 1;
 +   if (where == BTRFS_SCAN_MOUNTED || where == BTRFS_SCAN_ANY) {
 +   
 +   /* show mounted btrfs */
 +   ret = btrfs_scan_kernel(search);
 +   if (search  !ret) {
 +   /* since search is found we are done */
 +   goto out;
 +   }
 +   
 }
 -
 -   list_for_each_entry(fs_devices, all_uuids, list)
 -   print_one_uuid(fs_devices);
 -
 -   if (search  !found)
 -   ret = 1;
 -
 -   while (!list_empty(all_uuids)) {
 -   fs_devices = list_entry(all_uuids.next,
 -   struct btrfs_fs_devices, list);
 -   free_fs_devices(fs_devices);
 +   
 +   if (where == BTRFS_SCAN_LBLKID || where == BTRFS_SCAN_ANY) {
 +   
 +   ret = btrfs_scan_lblkid();
 +   
 +   if (ret) {
 +   fprintf(stderr, ERROR: %d while scanning\n, ret);
 +   return 1;
 +   }
 +   
 +   found = search_umounted_fs_uuids(all_uuids, search);
 +   if (found  0) {
 +   fprintf(stderr,
 +   ERROR: %d while searching target device\n, 
 ret);
 +   return 1;
 +   }
 +   
 +   /*
 +* The seed/sprout mapping are not detected yet,
 +* do mapping build for all umounted fs
 +*/
 +   ret = map_seed_devices(all_uuids);
 +   if (ret) {
 +   fprintf(stderr,
 +   ERROR: %d while mapping seed devices\n, ret);
 +   return 1;
 +   }
 +   
 +   list_for_each_entry(fs_devices, all_uuids, list)
 +   print_one_uuid(fs_devices);
 +   
 +   if (search  

Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + corrupt script fsck test case.

2014-12-23 Thread Dave Chinner
On Wed, Dec 24, 2014 at 10:56:04AM +0800, Qu Wenruo wrote:
 
  Original Message 
 Subject: Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image +
 corrupt script fsck test case.
 From: Dave Chinner da...@fromorbit.com
 To: Filipe David Manana fdman...@gmail.com
 Date: 2014年12月24日 08:03
 [ Sorry to take some time to get to this, it got caught by a spam
 filter and I only just noticed. ]
 
 On Tue, Dec 16, 2014 at 02:08:53PM +, Filipe David Manana wrote:
 On Tue, Dec 16, 2014 at 1:35 AM, Qu Wenruo quwen...@cn.fujitsu.com wrote:
  Original Message 
 Subject: Re: [PATCH 1/2] btrfs-progs: Add support for btrfs-image + corrupt
 script fsck test case.
 From: David Sterba dste...@suse.cz
 To: Filipe David Manana fdman...@gmail.com
 Date: 2014年12月16日 02:19
 On Mon, Dec 15, 2014 at 10:13:45AM +, Filipe David Manana wrote:
 So another thing I would like to see is doing a more comprehensive
 verification that the repair code worked as expected. Currently we
 only check that a readonly fsck, after running fsck --repair, returns
 0.
 
 For the improvements you've been doing, it's equally important to
 verify that --repair recovered the inodes, links, etc to the
 lost+found directory (or whatever is the directory's name).
 
 So perhaps adding a verify.sh script to the tarball for example?
 Or, forgot before, it might be better to do such verification/test in
 xfstests since we can create the fs and use the new btrfs-progs
 programs to corrupt leafs/nodes. xfstests has a lot of infrastructure
 already and probably run by a lot more people (compared to the fsck
 tests of btrfs-progs).
 I'm thinking about the best way how to integrate that, but it seems that
 there will be always some level of code or infrastructure duplication
 (or other hassle).
 
 btrfs-corrupt-block is not installed by default (make install) and it's
 not a type of utility I'd consider for default installations. The tests
 would be skipped in absence of the utility, so there will be test
 environments where install xfstests, install btrfspprogs will not add
 the desired test coverage. Solvable by packaging the extra progs.
 
 Adding corrupt-block into xfsprogs is infeasible (IMO too much code from
 btrfs-progs to be added).
 
 I don't know how much infrastructure code we'd have to either write or
 copy from fstests, but I think it would not be that much. Ideally we
 could write the tests within btrfs-progs and then submit them to fstests
 once they're considered reliable. If we keep the same syntax of the
 tests, provide stubs where applicable, the code duplication in test
 itself would be zero. We'd only have to write the stubs in btrfs-progs
 and probably extend fstests to provide helpers for preparing/unpacking
 the images.
 In my wildest idea, if we have a good enough btrfs debugger(maybe even
 stronger than debugfs), which can
 do almost everything from read key/item to corrupt given structure, then we
 can resolve them all.
 No binary image since corruption can be done by it and verify can also done
 by it.
 (OK, it's just a daydream)
 
 But IMHO, isn't xfstests designed to mainly detect kernel defeats?
 I don't see any fsck tool test case in it.
 I don't think xfstests is specific to test the kernel implementation
 of filesystems. I believe it includes user space code too, but I might
 be wrong so I'm CCing fstests and Dave to get an authoritative answer.
 We use fstests to test everything we ship for XFS - kernel and
 userspace. i.e. we have tests that corrupt filesystems with xfs_db
 and then test that xfs_repair can fix them, and once fixed the
 filesystem can be mounted and used by the kernel...
 
 i.e. fstests is for testing both the kernel code and the utilities
 used to manipulate filesystems.
 That's great.
 
 But what will happen if some btrfs cases need binary(still huge even
 compressed) or
 btrfs-image dump(some existing dumps are already several MB)?
 Will it be OK for fstests?

No. Random filesystem images don't make a regression test. They are
just different encoding of the game of whack-a-mole. Corruption
recovery tests work best with simple, obvious corruptions. This
makes it easy to make sure correct detection and recovery behaviour
works without any other complications.

e.g. overwriting the primary superblock with zeros is a simple test
that every filesystem recovery tool should be able to handle.

 Or should we wait until btrfs-progs has a good debug tools like
 xfs_db or debugfs and use
 them to generate the corrupted image like xfs testcases do?

You don't need a tool like xfs_db to do repeatable structure
corruption - you can do this with dd by writing to known offsets.

xfs_db just makes that really easy by having a filesystem format
parser that means finding the offset we want to write to is
trivial

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to 

Re: Quota limit question

2014-12-23 Thread Duncan
Christian Robottom Reis posted on Tue, 23 Dec 2014 18:36:02 -0200 as
excerpted:

 On Tue, Dec 16, 2014 at 11:15:37PM -0200, Christian Robottom Reis wrote:
 # btrfs qgroup limit 2000m 0/261 .  touch x touch: cannot touch
 ‘x’: Disk quota exceeded
 
 The strange thing is that it doesn't seem to be actually out of space:
 
 # btrfs qgroup show -p -r -e /var | grep 261
 0/261810048  391114752   2097152000  0  ---
 
 Replying to myself as I had not yet been subscribed in time to receive a
 reply; I just upgraded to 3.18.1 and am seeing the same issue on the
 same subvolume (and on no others).

Looking at the thread here on gmane.org (list2news and list2web gateway), 
it appears my reply was the only reply in any case, and it was general as 
I don't run quotas myself.

Basically I suggested upgrading, as the quota code as some rather huge 
bugs in it (quotas could go seriously negative!) with the old versions 
you were running.  But you've upgraded at least the kernel now (userspace 
you didn't say).

Here's a link to the thread on the gmane web interface for completeness, 
but the above about covers my reply, as I said the only one until your 
thread bump and my reply here, so there's not much new there unless 
someone posts further followups to this thread...


http://comments.gmane.org/gmane.comp.file-systems.btrfs/41491


-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] Remove unnecessary placeholder in btrfs_err_code

2014-12-23 Thread Satoru Takeuchi

Hi Gui,

On 2014/12/23 10:59, Gui Hecheng wrote:

On Mon, 2014-12-22 at 19:39 +0900, Satoru Takeuchi wrote:

Sorry, I forgot to add Signed-off-by line.

---
From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

notused is not necessary. Set 1 to the first entry is enough.


Hi Satoru,

Actually, the struct is copied from the kernel header
include/uapi/linux/btrfs.h.
I think they should stay the same, so either change the kernel header
in another path, or just keep them in old style.


OK, I'll send also kernel patches soon.

Thanks,
Satoru



Thanks,
Gui


Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

---
  ioctl.h | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/ioctl.h b/ioctl.h
index 2c2c7c1..c377b96 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -474,8 +474,7 @@ struct btrfs_ioctl_qgroup_create_args {

  /* Error codes as returned by the kernel */
  enum btrfs_err_code {
-   notused,
-   BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET,
+   BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET = 1,
BTRFS_ERROR_DEV_RAID10_MIN_NOT_MET,
BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET,
BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET,
-- 1.8.3.1 -- To unsubscribe from this list: send the line unsubscribe 
linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo 
info at http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html





--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: Remove unnecessary placeholder in btrfs_err_code

2014-12-23 Thread Satoru Takeuchi
From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

notused is not necessary. Set 1 to the first entry is enough.

Signed-off-by: Takeuchi Satoru takeuchi_sat...@jp.fujitsu.com
Cc: Gui Hecheng guihc.f...@cn.fujitsu.com
---
I once submit the similar patch to btrfs-progs.
Then Gui Hecheng tell me fixing original code in kernel
is better.

http://permalink.gmane.org/gmane.comp.file-systems.btrfs/41679
---
 include/uapi/linux/btrfs.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 611e1c5..b6dec05 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -495,8 +495,7 @@ struct btrfs_ioctl_send_args {
 
 /* Error codes as returned by the kernel */
 enum btrfs_err_code {
-   notused,
-   BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET,
+   BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET = 1,
BTRFS_ERROR_DEV_RAID10_MIN_NOT_MET,
BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET,
BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET,
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: call inode_dec_link_count() on mkdir error path

2014-12-23 Thread Wang Shilong
From: Wang Shilong wangshilong1...@gmail.com

In btrfs_mkdir(), if it fails to create dir, we should
clean up existed items, setting inode's link properly
to make sure it could be cleaned up properly.

Signed-off-by: Wang Shilong wangshilong1...@gmail.com
---
 fs/btrfs/inode.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8de2335..8a036ed 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6255,8 +6255,10 @@ static int btrfs_mkdir(struct inode *dir, struct dentry 
*dentry, umode_t mode)
 
 out_fail:
btrfs_end_transaction(trans, root);
-   if (drop_on_err)
+   if (drop_on_err) {
+   inode_dec_link_count(inode);
iput(inode);
+   }
btrfs_balance_delayed_items(root);
btrfs_btree_balance_dirty(root);
return err;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs-progs: doc: fix incorrect format of 'l' option in mkfs.btrfs

2014-12-23 Thread Tsutomu Itoh
The format of 'l' option in mkfs.btrfs.txt is wrong.
And, when the head of the character string is 65536, the following warning
is displayed.

$ make
Making all in Documentation
[ASCII]  mkfs.btrfs.xml
asciidoc: WARNING: mkfs.btrfs.xml.tmp1: line 67: list item index: expected 
1 got 65536
[XMLTO]  mkfs.btrfs.8
[GZ] mkfs.btrfs.8.gz
rm mkfs.btrfs.8 mkfs.btrfs.xml

So, fix it.

Signed-off-by: Tsutomu Itoh t-i...@jp.fujitsu.com
---
 Documentation/mkfs.btrfs.txt | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/Documentation/mkfs.btrfs.txt b/Documentation/mkfs.btrfs.txt
index ba7e42b..01b615d 100644
--- a/Documentation/mkfs.btrfs.txt
+++ b/Documentation/mkfs.btrfs.txt
@@ -59,19 +59,18 @@ By default, mkfs.btrfs will not write to the device if it 
suspects that
 there is a filesystem or partition table on the device already.
 
 -n|--nodesize size
-+
+
 -l|--leafsize size::
 Specify the nodesize, the tree block size in which btrfs stores
 data. The default value is 16KB (16384) or the page size, whichever is
-bigger. Must be a multiple of the sectorsize, but not larger than
-65536. Leafsize always equals nodesize and the options are aliases.
+bigger. Must be a multiple of the sectorsize, but not larger than 65536.
+Leafsize always equals nodesize and the options are aliases.
 
 -L|--label name::
 Specify a label for the filesystem.
 +
 NOTE: name should be less than 256 characters.
 
-
 -m|--metadata profile::
 Specify how metadata must be spanned across the devices specified. Valid
 values are 'raid0', 'raid1', 'raid5', 'raid6', 'raid10', 'single' or 'dup'.
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html