Re: [PATCH 0/4] 3- and 4- copy RAID1

2018-07-16 Thread Goffredo Baroncelli
On 07/15/2018 04:37 PM, waxhead wrote:
> David Sterba wrote:
>> An interesting question is the naming of the extended profiles. I picked
>> something that can be easily understood but it's not a final proposal.
>> Years ago, Hugo proposed a naming scheme that described the
>> non-standard raid varieties of the btrfs flavor:
>>
>> https://marc.info/?l=linux-btrfs=136286324417767
>>
>> Switching to this naming would be a good addition to the extended raid.
>>
> As just a humble BTRFS user I agree and really think it is about time to move 
> far away from the RAID terminology. However adding some more descriptive 
> profile names (or at least some aliases) would be much better for the 
> commoners (such as myself).
> 
> For example:
> 
> Old format / New Format / My suggested alias
> SINGLE  / 1C / SINGLE
> DUP / 2CD    / DUP (or even MIRRORLOCAL1)
> RAID0   / 1CmS   / STRIPE


> RAID1   / 2C / MIRROR1
> RAID1c3 / 3C / MIRROR2
> RAID1c4 / 4C / MIRROR3
> RAID10  / 2CmS   / STRIPE.MIRROR1

Striping and mirroring/pairing are orthogonal properties; mirror and parity are 
mutually exclusive. What about

RAID1 -> MIRROR1
RAID10 -> MIRROR1S
RAID1c3 -> MIRROR2
RAID1c3+striping -> MIRROR2S

and so on...

> RAID5   / 1CmS1P / STRIPE.PARITY1
> RAID6   / 1CmS2P / STRIPE.PARITY2

To me these should be called something like

RAID5 -> PARITY1S
RAID6 -> PARITY2S

The S final is due to the fact that usually RAID5/6 spread the data on all 
available disks

Question #1: for "parity" profiles, does make sense to limit the maximum disks 
number where the data may be spread ? If the answer is not, we could omit the 
last S. IMHO it should. 
Question #2: historically RAID10 is requires 4 disks. However I am guessing if 
the stripe could be done on a different number of disks: What about 
RAID1+Striping on 3 (or 5 disks) ? The key of striping is that every 64k, the 
data are stored on a different disk







> 
> I find that writing something like "btrfs balance start 
> -dconvert=stripe5.parity2 /mnt" is far less confusing and therefore less 
> error prone than writing "-dconvert=1C5S2P".
> 
> While Hugo's suggestion is compact and to the point I would call for 
> expanding that so it is a bit more descriptive and human readable.
> 
> So for example : STRIPE where  obviously is the same as Hugo 
> proposed - the number of storage devices for the stripe and no  would be 
> best to mean 'use max devices'.
> For PARITY then  is obviously required
> 
> Keep in mind that most people (...and I am willing to bet even Duncan which 
> probably HAS backups ;) ) get a bit stressed when their storage system is 
> degraded. With that in mind I hope for more elaborate, descriptive and human 
> readable profile names to be used to avoid making mistakes using the 
> "compact" layout.
> 
> ...and yes, of course this could go both ways. A more compact (and dare I say 
> cryptic) variant can cause people to stop and think before doing something 
> and thus avoid errors,
> 
> Now that I made my point I can't help being a bit extra hash, obnoxious and 
> possibly difficult so I would also suggest that Hugo's format could have been 
> changed (dare I say improved?) from
> 
> numCOPIESnumSTRIPESnumPARITY
> 
> to.
> 
> REPLICASnum.STRIPESnum.PARITYnum
> 
> Which would make the above table look like so:
> 
> Old format / My Format / My suggested alias
> SINGLE  / R0.S0.P0 / SINGLE
> DUP / R1.S1.P0 / DUP (or even MIRRORLOCAL1)
> RAID0   / R0.Sm.P0 / STRIPE
> RAID1   / R1.S0.P0 / MIRROR1
> RAID1c3 / R2.S0.P0 / MIRROR2
> RAID1c4 / R3.S0.P0 / MIRROR3
> RAID10  / R1.Sm.P0 / STRIPE.MIRROR1
> RAID5   / R1.Sm.P1 / STRIPE.PARITY1
> RAID6   / R1.Sm.P2 / STRIPE.PARITY2
> 
> And i think this is much more readable, but others may disagree. And as a 
> side note... from a (hobby) coders perspective this is probably simpler to 
> parse as well.
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] btrfs-progs: check: orig: Don't panic out when unexpected root item is referring to one extent

2018-07-16 Thread David Sterba
On Thu, Jul 05, 2018 at 03:45:56PM +0800, Qu Wenruo wrote:
> With crafted image, expected root item can refer to certain extent, and
> original mode uses BUG_ON() to handle such case.
> 
> Fix it by gracefully return error.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200403
> Signed-off-by: Qu Wenruo 

Please send cover lettter for patchsets with more than one patch.
Patches 1-3 applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 2/2] btrfs-progs: make all programs and libraries optional

2018-07-16 Thread Omar Sandoval
On Mon, Jul 16, 2018 at 04:56:57PM +0200, David Sterba wrote:
> On Thu, Jul 12, 2018 at 04:11:19PM -0700, Omar Sandoval wrote:
> > From: Omar Sandoval 
> > 
> > We have a build system internally which only needs to build the
> > libraries out of a repository, not any binaries. I looked at how this
> > works with other projects, and the best example was util-linux, which
> > makes it possible to enable or disable everything individually. This is
> > nice and really flexible, so let's do the same. This way, if you only
> > want to build and install libbtrfsutil, you can simply do
> > 
> >   ./configure --disable-documentation --disable-all-programs 
> > --enable-libbtrfsutil
> >   make
> >   make install
> 
> I think this is an overkill and abusing the --enable-XXX options.  You
> want to avoid building the tools by default, so adding an option for
> that is fine. Selectively building only certain tools can utilize that
> option too and just follow with 'make btrfs-image' etc.

Yeah, it's easy to build stuff selectively, but `make install` will
still try to build everything, that's the part I'm more concerned with.

> The number of --enable-* will stay minimal and we don't even have to
> discuss how to find a good naming scheme (that works for util-linux but
> looks a bit confusing for btrfs-progs).

Ok, I can collapse these into just --disable-programs/--enable-programs,
and --disable-libraries/--enable-libraries? That would be enough for me.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: tests/fuzz: Use correct suffix for raw image

2018-07-16 Thread David Sterba
On Thu, Jul 05, 2018 at 01:41:24PM +0800, Qu Wenruo wrote:
> * Please fold this patch into original commit in devel branch *
> 
> It's raw image, so it should be ".raw.xz" not ".img.xz"

Ah right, thanks, commit updated.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs-progs: lowmem: fix false alerts of referencer count mismatch for blocks relocated

2018-07-16 Thread David Sterba
On Tue, Jul 03, 2018 at 03:58:50PM +0800, Su Yue wrote:
> Btrfs lowmem check reports such false alerts:
...

1 and 2 applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] btrfs-progs: check: enhanced progress indicator

2018-07-16 Thread David Sterba
On Wed, Jul 11, 2018 at 01:18:55PM +0800, Qu Wenruo wrote:
> On 2018年07月05日 03:20, Stéphane Lesimple wrote:
> > We reuse the task_position enum and task_ctx struct of the original progress
> > indicator, adding more values and fields for our needs.
> > 
> > Then add hooks in all steps of the check to properly record progress.
> > 
> > Signed-off-by: Stéphane Lesimple 
> 
> Looks pretty good.
> 
> Just some small nitpicks related to code style.

The coding style rules need to be more relaxed for one-time or
infrequent contributors so we don't scare them away.

I go through such patches line by line and fix where needed but it's ok
to point them out so I don't miss something.

> > @@ -173,28 +161,48 @@ static int compare_extent_backref(struct rb_node 
> > *node1, struct rb_node *node2)
> > return compare_tree_backref(node1, node2);
> >  }
> >  
> > +static void print_status_check_line(void *p)
> > +{
> > +   struct task_ctx *priv = p;
> > +   char *task_position_string[] = {
> > +   "[1/7] checking root items ",
> > +   "[2/7] checking extents",
> > +   is_free_space_tree ?
> > +   "[3/7] checking free space tree":
> 
> The extra intent makes it a little hard to align the output.
> 
> What about using something like below to format the string?
> "[%d/%d] %-20s"
> 
> And this makes it more flex since lowmem and original mode has different
> progress number (lowmem doesn't need to check root refs separately).

I've kept it as is only did minor updates so it looks visually more
compact. If you have further ideas how to clean up the code, feel free
to send patches.

> 
> > +   "[3/7] checking free space cache   ",
> > +   "[4/7] checking fs roots   ",
> > +   check_data_csum ?
> > +   "[5/7] checking csums against data ":
> > +   "[5/7] checking csums (without verifying data) ",
> > +   "[6/7] checking root refs  ",
> > +   "[7/7] checking quota groups   ",
> > +   };
> > +
> > +   time_t elapsed = time(NULL) - priv->start_time;
> > +   int hours   = elapsed / 3600;
> > +   elapsed-= hours   * 3600;
> 
> It's not common in btrfs-progs to mix declaration and code.

Fixed.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Healthy amount of free space?

2018-07-16 Thread Wolf
Greetings,
I would like to ask what what is healthy amount of free space to keep on
each device for btrfs to be happy?

This is how my disk array currently looks like

[root@dennas ~]# btrfs fi usage /raid
Overall:
Device size:  29.11TiB
Device allocated: 21.26TiB
Device unallocated:7.85TiB
Device missing:  0.00B
Used: 21.18TiB
Free (estimated):  3.96TiB  (min: 3.96TiB)
Data ratio:   2.00
Metadata ratio:   2.00
Global reserve:  512.00MiB  (used: 0.00B)

Data,RAID1: Size:10.61TiB, Used:10.58TiB
   /dev/mapper/data1   1.75TiB
   /dev/mapper/data2   1.75TiB
   /dev/mapper/data3 856.00GiB
   /dev/mapper/data4 856.00GiB
   /dev/mapper/data5   1.75TiB
   /dev/mapper/data6   1.75TiB
   /dev/mapper/data7   6.29TiB
   /dev/mapper/data8   6.29TiB

Metadata,RAID1: Size:15.00GiB, Used:13.00GiB
   /dev/mapper/data1   2.00GiB
   /dev/mapper/data2   3.00GiB
   /dev/mapper/data3   1.00GiB
   /dev/mapper/data4   1.00GiB
   /dev/mapper/data5   3.00GiB
   /dev/mapper/data6   1.00GiB
   /dev/mapper/data7   9.00GiB
   /dev/mapper/data8  10.00GiB

System,RAID1: Size:64.00MiB, Used:1.50MiB
   /dev/mapper/data2  32.00MiB
   /dev/mapper/data6  32.00MiB
   /dev/mapper/data7  32.00MiB
   /dev/mapper/data8  32.00MiB

Unallocated:
   /dev/mapper/data11004.52GiB
   /dev/mapper/data21004.49GiB
   /dev/mapper/data31006.01GiB
   /dev/mapper/data41006.01GiB
   /dev/mapper/data51004.52GiB
   /dev/mapper/data61004.49GiB
   /dev/mapper/data71005.00GiB
   /dev/mapper/data81005.00GiB

Btrfs does quite good job of evenly using space on all devices. No, how
low can I let that go? In other words, with how much space
free/unallocated remaining space should I consider adding new disk?

Thanks for advice :)

W.

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.


signature.asc
Description: PGP signature


Re: [PATCH 0/4] 3- and 4- copy RAID1

2018-07-16 Thread waxhead

waxhead wrote:

David Sterba wrote:

An interesting question is the naming of the extended profiles. I picked
something that can be easily understood but it's not a final proposal.
Years ago, Hugo proposed a naming scheme that described the
non-standard raid varieties of the btrfs flavor:

https://marc.info/?l=linux-btrfs=136286324417767

Switching to this naming would be a good addition to the extended raid.

As just a humble BTRFS user I agree and really think it is about time to 
move far away from the RAID terminology. However adding some more 
descriptive profile names (or at least some aliases) would be much 
better for the commoners (such as myself). 
...snip... > Which would make the above table look like so:


Old format / My Format / My suggested alias
SINGLE  / R0.S0.P0 / SINGLE
DUP / R1.S1.P0 / DUP (or even MIRRORLOCAL1)
RAID0   / R0.Sm.P0 / STRIPE
RAID1   / R1.S0.P0 / MIRROR1
RAID1c3 / R2.S0.P0 / MIRROR2
RAID1c4 / R3.S0.P0 / MIRROR3
RAID10  / R1.Sm.P0 / STRIPE.MIRROR1
RAID5   / R1.Sm.P1 / STRIPE.PARITY1
RAID6   / R1.Sm.P2 / STRIPE.PARITY2

And i think this is much more readable, but others may disagree. And as 
a side note... from a (hobby) coders perspective this is probably 
simpler to parse as well. 
...snap...


...and before someone else points this out that my suggestion has an 
ugly flaw , I got a bit copy / paste happy and messed up the RAID 5 and 
6 like profiles. The below table are corrected and hopefully it make the 
point why using the word 'replicas' is easier to understand than 
'copies' even if I messed it up :)


Old format / My Format / My suggested alias
SINGLE  / R0.S0.P0 / SINGLE
DUP / R1.S1.P0 / DUP (or even MIRRORLOCAL1)
RAID0   / R0.Sm.P0 / STRIPE
RAID1   / R1.S0.P0 / MIRROR1
RAID1c3 / R2.S0.P0 / MIRROR2
RAID1c4 / R3.S0.P0 / MIRROR3
RAID10  / R1.Sm.P0 / STRIPE.MIRROR1
RAID5   / R0.Sm.P1 / STRIPE.PARITY1
RAID6   / R0.Sm.P2 / STRIPE.PARITY2
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: inspect-internal: add option '-k u64,u8,u64' of dump-tree

2018-07-16 Thread Qu Wenruo



On 2018年07月16日 13:55, Su Yue wrote:
> For big filesystems, there are many items in trees(extent tree
> specially).
> For example, to dump one extent, we usually dump extent tree then pipe
> result to grep. The time-consuming part is that dump tree traverses
> items. And it eats cpu and memory too.
> 
> This patch introduces an option '-k u64,u8,u64' for users(developer more
> likely) to appoint a specific key to search and dump, then the path
> from root node down the leaf will be dumped.
> The search of the key costs most time of this way.

Indeed a pretty useful debug oriented feature.

> 
> Signed-off-by: Su Yue 
> ---
> In my experiment, not usual case though.
> Image was provided by Chris Murphy, thanks.
> 
> #btrfs filesystem df /mnt
> Data, single: total=767.00GiB, used=766.58GiB
> System, DUP: total=32.00MiB, used=112.00KiB
> Metadata, DUP: total=5.50GiB, used=4.69GiB
> Metadata, single: total=16.00MiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> Before:
> #time bash -c 'btrfs inspect-internal dump-tree -t 2 ~/test/test.img | grep 
> "1295194308608" >> /dev/null'
> real0m34.024s
> user0m3.269s
> sys 0m4.047s
> 
> Patched and use -k:
> #time bash -c './btrfs inspect-internal dump-tree -t 2  -k 1295194308608,0,0 
> ~/test/test.img >> /dev/null'
> 
> real0m1.408s
> user0m0.006s
> sys 0m0.014s
> 
> The new way is 34 times faster than 'grep' way, uses less memory and
> cpu, and it prints path useful for debug.
> 
> ---
>  cmds-inspect-dump-tree.c |  54 +--
>  print-tree.c | 112 +++
>  print-tree.h |   2 +
>  3 files changed, 163 insertions(+), 5 deletions(-)
> 
> diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c
> index c8acd55a0c3a..2a1a40c8270d 100644
> --- a/cmds-inspect-dump-tree.c
> +++ b/cmds-inspect-dump-tree.c
> @@ -184,6 +184,21 @@ static u64 treeid_from_string(const char *str, const 
> char **end)
>   return id;
>  }
>  
> +static int parse_key(struct btrfs_key *key)
> +{
> +
> + int ret = sscanf(optarg, "%llu,%hhu,%llu", >objectid,
> +  >type, >offset);
> + if (ret != 3) {
> + error("error parsing key '%s'\n", optarg);
> + ret = -EINVAL;
> + } else {
> + ret = 0;
> + }
> +
> + return ret;
> +}
> +
>  const char * const cmd_inspect_dump_tree_usage[] = {
>   "btrfs inspect-internal dump-tree [options] device",
>   "Dump tree structures from a given device",
> @@ -199,6 +214,7 @@ const char * const cmd_inspect_dump_tree_usage[] = {
>   "-u|--uuid  print only the uuid tree",
>   "-b|--block  print info from the specified block only",
>   "-t|--tree print only tree with the given id (string or 
> number)",
> + "-k|--key   search the specific key then print path, must 
> with -t",

Shouldn't it conflict with -b and -r?

>   "--follow   use with -b, to show all children tree blocks 
> of ",
>   NULL
>  };
> @@ -224,8 +240,10 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>   unsigned open_ctree_flags;
>   u64 block_only = 0;
>   struct btrfs_root *tree_root_scan;
> + struct btrfs_key spec_key = { 0 };
>   u64 tree_id = 0;
>   bool follow = false;
> + bool is_key_spec = false;

What about @key_set? @is_key_spec looks a little strange here.

>  
>   /*
>* For debug-tree, we care nothing about extent tree (it's just backref
> @@ -248,10 +266,11 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>   { "block", required_argument, NULL, 'b'},
>   { "tree", required_argument, NULL, 't'},
>   { "follow", no_argument, NULL, GETOPT_VAL_FOLLOW },
> + {"key", required_argument, NULL, 'k'},

Small alignment mismatch, missing a space before "key".

>   { NULL, 0, NULL, 0 }
>   };
>  
> - c = getopt_long(argc, argv, "deb:rRut:", long_options, NULL);
> + c = getopt_long(argc, argv, "deb:rRut:k:", long_options, NULL);
>   if (c < 0)
>   break;
>   switch (c) {
> @@ -300,6 +319,12 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>   }
>   break;
>   }
> + case 'k':
> + ret = parse_key(_key);
> + if (ret)
> + exit(1);
> + is_key_spec = true;
> + break;
>   case GETOPT_VAL_FOLLOW:
>   follow = true;
>   break;
> @@ -308,6 +333,9 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>   }
>   }
>  
> + if (!tree_id && is_key_spec)
> + usage(cmd_inspect_dump_tree_usage);
> +
>   if (check_argc_exact(argc - optind, 1))
>   

Re: [PATCH] btrfs-progs: inspect-internal: add option '-k u64,u8,u64' of dump-tree

2018-07-16 Thread Su Yue




On 07/16/2018 03:45 PM, Qu Wenruo wrote:



On 2018年07月16日 13:55, Su Yue wrote:

For big filesystems, there are many items in trees(extent tree
specially).
For example, to dump one extent, we usually dump extent tree then pipe
result to grep. The time-consuming part is that dump tree traverses
items. And it eats cpu and memory too.

This patch introduces an option '-k u64,u8,u64' for users(developer more
likely) to appoint a specific key to search and dump, then the path
from root node down the leaf will be dumped.
The search of the key costs most time of this way.


Indeed a pretty useful debug oriented feature.



Signed-off-by: Su Yue 
---
In my experiment, not usual case though.
Image was provided by Chris Murphy, thanks.

#btrfs filesystem df /mnt
Data, single: total=767.00GiB, used=766.58GiB
System, DUP: total=32.00MiB, used=112.00KiB
Metadata, DUP: total=5.50GiB, used=4.69GiB
Metadata, single: total=16.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

Before:
#time bash -c 'btrfs inspect-internal dump-tree -t 2 ~/test/test.img | grep 
"1295194308608" >> /dev/null'
real0m34.024s
user0m3.269s
sys 0m4.047s

Patched and use -k:
#time bash -c './btrfs inspect-internal dump-tree -t 2  -k 1295194308608,0,0 
~/test/test.img >> /dev/null'

real0m1.408s
user0m0.006s
sys 0m0.014s

The new way is 34 times faster than 'grep' way, uses less memory and
cpu, and it prints path useful for debug.

---
  cmds-inspect-dump-tree.c |  54 +--
  print-tree.c | 112 +++
  print-tree.h |   2 +
  3 files changed, 163 insertions(+), 5 deletions(-)

diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c
index c8acd55a0c3a..2a1a40c8270d 100644
--- a/cmds-inspect-dump-tree.c
+++ b/cmds-inspect-dump-tree.c
@@ -184,6 +184,21 @@ static u64 treeid_from_string(const char *str, const char 
**end)
return id;
  }
  
+static int parse_key(struct btrfs_key *key)

+{
+
+   int ret = sscanf(optarg, "%llu,%hhu,%llu", >objectid,
+>type, >offset);
+   if (ret != 3) {
+   error("error parsing key '%s'\n", optarg);
+   ret = -EINVAL;
+   } else {
+   ret = 0;
+   }
+
+   return ret;
+}
+
  const char * const cmd_inspect_dump_tree_usage[] = {
"btrfs inspect-internal dump-tree [options] device",
"Dump tree structures from a given device",
@@ -199,6 +214,7 @@ const char * const cmd_inspect_dump_tree_usage[] = {
"-u|--uuid  print only the uuid tree",
"-b|--block  print info from the specified block only",
"-t|--tree print only tree with the given id (string or 
number)",
+   "-k|--key   search the specific key then print path, must with 
-t",


Shouldn't it conflict with -b and -r?


"--follow   use with -b, to show all children tree blocks of 
",
NULL
  };
@@ -224,8 +240,10 @@ int cmd_inspect_dump_tree(int argc, char **argv)
unsigned open_ctree_flags;
u64 block_only = 0;
struct btrfs_root *tree_root_scan;
+   struct btrfs_key spec_key = { 0 };
u64 tree_id = 0;
bool follow = false;
+   bool is_key_spec = false;


What about @key_set? @is_key_spec looks a little strange here.


Ok, will call it is_spec_key_set in next version.

  
  	/*

 * For debug-tree, we care nothing about extent tree (it's just backref
@@ -248,10 +266,11 @@ int cmd_inspect_dump_tree(int argc, char **argv)
{ "block", required_argument, NULL, 'b'},
{ "tree", required_argument, NULL, 't'},
{ "follow", no_argument, NULL, GETOPT_VAL_FOLLOW },
+   {"key", required_argument, NULL, 'k'},


Small alignment mismatch, missing a space before "key".


{ NULL, 0, NULL, 0 }
};
  
-		c = getopt_long(argc, argv, "deb:rRut:", long_options, NULL);

+   c = getopt_long(argc, argv, "deb:rRut:k:", long_options, NULL);
if (c < 0)
break;
switch (c) {
@@ -300,6 +319,12 @@ int cmd_inspect_dump_tree(int argc, char **argv)
}
break;
}
+   case 'k':
+   ret = parse_key(_key);
+   if (ret)
+   exit(1);
+   is_key_spec = true;
+   break;
case GETOPT_VAL_FOLLOW:
follow = true;
break;
@@ -308,6 +333,9 @@ int cmd_inspect_dump_tree(int argc, char **argv)
}
}
  
+	if (!tree_id && is_key_spec)

+   usage(cmd_inspect_dump_tree_usage);
+
if (check_argc_exact(argc - optind, 1))
usage(cmd_inspect_dump_tree_usage);
  
@@ -398,7 +426,11 @@ again:

  

Re: Corrupted FS with "open_ctree failed" and "failed to recover balance: -5"

2018-07-16 Thread Qu Wenruo


On 2018年07月16日 16:15, Udo Waechter wrote:
> Hello,
> 
> noone any ideas? Do you need more information?
> 
> Cheers,
> udo.
> 
> On 11/07/18 17:37, Udo Waechter wrote:
>> Hello everyone,
>>
>> I have a corrupted filesystem which I can't seem to recover.
>>
>> The machine is:
>> Debian Linux, kernel 4.9 and btrfs-progs v4.13.3
>>
>> I have a HDD RAID5 with LVM and the volume in question is a LVM volume.
>> On top of that I had a RAID1 SSD cache with lvm-cache.
>>
>> Yesterday both! SSDs died within minutes. This lead to the corruped
>> filesystem that I have now.
>>
>> I hope I followed the procedure correctly.
>>
>> What I tried so far:
>> * "mount -o usebackuproot,ro " and "nospace_cache" "clear_cache" and all
>> permutations of these mount options
>>
>> I'm getting:
>>
>> [96926.830400] BTRFS info (device dm-2): trying to use backup root at
>> mount time
>> [96926.830406] BTRFS info (device dm-2): disk space caching is enabled
>> [96926.927978] BTRFS error (device dm-2): parent transid verify failed
>> on 321269628928 wanted 3276017 found 3275985
>> [96926.938619] BTRFS error (device dm-2): parent transid verify failed
>> on 321269628928 wanted 3276017 found 3275985
>> [96926.940705] BTRFS error (device dm-2): failed to recover balance: -5

This means your fs failed to recover the balance.

And it should mostly be caused by transid error just one line above.
Normally this means your fs is more or less corrupted, could be caused
by powerloss or something else.

>> [96926.985801] BTRFS error (device dm-2): open_ctree failed
>>
>> The weird thing is that I can't really find information about the
>> "failed to recover balance: -5" error. - There was no rebalancing
>> running when during the crash.

Can only be determined by tree dump.

# btrfs ins dump-tree -t root 

>>
>> * btrfs-find-root: https://pastebin.com/qkjnSUF7 - It bothers me that I
>> don't see any "good generations" as described here:
>> https://btrfs.wiki.kernel.org/index.php/Restore
>>
>> * "btrfs rescue" - it starts, then goes to "looping on XYZ" then stops
>>
>> * "btrfs rescue super-recover -v" gives:
>>
>> All Devices:
>>  Device: id = 1, name = /dev/vg00/...
>> Before Recovering:
>>  [All good supers]:
>>  device name = /dev/vg00/...
>>  superblock bytenr = 65536
>>
>>  device name = /dev/vg00/...
>>  superblock bytenr = 67108864
>>
>>  device name = /dev/vg00/...
>>  superblock bytenr = 274877906944
>>
>>  [All bad supers]:
>>
>> All supers are valid, no need to recover
>>
>>
>> * Unfortunatly I did a "btrfs rescue zero-log" at some point :( - As it
>> turns out that might have been a bad idea
>>
>>
>> * Also, a "btrfs  check --init-extent-tree" - https://pastebin.com/jATDCFZy

Then it is making things worse, fortunately it should terminate before
it causes more damage.

I'm just curious why people doesn't try the safest "btrfs check" without
any options, but goes the most dangerous option.

And "btrfs check" output please.
If possible, "btrfs check --mode=lowmem" is also good for debug.

Thanks,
Qu

>>
>> The volume contained qcow2 images for VMs. I need only one of those,
>> since one piece of important software decided to not do backups :(
>>
>> Any help is highly appreciated.
>>
>> Many thanks,
>> udo.
>>
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] btrfs-progs: inspect-internal: add option '-k u64,u8,u64' of dump-tree

2018-07-16 Thread Qu Wenruo



On 2018年07月16日 16:14, Su Yue wrote:
> 
> 
> On 07/16/2018 03:45 PM, Qu Wenruo wrote:
>>
>>
>> On 2018年07月16日 13:55, Su Yue wrote:
>>> For big filesystems, there are many items in trees(extent tree
>>> specially).
>>> For example, to dump one extent, we usually dump extent tree then pipe
>>> result to grep. The time-consuming part is that dump tree traverses
>>> items. And it eats cpu and memory too.
>>>
>>> This patch introduces an option '-k u64,u8,u64' for users(developer more
>>> likely) to appoint a specific key to search and dump, then the path
>>> from root node down the leaf will be dumped.
>>> The search of the key costs most time of this way.
>>
>> Indeed a pretty useful debug oriented feature.
>>
>>>
>>> Signed-off-by: Su Yue 
>>> ---
>>> In my experiment, not usual case though.
>>> Image was provided by Chris Murphy, thanks.
>>>
>>> #btrfs filesystem df /mnt
>>> Data, single: total=767.00GiB, used=766.58GiB
>>> System, DUP: total=32.00MiB, used=112.00KiB
>>> Metadata, DUP: total=5.50GiB, used=4.69GiB
>>> Metadata, single: total=16.00MiB, used=0.00B
>>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>>
>>> Before:
>>> #time bash -c 'btrfs inspect-internal dump-tree -t 2 ~/test/test.img
>>> | grep "1295194308608" >> /dev/null'
>>> real    0m34.024s
>>> user    0m3.269s
>>> sys 0m4.047s
>>>
>>> Patched and use -k:
>>> #time bash -c './btrfs inspect-internal dump-tree -t 2  -k
>>> 1295194308608,0,0 ~/test/test.img >> /dev/null'
>>>
>>> real    0m1.408s
>>> user    0m0.006s
>>> sys 0m0.014s
>>>
>>> The new way is 34 times faster than 'grep' way, uses less memory and
>>> cpu, and it prints path useful for debug.
>>>
>>> ---
>>>   cmds-inspect-dump-tree.c |  54 +--
>>>   print-tree.c | 112 +++
>>>   print-tree.h |   2 +
>>>   3 files changed, 163 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c
>>> index c8acd55a0c3a..2a1a40c8270d 100644
>>> --- a/cmds-inspect-dump-tree.c
>>> +++ b/cmds-inspect-dump-tree.c
>>> @@ -184,6 +184,21 @@ static u64 treeid_from_string(const char *str,
>>> const char **end)
>>>   return id;
>>>   }
>>>   +static int parse_key(struct btrfs_key *key)
>>> +{
>>> +
>>> +    int ret = sscanf(optarg, "%llu,%hhu,%llu", >objectid,
>>> + >type, >offset);
>>> +    if (ret != 3) {
>>> +    error("error parsing key '%s'\n", optarg);
>>> +    ret = -EINVAL;
>>> +    } else {
>>> +    ret = 0;
>>> +    }
>>> +
>>> +    return ret;
>>> +}
>>> +
>>>   const char * const cmd_inspect_dump_tree_usage[] = {
>>>   "btrfs inspect-internal dump-tree [options] device",
>>>   "Dump tree structures from a given device",
>>> @@ -199,6 +214,7 @@ const char * const cmd_inspect_dump_tree_usage[] = {
>>>   "-u|--uuid  print only the uuid tree",
>>>   "-b|--block  print info from the specified block only",
>>>   "-t|--tree     print only tree with the given id
>>> (string or number)",
>>> +    "-k|--key   search the specific key then print path,
>>> must with -t",
>>
>> Shouldn't it conflict with -b and -r?
>>
>>>   "--follow   use with -b, to show all children tree
>>> blocks of ",
>>>   NULL
>>>   };
>>> @@ -224,8 +240,10 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>>>   unsigned open_ctree_flags;
>>>   u64 block_only = 0;
>>>   struct btrfs_root *tree_root_scan;
>>> +    struct btrfs_key spec_key = { 0 };
>>>   u64 tree_id = 0;
>>>   bool follow = false;
>>> +    bool is_key_spec = false;
>>
>> What about @key_set? @is_key_spec looks a little strange here.
>>
> Ok, will call it is_spec_key_set in next version.
> 
>>>     /*
>>>    * For debug-tree, we care nothing about extent tree (it's just
>>> backref
>>> @@ -248,10 +266,11 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>>>   { "block", required_argument, NULL, 'b'},
>>>   { "tree", required_argument, NULL, 't'},
>>>   { "follow", no_argument, NULL, GETOPT_VAL_FOLLOW },
>>> +    {"key", required_argument, NULL, 'k'},
>>
>> Small alignment mismatch, missing a space before "key".
>>
>>>   { NULL, 0, NULL, 0 }
>>>   };
>>>   -    c = getopt_long(argc, argv, "deb:rRut:", long_options, NULL);
>>> +    c = getopt_long(argc, argv, "deb:rRut:k:", long_options, NULL);
>>>   if (c < 0)
>>>   break;
>>>   switch (c) {
>>> @@ -300,6 +319,12 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>>>   }
>>>   break;
>>>   }
>>> +    case 'k':
>>> +    ret = parse_key(_key);
>>> +    if (ret)
>>> +    exit(1);
>>> +    is_key_spec = true;
>>> +    break;
>>>   case GETOPT_VAL_FOLLOW:
>>>   follow = true;
>>>   break;
>>> @@ -308,6 +333,9 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>>>    

Re: Corrupted FS with "open_ctree failed" and "failed to recover balance: -5"

2018-07-16 Thread Udo Waechter
Hello,

noone any ideas? Do you need more information?

Cheers,
udo.

On 11/07/18 17:37, Udo Waechter wrote:
> Hello everyone,
> 
> I have a corrupted filesystem which I can't seem to recover.
> 
> The machine is:
> Debian Linux, kernel 4.9 and btrfs-progs v4.13.3
> 
> I have a HDD RAID5 with LVM and the volume in question is a LVM volume.
> On top of that I had a RAID1 SSD cache with lvm-cache.
> 
> Yesterday both! SSDs died within minutes. This lead to the corruped
> filesystem that I have now.
> 
> I hope I followed the procedure correctly.
> 
> What I tried so far:
> * "mount -o usebackuproot,ro " and "nospace_cache" "clear_cache" and all
> permutations of these mount options
> 
> I'm getting:
> 
> [96926.830400] BTRFS info (device dm-2): trying to use backup root at
> mount time
> [96926.830406] BTRFS info (device dm-2): disk space caching is enabled
> [96926.927978] BTRFS error (device dm-2): parent transid verify failed
> on 321269628928 wanted 3276017 found 3275985
> [96926.938619] BTRFS error (device dm-2): parent transid verify failed
> on 321269628928 wanted 3276017 found 3275985
> [96926.940705] BTRFS error (device dm-2): failed to recover balance: -5
> [96926.985801] BTRFS error (device dm-2): open_ctree failed
> 
> The weird thing is that I can't really find information about the
> "failed to recover balance: -5" error. - There was no rebalancing
> running when during the crash.
> 
> * btrfs-find-root: https://pastebin.com/qkjnSUF7 - It bothers me that I
> don't see any "good generations" as described here:
> https://btrfs.wiki.kernel.org/index.php/Restore
> 
> * "btrfs rescue" - it starts, then goes to "looping on XYZ" then stops
> 
> * "btrfs rescue super-recover -v" gives:
> 
> All Devices:
>   Device: id = 1, name = /dev/vg00/...
> Before Recovering:
>   [All good supers]:
>   device name = /dev/vg00/...
>   superblock bytenr = 65536
> 
>   device name = /dev/vg00/...
>   superblock bytenr = 67108864
> 
>   device name = /dev/vg00/...
>   superblock bytenr = 274877906944
> 
>   [All bad supers]:
> 
> All supers are valid, no need to recover
> 
> 
> * Unfortunatly I did a "btrfs rescue zero-log" at some point :( - As it
> turns out that might have been a bad idea
> 
> 
> * Also, a "btrfs  check --init-extent-tree" - https://pastebin.com/jATDCFZy
> 
> The volume contained qcow2 images for VMs. I need only one of those,
> since one piece of important software decided to not do backups :(
> 
> Any help is highly appreciated.
> 
> Many thanks,
> udo.
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2] btrfs-progs: inspect-internal: add option '-k u64,u8,u64' of dump-tree

2018-07-16 Thread Qu Wenruo



On 2018年07月16日 17:13, Su Yue wrote:
> For big filesystems, there are many items in trees(extent tree
> specially).
> For example, to dump one extent, we usually dump extent tree then pipe
> result to grep. The time-consuming part is that dump tree traverses
> items. And it eats cpu and memory too.
> 
> This patch introduces an option '-k u64,u8,u64' for users(developers more
> likely) to appoint a specific key to search and dump, then the path
> from root node down the leaf will be dumped.
> The search of the key costs most time of this way.
> 
> Signed-off-by: Su Yue 

Looks pretty good, just small nitpicks.

Basically you could add my reviewed-by tag:

Reviewed-by: Qu Wenruo 


> ---
> Change log:
> v2:
>   Rename btrfs_search_spec_key() to btrfs_search_print_path().
>   Rename is_key_spec to is_spec_key_set.
>   Move btrfs_search_print_path() after open_ctree(), call it only once.
>   Remove duplicate code and replaced with btrfs_print_tree().
>   Modify the usage about '-k'.
>   All are suggested by Qu Wenruo.
>   
>  cmds-inspect-dump-tree.c | 41 -
>  print-tree.c | 77 
>  print-tree.h |  2 ++
>  3 files changed, 119 insertions(+), 1 deletion(-)
> 
> diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c
> index c8acd55a0c3a..80df826af0fd 100644
> --- a/cmds-inspect-dump-tree.c
> +++ b/cmds-inspect-dump-tree.c
> @@ -184,6 +184,21 @@ static u64 treeid_from_string(const char *str, const 
> char **end)
>   return id;
>  }
>  
> +static int parse_key(struct btrfs_key *key)
> +{
> +
> + int ret = sscanf(optarg, "%llu,%hhu,%llu", >objectid,
> +  >type, >offset);
> + if (ret != 3) {
> + error("error parsing key '%s'\n", optarg);
> + ret = -EINVAL;
> + } else {
> + ret = 0;
> + }
> +
> + return ret;
> +}
> +
>  const char * const cmd_inspect_dump_tree_usage[] = {
>   "btrfs inspect-internal dump-tree [options] device",
>   "Dump tree structures from a given device",
> @@ -199,6 +214,7 @@ const char * const cmd_inspect_dump_tree_usage[] = {
>   "-u|--uuid  print only the uuid tree",
>   "-b|--block  print info from the specified block only",
>   "-t|--tree print only tree with the given id (string or 
> number)",
> + "-k|--key   search the specific key then print path, must 
> with -t(conflicts with others)",

What are the "others"?
At least for me, other includes -r and -b.

>   "--follow   use with -b, to show all children tree blocks 
> of ",
>   NULL
>  };
> @@ -224,8 +240,10 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>   unsigned open_ctree_flags;
>   u64 block_only = 0;
>   struct btrfs_root *tree_root_scan;
> + struct btrfs_key spec_key = { 0 };
>   u64 tree_id = 0;
>   bool follow = false;
> + bool is_spec_key_set = false;
>  
>   /*
>* For debug-tree, we care nothing about extent tree (it's just backref
> @@ -248,10 +266,11 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>   { "block", required_argument, NULL, 'b'},
>   { "tree", required_argument, NULL, 't'},
>   { "follow", no_argument, NULL, GETOPT_VAL_FOLLOW },
> + { "key", required_argument, NULL, 'k'},
>   { NULL, 0, NULL, 0 }
>   };
>  
> - c = getopt_long(argc, argv, "deb:rRut:", long_options, NULL);
> + c = getopt_long(argc, argv, "deb:rRut:k:", long_options, NULL);
>   if (c < 0)
>   break;
>   switch (c) {
> @@ -300,6 +319,12 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>   }
>   break;
>   }
> + case 'k':
> + ret = parse_key(_key);
> + if (ret)
> + exit(1);
> + is_spec_key_set = true;
> + break;
>   case GETOPT_VAL_FOLLOW:
>   follow = true;
>   break;
> @@ -308,6 +333,9 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>   }
>   }
>  
> + if (!tree_id && is_spec_key_set)
> + usage(cmd_inspect_dump_tree_usage);
> +

And options conflict should also be done here.

Thanks,
Qu

>   if (check_argc_exact(argc - optind, 1))
>   usage(cmd_inspect_dump_tree_usage);
>  
> @@ -325,6 +353,17 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>   goto out;
>   }
>  
> + if (is_spec_key_set) {
> + root = info->tree_root;
> + if (IS_ERR(root) || !root) {
> + ret = root ? PTR_ERR(root) : -1;
> + error("unable to open %s", argv[optind]);
> + goto out;
> + }
> +
> + 

[PATCH v2] btrfs-progs: inspect-internal: add option '-k u64,u8,u64' of dump-tree

2018-07-16 Thread Su Yue
For big filesystems, there are many items in trees(extent tree
specially).
For example, to dump one extent, we usually dump extent tree then pipe
result to grep. The time-consuming part is that dump tree traverses
items. And it eats cpu and memory too.

This patch introduces an option '-k u64,u8,u64' for users(developers more
likely) to appoint a specific key to search and dump, then the path
from root node down the leaf will be dumped.
The search of the key costs most time of this way.

Signed-off-by: Su Yue 
---
Change log:
v2:
  Rename btrfs_search_spec_key() to btrfs_search_print_path().
  Rename is_key_spec to is_spec_key_set.
  Move btrfs_search_print_path() after open_ctree(), call it only once.
  Remove duplicate code and replaced with btrfs_print_tree().
  Modify the usage about '-k'.
  All are suggested by Qu Wenruo.
  
 cmds-inspect-dump-tree.c | 41 -
 print-tree.c | 77 
 print-tree.h |  2 ++
 3 files changed, 119 insertions(+), 1 deletion(-)

diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c
index c8acd55a0c3a..80df826af0fd 100644
--- a/cmds-inspect-dump-tree.c
+++ b/cmds-inspect-dump-tree.c
@@ -184,6 +184,21 @@ static u64 treeid_from_string(const char *str, const char 
**end)
return id;
 }
 
+static int parse_key(struct btrfs_key *key)
+{
+
+   int ret = sscanf(optarg, "%llu,%hhu,%llu", >objectid,
+>type, >offset);
+   if (ret != 3) {
+   error("error parsing key '%s'\n", optarg);
+   ret = -EINVAL;
+   } else {
+   ret = 0;
+   }
+
+   return ret;
+}
+
 const char * const cmd_inspect_dump_tree_usage[] = {
"btrfs inspect-internal dump-tree [options] device",
"Dump tree structures from a given device",
@@ -199,6 +214,7 @@ const char * const cmd_inspect_dump_tree_usage[] = {
"-u|--uuid  print only the uuid tree",
"-b|--block  print info from the specified block only",
"-t|--tree print only tree with the given id (string or 
number)",
+   "-k|--key   search the specific key then print path, must 
with -t(conflicts with others)",
"--follow   use with -b, to show all children tree blocks 
of ",
NULL
 };
@@ -224,8 +240,10 @@ int cmd_inspect_dump_tree(int argc, char **argv)
unsigned open_ctree_flags;
u64 block_only = 0;
struct btrfs_root *tree_root_scan;
+   struct btrfs_key spec_key = { 0 };
u64 tree_id = 0;
bool follow = false;
+   bool is_spec_key_set = false;
 
/*
 * For debug-tree, we care nothing about extent tree (it's just backref
@@ -248,10 +266,11 @@ int cmd_inspect_dump_tree(int argc, char **argv)
{ "block", required_argument, NULL, 'b'},
{ "tree", required_argument, NULL, 't'},
{ "follow", no_argument, NULL, GETOPT_VAL_FOLLOW },
+   { "key", required_argument, NULL, 'k'},
{ NULL, 0, NULL, 0 }
};
 
-   c = getopt_long(argc, argv, "deb:rRut:", long_options, NULL);
+   c = getopt_long(argc, argv, "deb:rRut:k:", long_options, NULL);
if (c < 0)
break;
switch (c) {
@@ -300,6 +319,12 @@ int cmd_inspect_dump_tree(int argc, char **argv)
}
break;
}
+   case 'k':
+   ret = parse_key(_key);
+   if (ret)
+   exit(1);
+   is_spec_key_set = true;
+   break;
case GETOPT_VAL_FOLLOW:
follow = true;
break;
@@ -308,6 +333,9 @@ int cmd_inspect_dump_tree(int argc, char **argv)
}
}
 
+   if (!tree_id && is_spec_key_set)
+   usage(cmd_inspect_dump_tree_usage);
+
if (check_argc_exact(argc - optind, 1))
usage(cmd_inspect_dump_tree_usage);
 
@@ -325,6 +353,17 @@ int cmd_inspect_dump_tree(int argc, char **argv)
goto out;
}
 
+   if (is_spec_key_set) {
+   root = info->tree_root;
+   if (IS_ERR(root) || !root) {
+   ret = root ? PTR_ERR(root) : -1;
+   error("unable to open %s", argv[optind]);
+   goto out;
+   }
+
+   btrfs_search_print_path(info, tree_id, _key);
+   goto close_root;
+   }
if (block_only) {
root = info->chunk_root;
leaf = read_tree_block(info, block_only, 0);
diff --git a/print-tree.c b/print-tree.c
index a09ecfbb28f0..3868e42d5d29 100644
--- a/print-tree.c
+++ b/print-tree.c
@@ -1459,3 +1459,80 @@ void btrfs_print_tree(struct extent_buffer *eb, int 

[PATCH v5 2/2] btrfs: get device pointer from btrfs_scan_one_device

2018-07-16 Thread Gu Jinxiang
Instead of pointer to btrfs_fs_devices as an arg in
btrfs_scan_one_device, better to make it as a return value.

And since btrfs_fs_devices can be get by btrfs_device,
better to return btrfs_device than fs_btrfs_devices.

Signed-off-by: Gu Jinxiang 
---

Changelog:
v5: rebase to misc-next.
v4: as suggested by Anand, change return value of
btrfs_scan_one_device from btrfs_fs_devices to btrfs_device.
v3: as comment by robot, use PTR_ERR_OR_ZERO, and rebase to misc-next.
v2: as comment by Nikolay, use ERR_CAST instead of cast type manually.

 fs/btrfs/super.c   | 35 ++-
 fs/btrfs/volumes.c | 22 --
 fs/btrfs/volumes.h |  4 ++--
 3 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index dcab4a0244e5..413bbe2d6db2 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -888,7 +888,7 @@ static int btrfs_parse_early_options(const char *options, 
fmode_t flags,
 {
substring_t args[MAX_OPT_ARGS];
char *device_name, *opts, *orig, *p;
-   struct btrfs_fs_devices *fs_devices = NULL;
+   struct btrfs_device *device = NULL;
int error = 0;
 
lockdep_assert_held(_mutex);
@@ -918,11 +918,13 @@ static int btrfs_parse_early_options(const char *options, 
fmode_t flags,
error = -ENOMEM;
goto out;
}
-   error = btrfs_scan_one_device(device_name, flags,
-   holder, _devices);
+   device = btrfs_scan_one_device(device_name, flags,
+   holder);
kfree(device_name);
-   if (error)
+   if (IS_ERR(device)) {
+   error = PTR_ERR(device);
goto out;
+   }
}
}
 
@@ -1518,6 +1520,7 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
 {
struct block_device *bdev = NULL;
struct super_block *s;
+   struct btrfs_device *device = NULL;
struct btrfs_fs_devices *fs_devices = NULL;
struct btrfs_fs_info *fs_info = NULL;
struct security_mnt_opts new_sec_opts;
@@ -1561,12 +1564,15 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
goto error_fs_info;
}
 
-   error = btrfs_scan_one_device(device_name, mode, fs_type, _devices);
-   if (error) {
+   device = btrfs_scan_one_device(device_name, mode, fs_type);
+   if (IS_ERR(device)) {
+   error = PTR_ERR(device);
mutex_unlock(_mutex);
goto error_fs_info;
}
 
+   fs_devices = device->fs_devices;
+
fs_info->fs_devices = fs_devices;
 
error = btrfs_open_devices(fs_devices, mode, fs_type);
@@ -2227,7 +2233,7 @@ static long btrfs_control_ioctl(struct file *file, 
unsigned int cmd,
unsigned long arg)
 {
struct btrfs_ioctl_vol_args *vol;
-   struct btrfs_fs_devices *fs_devices;
+   struct btrfs_device *device = NULL;
int ret = -ENOTTY;
 
if (!capable(CAP_SYS_ADMIN))
@@ -2240,19 +2246,22 @@ static long btrfs_control_ioctl(struct file *file, 
unsigned int cmd,
switch (cmd) {
case BTRFS_IOC_SCAN_DEV:
mutex_lock(_mutex);
-   ret = btrfs_scan_one_device(vol->name, FMODE_READ,
-   _root_fs_type, _devices);
+   device = btrfs_scan_one_device(vol->name, FMODE_READ,
+   _root_fs_type);
+   ret = PTR_ERR_OR_ZERO(device);
mutex_unlock(_mutex);
break;
case BTRFS_IOC_DEVICES_READY:
mutex_lock(_mutex);
-   ret = btrfs_scan_one_device(vol->name, FMODE_READ,
-   _root_fs_type, _devices);
-   if (ret) {
+   device = btrfs_scan_one_device(vol->name, FMODE_READ,
+   _root_fs_type);
+   if (IS_ERR(device)) {
+   ret = PTR_ERR(device);
mutex_unlock(_mutex);
break;
}
-   ret = !(fs_devices->num_devices == fs_devices->total_devices);
+   ret = !(device->fs_devices->num_devices ==
+   device->fs_devices->total_devices);
mutex_unlock(_mutex);
break;
case BTRFS_IOC_GET_SUPPORTED_FEATURES:
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 88d37bfa99c8..9f485696461b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1213,15 +1213,14 @@ static int btrfs_read_disk_super(struct block_device 
*bdev, u64 bytenr,
  * and we are 

[PATCH v5 1/2] btrfs: make fs_devices to be a local variable

2018-07-16 Thread Gu Jinxiang
fs_devices is always passed to btrfs_scan_one_device which
overrides it. And in the call stack below fs_devices is passed to
btrfs_scan_one_device from btrfs_mount_root.
And in btrfs_mount_root the output fs_devices of this call stack
is not used.
btrfs_mount_root
-> btrfs_parse_early_options
->btrfs_scan_one_device
So, there is no necessary to pass fs_devices from btrfs_mount_root,
use a local variable in btrfs_parse_early_options is enough.

Signed-off-by: Gu Jinxiang 
Reviewed-by: Anand Jain 
---

Changelog:
v5: change a line wrap, and rebase to misc-next.
v4: changed a line warp, and adjusted the order of two rows.
v3: rebase to misc-next.
v2: deal with Nikolay's comment, make changelog more clair.

 fs/btrfs/super.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 28ab75ebb983..dcab4a0244e5 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -884,10 +884,11 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char 
*options,
  * only when we need to allocate a new super block.
  */
 static int btrfs_parse_early_options(const char *options, fmode_t flags,
-   void *holder, struct btrfs_fs_devices **fs_devices)
+   void *holder)
 {
substring_t args[MAX_OPT_ARGS];
char *device_name, *opts, *orig, *p;
+   struct btrfs_fs_devices *fs_devices = NULL;
int error = 0;
 
lockdep_assert_held(_mutex);
@@ -917,8 +918,8 @@ static int btrfs_parse_early_options(const char *options, 
fmode_t flags,
error = -ENOMEM;
goto out;
}
-   error = btrfs_scan_one_device(device_name,
-   flags, holder, fs_devices);
+   error = btrfs_scan_one_device(device_name, flags,
+   holder, _devices);
kfree(device_name);
if (error)
goto out;
@@ -1554,7 +1555,7 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
}
 
mutex_lock(_mutex);
-   error = btrfs_parse_early_options(data, mode, fs_type, _devices);
+   error = btrfs_parse_early_options(data, mode, fs_type);
if (error) {
mutex_unlock(_mutex);
goto error_fs_info;
-- 
2.17.1



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 0/2] btrfs: do some parameters optimization

2018-07-16 Thread Gu Jinxiang
Do parameters optimization for function btrfs_parse_early_options
and btrfs_scan_one_device.

Gu Jinxiang (2):
  btrfs: make fs_devices to be a local variable
  btrfs: get device pointer from btrfs_scan_one_device

 fs/btrfs/super.c   | 38 --
 fs/btrfs/volumes.c | 22 --
 fs/btrfs/volumes.h |  4 ++--
 3 files changed, 34 insertions(+), 30 deletions(-)

-- 
2.17.1



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 2/2] btrfs: get device pointer from btrfs_scan_one_device

2018-07-16 Thread Anand Jain



::


@@ -1561,12 +1564,15 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
goto error_fs_info;
}

-   error = btrfs_scan_one_device(device_name, mode, fs_type, _devices);
-   if (error) {
+   device = btrfs_scan_one_device(device_name, mode, fs_type);
+   if (IS_ERR(device)) {
+   error = PTR_ERR(device);
mutex_unlock(_mutex);
goto error_fs_info;
}

+   fs_devices = device->fs_devices;
+


 The version at misc-next is missing this assignment and it panics.

::

Thanks, Anand
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v5 2/2] btrfs: get device pointer from btrfs_scan_one_device

2018-07-16 Thread Gu, Jinxiang


> -Original Message-
> From: David Sterba [mailto:dste...@suse.cz]
> Sent: Monday, July 16, 2018 8:34 PM
> To: Gu, Jinxiang/顾 金香 
> Cc: linux-btrfs@vger.kernel.org; anand.j...@oracle.com
> Subject: Re: [PATCH v5 2/2] btrfs: get device pointer from 
> btrfs_scan_one_device
> 
> On Mon, Jul 16, 2018 at 05:11:17PM +0800, Gu Jinxiang wrote:
> > Instead of pointer to btrfs_fs_devices as an arg in
> > btrfs_scan_one_device, better to make it as a return value.
> >
> > And since btrfs_fs_devices can be get by btrfs_device,
> > better to return btrfs_device than fs_btrfs_devices.
> >
> > Signed-off-by: Gu Jinxiang 
> > ---
> >
> > Changelog:
> > v5: rebase to misc-next.
> > v4: as suggested by Anand, change return value of
> > btrfs_scan_one_device from btrfs_fs_devices to btrfs_device.
> > v3: as comment by robot, use PTR_ERR_OR_ZERO, and rebase to misc-next.
> > v2: as comment by Nikolay, use ERR_CAST instead of cast type manually.
> >
> >  fs/btrfs/super.c   | 35 ++-
> >  fs/btrfs/volumes.c | 22 --
> >  fs/btrfs/volumes.h |  4 ++--
> >  3 files changed, 32 insertions(+), 29 deletions(-)
> >
> > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> > index dcab4a0244e5..413bbe2d6db2 100644
> > --- a/fs/btrfs/super.c
> > +++ b/fs/btrfs/super.c
> > @@ -888,7 +888,7 @@ static int btrfs_parse_early_options(const char 
> > *options, fmode_t flags,
> >  {
> > substring_t args[MAX_OPT_ARGS];
> > char *device_name, *opts, *orig, *p;
> > -   struct btrfs_fs_devices *fs_devices = NULL;
> > +   struct btrfs_device *device = NULL;
> > int error = 0;
> >
> > lockdep_assert_held(_mutex);
> > @@ -918,11 +918,13 @@ static int btrfs_parse_early_options(const char 
> > *options, fmode_t flags,
> > error = -ENOMEM;
> > goto out;
> > }
> > -   error = btrfs_scan_one_device(device_name, flags,
> > -   holder, _devices);
> > +   device = btrfs_scan_one_device(device_name, flags,
> > +   holder);
> > kfree(device_name);
> > -   if (error)
> > +   if (IS_ERR(device)) {
> > +   error = PTR_ERR(device);
> > goto out;
> > +   }
> > }
> > }
> >
> > @@ -1518,6 +1520,7 @@ static struct dentry *btrfs_mount_root(struct 
> > file_system_type *fs_type,
> >  {
> > struct block_device *bdev = NULL;
> > struct super_block *s;
> > +   struct btrfs_device *device = NULL;
> > struct btrfs_fs_devices *fs_devices = NULL;
> > struct btrfs_fs_info *fs_info = NULL;
> > struct security_mnt_opts new_sec_opts;
> > @@ -1561,12 +1564,15 @@ static struct dentry *btrfs_mount_root(struct 
> > file_system_type *fs_type,
> > goto error_fs_info;
> > }
> >
> > -   error = btrfs_scan_one_device(device_name, mode, fs_type, _devices);
> > -   if (error) {
> > +   device = btrfs_scan_one_device(device_name, mode, fs_type);
> > +   if (IS_ERR(device)) {
> > +   error = PTR_ERR(device);
> > mutex_unlock(_mutex);
> > goto error_fs_info;
> > }
> >
> > +   fs_devices = device->fs_devices;
> > +
> > fs_info->fs_devices = fs_devices;
> >
> > error = btrfs_open_devices(fs_devices, mode, fs_type);
> > @@ -2227,7 +2233,7 @@ static long btrfs_control_ioctl(struct file *file, 
> > unsigned int cmd,
> > unsigned long arg)
> >  {
> > struct btrfs_ioctl_vol_args *vol;
> > -   struct btrfs_fs_devices *fs_devices;
> > +   struct btrfs_device *device = NULL;
> > int ret = -ENOTTY;
> >
> > if (!capable(CAP_SYS_ADMIN))
> > @@ -2240,19 +2246,22 @@ static long btrfs_control_ioctl(struct file *file, 
> > unsigned int cmd,
> > switch (cmd) {
> > case BTRFS_IOC_SCAN_DEV:
> > mutex_lock(_mutex);
> > -   ret = btrfs_scan_one_device(vol->name, FMODE_READ,
> > -   _root_fs_type, _devices);
> > +   device = btrfs_scan_one_device(vol->name, FMODE_READ,
> > +   _root_fs_type);
> > +   ret = PTR_ERR_OR_ZERO(device);
> > mutex_unlock(_mutex);
> > break;
> > case BTRFS_IOC_DEVICES_READY:
> > mutex_lock(_mutex);
> > -   ret = btrfs_scan_one_device(vol->name, FMODE_READ,
> > -   _root_fs_type, _devices);
> > -   if (ret) {
> > +   device = btrfs_scan_one_device(vol->name, FMODE_READ,
> > +   _root_fs_type);
> > +   if (IS_ERR(device)) {
> > +   ret = PTR_ERR(device);
> > mutex_unlock(_mutex);
> > break;
> > }
> > -   ret = !(fs_devices->num_devices 

Re: [PATCH 0/2] btrfs-progs: check: enhanced progress indicator

2018-07-16 Thread David Sterba
On Wed, Jul 04, 2018 at 09:20:12PM +0200, Stéphane Lesimple wrote:
> This patch replaces the current ".oOo."-style progress indicator of
> btrfs check (-p), by a more helpful live indication of the check progress.
> I've been using it on a custom btrfs-progs version since 2015.
> 
> Here's how the output looks like on a 22 Tb 5-disk RAID1 FS:
> 
> # btrfs check -p /dev/mapper/luks-ST1VN0004-
> Opening filesystem to check...
> Checking filesystem on /dev/mapper/luks-ST1VN0004-
> UUID: ----
> [1/7] checking extents   (0:20:21 elapsed, 950958 items checked)
> [2/7] checking root items(0:01:29 elapsed, 15121 items checked)
> [3/7] checking free space cache  (0:00:11 elapsed, 4928 items checked)
> [4/7] checking fs roots  (0:51:31 elapsed, 600892 items checked)
> [5/7] checking csums (0:14:35 elapsed, 754522 items checked)
> [6/7] checking root refs (0:00:00 elapsed, 232 items checked)
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 5286458060800 bytes used, no error found

Nice, this looks better than what we have now. Patches applied, with
some minor coding style adjustments and I changed the update period
back to 1 second, 50ms is IMHO too frequent for an operation that can
run for tens of minutes. Also as a safety against dumping too much
information to the log into if the output is redirected.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] 3- and 4- copy RAID1

2018-07-16 Thread Austin S. Hemmelgarn

On 2018-07-16 14:29, Goffredo Baroncelli wrote:

On 07/15/2018 04:37 PM, waxhead wrote:

David Sterba wrote:

An interesting question is the naming of the extended profiles. I picked
something that can be easily understood but it's not a final proposal.
Years ago, Hugo proposed a naming scheme that described the
non-standard raid varieties of the btrfs flavor:

https://marc.info/?l=linux-btrfs=136286324417767

Switching to this naming would be a good addition to the extended raid.


As just a humble BTRFS user I agree and really think it is about time to move 
far away from the RAID terminology. However adding some more descriptive 
profile names (or at least some aliases) would be much better for the commoners 
(such as myself).

For example:

Old format / New Format / My suggested alias
SINGLE  / 1C / SINGLE
DUP / 2CD    / DUP (or even MIRRORLOCAL1)
RAID0   / 1CmS   / STRIPE




RAID1   / 2C / MIRROR1
RAID1c3 / 3C / MIRROR2
RAID1c4 / 4C / MIRROR3
RAID10  / 2CmS   / STRIPE.MIRROR1


Striping and mirroring/pairing are orthogonal properties; mirror and parity are 
mutually exclusive. What about

RAID1 -> MIRROR1
RAID10 -> MIRROR1S
RAID1c3 -> MIRROR2
RAID1c3+striping -> MIRROR2S

and so on...


RAID5   / 1CmS1P / STRIPE.PARITY1
RAID6   / 1CmS2P / STRIPE.PARITY2


To me these should be called something like

RAID5 -> PARITY1S
RAID6 -> PARITY2S

The S final is due to the fact that usually RAID5/6 spread the data on all 
available disks

Question #1: for "parity" profiles, does make sense to limit the maximum disks 
number where the data may be spread ? If the answer is not, we could omit the last S. 
IMHO it should.
Currently, there is no ability to cap the number of disks that striping 
can happen across.  Ideally, that will change in the future, in which 
case not only the S will be needed, but also a number indicating how 
wide the stripe is.



Question #2: historically RAID10 is requires 4 disks. However I am guessing if 
the stripe could be done on a different number of disks: What about 
RAID1+Striping on 3 (or 5 disks) ? The key of striping is that every 64k, the 
data are stored on a different disk
This is what MD and LVM RAID10 do.  They work somewhat differently from 
what BTRFS calls raid10 (actually, what we currently call raid1 works 
almost identically to MD and LVM RAID10 when more than 3 disks are 
involved, except that the chunk size is 1G or larger).  Short of drastic 
internal changes to how that profile works, this isn't likely to happen.


In spite of both of these, there is practical need for indicating the 
stripe width.  Depending on the configuration of the underlying storage, 
it's fully possible (and sometimes even certain) that you will see 
chunks with differing stripe widths, so properly reporting the stripe 
width (in devices, not bytes) is useful for monitoring purposes).


Consider for example a 6-device array using what's currently called a 
raid10 profile where 2 of the disks are smaller than the other four.  On 
such an array, chunks will span all six disks (resulting in 2 copies 
striped across 3 disks each) until those two smaller disks are full, at 
which point new chunks will span only the remaining four disks 
(resulting in 2 copies striped across 2 disks each).

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] fstests: btrfs/168 verify device ready after device delete

2018-07-16 Thread Eryu Guan
On Fri, Jul 06, 2018 at 02:01:37PM +0800, Anand Jain wrote:
> This test case verifies if the device ready return success after the
> device delete.
> 
> Signed-off-by: Anand Jain 

Need some helps from btrfs folks to see if it's a valid test. Thanks!

Eryu

> ---
> v1->v2: use _run_btrfs_util_prog instead of open coding it.
> 
>  tests/btrfs/168 | 68 
> +
>  tests/btrfs/168.out |  2 ++
>  tests/btrfs/group   |  1 +
>  3 files changed, 71 insertions(+)
>  create mode 100755 tests/btrfs/168
>  create mode 100644 tests/btrfs/168.out
> 
> diff --git a/tests/btrfs/168 b/tests/btrfs/168
> new file mode 100755
> index ..0d3e99839209
> --- /dev/null
> +++ b/tests/btrfs/168
> @@ -0,0 +1,68 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2018 Oracle. All Rights Reserved.
> +#
> +# FS QA Test 168
> +#
> +# Test if btrfs is still reported ready after the device delete
> +#
> +# This could be fixed by the following kernel commit:
> +#  btrfs: fix missing superblock update in the device delete commit 
> transaction
> +#
> +seq=`basename $0`
> +seqres=$RESULT_DIR/$seq
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +tmp=/tmp/$$
> +status=1 # failure is the default!
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +_cleanup()
> +{
> + cd /
> + rm -f $tmp.*
> +}
> +
> +# get standard environment, filters and checks
> +. ./common/rc
> +. ./common/filter
> +
> +# remove previous $seqres.full before test
> +rm -f $seqres.full
> +
> +# real QA test starts here
> +
> +# Modify as appropriate.
> +_supported_fs btrfs
> +_supported_os Linux
> +_require_command "$BTRFS_TUNE_PROG" btrfstune
> +_require_scratch_dev_pool 2
> +
> +_scratch_dev_pool_get 2
> +dev_1=$(echo $SCRATCH_DEV_POOL | awk '{print $1}')
> +dev_2=$(echo $SCRATCH_DEV_POOL | awk '{print $2}')
> +
> +# normal delete device and then check for ready
> +run_check _scratch_pool_mkfs "-d single -m single"
> +_scratch_mount
> +_run_btrfs_util_prog device delete $dev_1 $SCRATCH_MNT
> +_run_btrfs_util_prog device ready $dev_2
> +
> +_scratch_unmount
> +# delete a seed device and then check for ready
> +run_check $BTRFS_TUNE_PROG -S 1 $dev_2
> +run_check _mount $dev_2 $SCRATCH_MNT
> +_run_btrfs_util_prog device add -f $dev_1 $SCRATCH_MNT
> +run_check mount -o rw,remount $dev_1 $SCRATCH_MNT
> +_run_btrfs_util_prog device delete $dev_2 $SCRATCH_MNT
> +_run_btrfs_util_prog device ready $dev_1
> +
> +_scratch_unmount
> +_scratch_dev_pool_put
> +
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/168.out b/tests/btrfs/168.out
> new file mode 100644
> index ..893a41d859c8
> --- /dev/null
> +++ b/tests/btrfs/168.out
> @@ -0,0 +1,2 @@
> +QA output created by 168
> +Silence is golden
> diff --git a/tests/btrfs/group b/tests/btrfs/group
> index 5cff3bd6cc03..7bc3ea457992 100644
> --- a/tests/btrfs/group
> +++ b/tests/btrfs/group
> @@ -170,3 +170,4 @@
>  165 auto quick subvol
>  166 auto quick qgroup
>  167 auto quick replace volume
> +168 auto quick volume
> -- 
> 2.15.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] btrfs: Use btrfs_mark_bg_unused() to replace open code

2018-07-16 Thread David Sterba
On Sun, May 27, 2018 at 09:25:25AM +0800, Qu Wenruo wrote:
> 
> 
> On 2018年05月22日 20:14, David Sterba wrote:
> > On Tue, May 22, 2018 at 04:43:47PM +0800, Qu Wenruo wrote:
> >> Introduce a small helper, btrfs_mark_bg_unused(), to accquire needed
> >> locks and add a block group to unused_bgs list.
> > 
> > The helper is nice but hides that there's a reference taken on the 'bg'.
> > This would be good to add at least to the function comment or somehow
> > squeeze it into the function name itself, like
> > btrfs_get_and_mark_bg_unused.
> 
> That btrfs_get_block_group() call is only for later removal.
> The reference will be removed in btrfs_delete_unused_bgs().
> 
> So I don't think the function name needs the "_get" part.

Agreed, it's inside the helpers. Patch added to misc-next, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 1/2] btrfs: make fs_devices to be a local variable

2018-07-16 Thread Nikolay Borisov



On 16.07.2018 12:11, Gu Jinxiang wrote:
> fs_devices is always passed to btrfs_scan_one_device which
> overrides it. And in the call stack below fs_devices is passed to
> btrfs_scan_one_device from btrfs_mount_root.
> And in btrfs_mount_root the output fs_devices of this call stack
> is not used.
> btrfs_mount_root
> -> btrfs_parse_early_options
> ->btrfs_scan_one_device
> So, there is no necessary to pass fs_devices from btrfs_mount_root,
> use a local variable in btrfs_parse_early_options is enough.
> 
> Signed-off-by: Gu Jinxiang 
> Reviewed-by: Anand Jain 
> ---
> 
> Changelog:
> v5: change a line wrap, and rebase to misc-next.
> v4: changed a line warp, and adjusted the order of two rows.
> v3: rebase to misc-next.
> v2: deal with Nikolay's comment, make changelog more clair.
> 
>  fs/btrfs/super.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 28ab75ebb983..dcab4a0244e5 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -884,10 +884,11 @@ int btrfs_parse_options(struct btrfs_fs_info *info, 
> char *options,
>   * only when we need to allocate a new super block.
>   */
>  static int btrfs_parse_early_options(const char *options, fmode_t flags,
> - void *holder, struct btrfs_fs_devices **fs_devices)
> + void *holder)
>  {
>   substring_t args[MAX_OPT_ARGS];
>   char *device_name, *opts, *orig, *p;
> + struct btrfs_fs_devices *fs_devices = NULL;
>   int error = 0;
>  
>   lockdep_assert_held(_mutex);
> @@ -917,8 +918,8 @@ static int btrfs_parse_early_options(const char *options, 
> fmode_t flags,
>   error = -ENOMEM;
>   goto out;
>   }
> - error = btrfs_scan_one_device(device_name,
> - flags, holder, fs_devices);
> + error = btrfs_scan_one_device(device_name, flags,
> + holder, _devices);

You are line wraps are in fact worse than before, the first letter of
the "holder.." line has to be aligned exactly under the first letter of
"device_name". I assume David will fix this while merging so no need to
resend but fix your editor for future submissions.

>   kfree(device_name);
>   if (error)
>   goto out;
> @@ -1554,7 +1555,7 @@ static struct dentry *btrfs_mount_root(struct 
> file_system_type *fs_type,
>   }
>  
>   mutex_lock(_mutex);
> - error = btrfs_parse_early_options(data, mode, fs_type, _devices);
> + error = btrfs_parse_early_options(data, mode, fs_type);
>   if (error) {
>   mutex_unlock(_mutex);
>   goto error_fs_info;
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 2/2] btrfs: get device pointer from btrfs_scan_one_device

2018-07-16 Thread David Sterba
On Mon, Jul 16, 2018 at 05:11:17PM +0800, Gu Jinxiang wrote:
> Instead of pointer to btrfs_fs_devices as an arg in
> btrfs_scan_one_device, better to make it as a return value.
> 
> And since btrfs_fs_devices can be get by btrfs_device,
> better to return btrfs_device than fs_btrfs_devices.
> 
> Signed-off-by: Gu Jinxiang 
> ---
> 
> Changelog:
> v5: rebase to misc-next.
> v4: as suggested by Anand, change return value of
> btrfs_scan_one_device from btrfs_fs_devices to btrfs_device.
> v3: as comment by robot, use PTR_ERR_OR_ZERO, and rebase to misc-next.
> v2: as comment by Nikolay, use ERR_CAST instead of cast type manually.
> 
>  fs/btrfs/super.c   | 35 ++-
>  fs/btrfs/volumes.c | 22 --
>  fs/btrfs/volumes.h |  4 ++--
>  3 files changed, 32 insertions(+), 29 deletions(-)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index dcab4a0244e5..413bbe2d6db2 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -888,7 +888,7 @@ static int btrfs_parse_early_options(const char *options, 
> fmode_t flags,
>  {
>   substring_t args[MAX_OPT_ARGS];
>   char *device_name, *opts, *orig, *p;
> - struct btrfs_fs_devices *fs_devices = NULL;
> + struct btrfs_device *device = NULL;
>   int error = 0;
>  
>   lockdep_assert_held(_mutex);
> @@ -918,11 +918,13 @@ static int btrfs_parse_early_options(const char 
> *options, fmode_t flags,
>   error = -ENOMEM;
>   goto out;
>   }
> - error = btrfs_scan_one_device(device_name, flags,
> - holder, _devices);
> + device = btrfs_scan_one_device(device_name, flags,
> + holder);
>   kfree(device_name);
> - if (error)
> + if (IS_ERR(device)) {
> + error = PTR_ERR(device);
>   goto out;
> + }
>   }
>   }
>  
> @@ -1518,6 +1520,7 @@ static struct dentry *btrfs_mount_root(struct 
> file_system_type *fs_type,
>  {
>   struct block_device *bdev = NULL;
>   struct super_block *s;
> + struct btrfs_device *device = NULL;
>   struct btrfs_fs_devices *fs_devices = NULL;
>   struct btrfs_fs_info *fs_info = NULL;
>   struct security_mnt_opts new_sec_opts;
> @@ -1561,12 +1564,15 @@ static struct dentry *btrfs_mount_root(struct 
> file_system_type *fs_type,
>   goto error_fs_info;
>   }
>  
> - error = btrfs_scan_one_device(device_name, mode, fs_type, _devices);
> - if (error) {
> + device = btrfs_scan_one_device(device_name, mode, fs_type);
> + if (IS_ERR(device)) {
> + error = PTR_ERR(device);
>   mutex_unlock(_mutex);
>   goto error_fs_info;
>   }
>  
> + fs_devices = device->fs_devices;
> +
>   fs_info->fs_devices = fs_devices;
>  
>   error = btrfs_open_devices(fs_devices, mode, fs_type);
> @@ -2227,7 +2233,7 @@ static long btrfs_control_ioctl(struct file *file, 
> unsigned int cmd,
>   unsigned long arg)
>  {
>   struct btrfs_ioctl_vol_args *vol;
> - struct btrfs_fs_devices *fs_devices;
> + struct btrfs_device *device = NULL;
>   int ret = -ENOTTY;
>  
>   if (!capable(CAP_SYS_ADMIN))
> @@ -2240,19 +2246,22 @@ static long btrfs_control_ioctl(struct file *file, 
> unsigned int cmd,
>   switch (cmd) {
>   case BTRFS_IOC_SCAN_DEV:
>   mutex_lock(_mutex);
> - ret = btrfs_scan_one_device(vol->name, FMODE_READ,
> - _root_fs_type, _devices);
> + device = btrfs_scan_one_device(vol->name, FMODE_READ,
> + _root_fs_type);
> + ret = PTR_ERR_OR_ZERO(device);
>   mutex_unlock(_mutex);
>   break;
>   case BTRFS_IOC_DEVICES_READY:
>   mutex_lock(_mutex);
> - ret = btrfs_scan_one_device(vol->name, FMODE_READ,
> - _root_fs_type, _devices);
> - if (ret) {
> + device = btrfs_scan_one_device(vol->name, FMODE_READ,
> + _root_fs_type);
> + if (IS_ERR(device)) {
> + ret = PTR_ERR(device);
>   mutex_unlock(_mutex);
>   break;
>   }
> - ret = !(fs_devices->num_devices == fs_devices->total_devices);
> + ret = !(device->fs_devices->num_devices ==
> + device->fs_devices->total_devices);
>   mutex_unlock(_mutex);
>   break;
>   case BTRFS_IOC_GET_SUPPORTED_FEATURES:
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 88d37bfa99c8..9f485696461b 100644
> 

Re: [PATCH v2] btrfs: Rewrite retry logic in do_chunk_alloc

2018-07-16 Thread David Sterba
On Wed, Apr 18, 2018 at 10:27:57AM +0300, Nikolay Borisov wrote:
> do_chunk_alloc implements logic to detect whether there is currently
> pending chunk allocation  (by means of space_info->chunk_alloc being
> set) and if so it loops around to the 'again' label. Additionally,
> based on the state of the space_info (e.g. whether it's full or not)
> and the return value of should_alloc_chunk() it decides whether this
> is a "hard" error (ENOSPC) or we can just return 0.
> 
> This patch refactors all of this:
> 
> 1. Put order to the scattered ifs handling the various cases in an
> easy-to-read if {} else if{} branches. This makes clear the various
> cases we are interested in handling.
> 
> 2. Call should_alloc_chunk only once and use the result in the
> if/else if constructs. All of this is done under space_info->lock, so
> even before multiple calls of should_alloc_chunk were unnecessary.
> 
> 3. Rewrite the "do {} while()" loop currently implemented via label
> into an explicit loop construct.
> 
> 4. Move the mutex locking for the case where the caller is the one doing the 
> allocation. For the case where the caller needs to wait a concurrent 
> allocation, 
> introduce a pair of mutex_lock/mutex_unlock to act as a barrier and reword 
> the 
> comment. 
> 
> 5. Switch local vars to bool type where pertinent.
> 
> All in all this shouldn't introduce any functional changes.
> 
> Signed-off-by: Nikolay Borisov 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: rename btrfs_parse_early_options

2018-07-16 Thread Anand Jain
Rename btrfs_parse_early_options() to btrfs_parse_device_options(). As
btrfs_parse_early_options() parses the -o device options and scan the
device provided. So this rename specifies its action. Also the function
name is inline with btrfs_parse_subvol_options().
No functional changes.

Signed-off-by: Anand Jain 
---
 fs/btrfs/super.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 9d340f8d3457..9dc56d3eaa66 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -465,9 +465,9 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char 
*options,
case Opt_subvolrootid:
case Opt_device:
/*
-* These are parsed by btrfs_parse_subvol_options
-* and btrfs_parse_early_options
-* and can be happily ignored here.
+* These are parsed by btrfs_parse_subvol_options and
+* btrfs_parse_device_options and can be happily
+* ignored here.
 */
break;
case Opt_nodatasum:
@@ -883,8 +883,8 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char 
*options,
  * All other options will be parsed on much later in the mount process and
  * only when we need to allocate a new super block.
  */
-static int btrfs_parse_early_options(const char *options, fmode_t flags,
-void *holder)
+static int btrfs_parse_device_options(const char *options, fmode_t flags,
+ void *holder)
 {
substring_t args[MAX_OPT_ARGS];
char *device_name, *opts, *orig, *p;
@@ -951,7 +951,7 @@ static int btrfs_parse_subvol_options(const char *options, 
char **subvol_name,
 
/*
 * strsep changes the string, duplicate it because
-* btrfs_parse_early_options gets called later
+* btrfs_parse_device_options gets called later
 */
opts = kstrdup(options, GFP_KERNEL);
if (!opts)
@@ -1558,7 +1558,7 @@ static struct dentry *btrfs_mount_root(struct 
file_system_type *fs_type,
}
 
mutex_lock(_mutex);
-   error = btrfs_parse_early_options(data, mode, fs_type);
+   error = btrfs_parse_device_options(data, mode, fs_type);
if (error) {
mutex_unlock(_mutex);
goto error_fs_info;
-- 
2.15.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2nd try v4] btrfs: drop uuid_mutex in btrfs_free_extra_devids()

2018-07-16 Thread Anand Jain




On 05/29/2018 07:19 PM, David Sterba wrote:

On Tue, May 29, 2018 at 01:40:34PM +0800, Anand Jain wrote:

v3->v4: As we traverse through the seed device, fs_device gets
updated with
 the child seed fs_devices, so make sure we use the same fs_devices
 pointer for the mutex_unlock as used for the mutex_lock.


Well, now that I see the change, shouldn't we always hold the
device_list_mutex of the fs_devices that's being processed? Ie. each
time it's switched, the previous is unlocked and new one locked.


   No David. That's because we organize seed device under its parent
   fs_devices ((fs_devices::seed)::seed)..so on, and they are a local
   cloned copy of the original seed fs_devices. So parent's
   fs_devices::device_list_mutex lock will suffice.


   On the 2nd thought, though theoretically you are right. But practically
   there isn't any use case which can benefit by using the intricate
   locking as you suggested above.


I don't think it's intricate or complex, just lock the fs_devices that
can be potentially modified in the following loop.

Schematically:

function called with some fs_devices

again:
lock fs_devices->device_list_mutex
foreach device in fs_devices
if ...
fs_devices counters get changed after device
deletion
endfor

if (fs_devices->seed)
unlock fs_devices->device_list_mutex
fs_devices = fs_devices->seed
lock fs_devices->device_list_mutex  <-- lock the new one
goto again
endif

unlock fs_devices->device_list_mutex<-- correctly unlock
endfunc


   I am following the following method:-
   By holding the parent fs_devices (which is also the mounted fs_devices
   lock) it would imply to lock its dependent cloned seed fs_devices, as
   to reach the cloned seed device, first we need to traverse from the
   parent fs_devices.


Locking the parent fs_devices might be the right scheme, but if any
child fs_devices pointer gets passed to a random function that will
change it, then the locking will not protect it.


If its not holding the parent fs_devices lock then it would a bug and
if its already holding the parent lock then child lock isn't necessary
at all.

Also sent the
  [DOC] BTRFS Volume operations, Device Lists and Locks all in one page
to have a holistic vewi.

Its not too convincing to me the approach of using the per fs_devices
lock when fs_devices::seed is traversed its not necessary.

Thanks, Anand



This might need a deeper audit how the seeding device is done and maybe
add some helpers that eg. lock the whole chain so all of them are
protected.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs: Introduce mount time chunk <-> dev extent mapping check

2018-07-16 Thread David Sterba
On Mon, Jul 09, 2018 at 02:42:02PM +0800, Qu Wenruo wrote:
> This patch will introduce chunk <-> dev extent mapping check, to protect
> us against invalid dev extents or chunks.
> 
> Since chunk mapping is the fundamental infrastructure of btrfs, extra
> check at mount time could prevent a lot of unexpected behavior (BUG_ON).
> 
> Reported-by: Xu Wen 
> Links: https://bugzilla.kernel.org/show_bug.cgi?id=200403

Link:

> Links: https://bugzilla.kernel.org/show_bug.cgi?id=200407
> Signed-off-by: Qu Wenruo 
> ---
>  fs/btrfs/disk-io.c |   7 ++
>  fs/btrfs/volumes.c | 173 +
>  fs/btrfs/volumes.h |   2 +
>  3 files changed, 182 insertions(+)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 205092dc9390..068ca7498e94 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3075,6 +3075,13 @@ int open_ctree(struct super_block *sb,
>   fs_info->generation = generation;
>   fs_info->last_trans_committed = generation;
>  
> + ret = btrfs_verify_dev_extents(fs_info);
> + if (ret) {
> + btrfs_err(fs_info,
> +   "failed to verify dev extents against chunks: %d",
> +   ret);
> + goto fail_block_groups;
> + }
>   ret = btrfs_recover_balance(fs_info);
>   if (ret) {
>   btrfs_err(fs_info, "failed to recover balance: %d", ret);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index e6a8e4aabc66..05e418cb37f3 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6440,6 +6440,7 @@ static int read_one_chunk(struct btrfs_fs_info 
> *fs_info, struct btrfs_key *key,
>   map->stripe_len = btrfs_chunk_stripe_len(leaf, chunk);
>   map->type = btrfs_chunk_type(leaf, chunk);
>   map->sub_stripes = btrfs_chunk_sub_stripes(leaf, chunk);
> + map->verified_stripes = 0;
>   for (i = 0; i < num_stripes; i++) {
>   map->stripes[i].physical =
>   btrfs_stripe_offset_nr(leaf, chunk, i);
> @@ -7295,3 +7296,175 @@ void btrfs_reset_fs_info_ptr(struct btrfs_fs_info 
> *fs_info)
>   fs_devices = fs_devices->seed;
>   }
>  }
> +
> +static u64 calc_stripe_length(u64 type, u64 chunk_len, int num_stripes)
> +{
> + switch (type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
> + case BTRFS_BLOCK_GROUP_RAID0:
> + return div_u64(chunk_len, num_stripes);
> + case BTRFS_BLOCK_GROUP_RAID10:
> + return div_u64(chunk_len * 2, num_stripes);
> + case BTRFS_BLOCK_GROUP_RAID5:
> + return div_u64(chunk_len, num_stripes - 1);
> + case BTRFS_BLOCK_GROUP_RAID6:
> + return div_u64(chunk_len, num_stripes - 2);
> + default:
> + return chunk_len;
> + }
> +}

There are already too many hardcoded values for the raid profiles,
please don't add another one unless really necessary and use existing
predefined constants or helpers (eg. nr_data_stripes or
btrfs_raid_array).

> +static int verify_one_dev_extent(struct btrfs_fs_info *fs_info,
> +  u64 chunk_offset, u64 devid,
> +  u64 physical_offset, u64 physical_len)
> +{
> + struct extent_map_tree *em_tree = _info->mapping_tree.map_tree;
> + struct extent_map *em;
> + struct map_lookup *map;
> + u64 stripe_len;
> + bool found = false;

This variable is only set and never checked.

> + int ret = 0;
> + int i;
> +
> + read_lock(_tree->lock);
> + em = lookup_extent_mapping(em_tree, chunk_offset, 1);
> + read_unlock(_tree->lock);
> +
> + if (!em) {
> + ret = -EUCLEAN;
> + btrfs_err(fs_info,
> + "dev extent (%llu, %llu) doesn't have corresponding chunk",
> +   devid, physical_offset);
> + goto out;
> + }
> +
> + map = em->map_lookup;
> + stripe_len = calc_stripe_length(map->type, em->len, map->num_stripes);
> + if (physical_len != stripe_len) {
> + btrfs_err(fs_info,
> +"dev extent (%llu, %llu) length doesn't match with chunk %llu, have %llu 
> expect %llu",
> +   devid, physical_offset, em->start, physical_len,
> +   stripe_len);
> + ret = -EUCLEAN;
> + goto out;
> + }
> +
> + for (i = 0; i < map->num_stripes; i++) {
> + if (map->stripes[i].dev->devid == devid &&
> + map->stripes[i].physical == physical_offset) {
> + found = true;

2nd time set

> + if (map->verified_stripes >= map->num_stripes) {
> + btrfs_err(fs_info,
> + "too many dev extent for chunk %llu is detected",
> +   em->start);
> + ret = -EUCLEAN;
> + goto out;
> + }
> + map->verified_stripes++;
> + break;
> 

Re: [PATCH 5/5] btrfs: Verify every chunk has corresponding block group at mount time

2018-07-16 Thread Qu Wenruo


On 2018年07月16日 21:16, David Sterba wrote:
> On Fri, Jul 06, 2018 at 07:44:37AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2018年07月05日 23:18, David Sterba wrote:
>>> On Tue, Jul 03, 2018 at 05:10:09PM +0800, Qu Wenruo wrote:
 If a crafted btrfs has missing block group items, it could cause
 unexpected behavior and breaks our expectation on 1:1
 chunk<->block group mapping.

 Although we added block group -> chunk mapping check, we still need
 chunk -> block group mapping check.

 This patch will do extra check to ensure each chunk has its
 corresponding block group.

 Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847
 Reported-by: Xu Wen 
 Signed-off-by: Qu Wenruo 
 ---
  fs/btrfs/extent-tree.c | 52 +-
  1 file changed, 51 insertions(+), 1 deletion(-)

 diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
 index 82b446f014b9..746095034ca2 100644
 --- a/fs/btrfs/extent-tree.c
 +++ b/fs/btrfs/extent-tree.c
 @@ -10038,6 +10038,56 @@ static int check_exist_chunk(struct btrfs_fs_info 
 *fs_info, u64 start, u64 len,
return ret;
  }
  
 +/*
 + * Iterate all chunks and verify each of them has corresponding block 
 group
 + */
 +static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
 +{
 +  struct btrfs_mapping_tree *map_tree = _info->mapping_tree;
 +  struct extent_map *em;
 +  struct btrfs_block_group_cache *bg;
 +  u64 start = 0;
 +  int ret = 0;
 +
 +  while (1) {
 +  read_lock(_tree->map_tree.lock);
 +  em = lookup_extent_mapping(_tree->map_tree, start,
 + (u64)-1 - start);
>>>
>>> This needs a comment.
>>
>> For the @len part?
> 
> Yes, for the expression how it's calculated.
> 
>>>
 +  read_unlock(_tree->map_tree.lock);
 +  if (!em)
 +  break;
 +
 +  bg = btrfs_lookup_block_group(fs_info, em->start);
 +  if (!bg) {
 +  btrfs_err_rl(fs_info,
 +  "chunk start=%llu len=%llu doesn't have corresponding block group",
 +   em->start, em->len);
 +  ret = -ENOENT;
>>>
>>> -EUCLEAN ?
>>
>> Either works for me.
> 
> That's not just a cosmetic change, there's a semantic difference between
> the error codes, I maybe make that more explicit and not expect that this
> is obvious.
> 
> ENOENT does not make much sense in this context, the caller (mount in
> this case) cannot do anything about a code that says 'some internal
> structure not found'.

The point here is, if every self-checker should only return -EUCLEAN, it
won't really indicate what's going wrong, except points to some
self-checker (and such self-checkers are growing larger than our
expectation already).

My practice here is, put some human readable and meaningful error
message. No matter what we choose to return, the error message should
tell us what's going wrong.

In this case, I don't really care the return value. If it's explicitly
needed to return -EUCLEAN, I could make all existing checker (from
tree-checker to chunk/bg/dev-extent checker) to return -EUCLEAN if
anything is wrong (and save several "ret = -EUCLEAN" lines).
The return value doesn't really have much meaning nowadays, it's the
error message important now.

Thanks,
Qu

> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 0/2] btrfs: chunk and dev-extent related error handler enhancement

2018-07-16 Thread David Sterba
On Mon, Jul 09, 2018 at 02:42:01PM +0800, Qu Wenruo wrote:
> Can be fetched with all existing tree-checker/bg<->chunk error detector
> from github:
> https://github.com/adam900710/linux/tree/tree_checker_enhance
> 
> Still some fuzzed images reported from Xu Wen.
> This time, 2 can be fixed by chunk <-> dev extent mapping verification.
> One BUG_ON() can be removed.
> 
> The remaining 2 images are all mostly about extent tree corruption,
> which is not as easy to detect, since extent tree has way more complex
> reference relationship.
> 
> At least, fix what we can first.
> 
> And for chunk <-> dev extent mapping verification, it will trigger
> read on the whole device tree, to iterate through all DEV_EXTENT items.
> This will introduce an extra overhead on mount.
> 
> However since device tree is pretty small (the same level as chunk tree),
> and according to previous analyse on mount time, chunk tree iteration is
> only a pretty small fraction of the whole mount time (less than 5%), it
> shouldn't bring obvious impact to users.

Ok, that's acceptable.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] btrfs: Verify every chunk has corresponding block group at mount time

2018-07-16 Thread David Sterba
On Fri, Jul 06, 2018 at 07:44:37AM +0800, Qu Wenruo wrote:
> 
> 
> On 2018年07月05日 23:18, David Sterba wrote:
> > On Tue, Jul 03, 2018 at 05:10:09PM +0800, Qu Wenruo wrote:
> >> If a crafted btrfs has missing block group items, it could cause
> >> unexpected behavior and breaks our expectation on 1:1
> >> chunk<->block group mapping.
> >>
> >> Although we added block group -> chunk mapping check, we still need
> >> chunk -> block group mapping check.
> >>
> >> This patch will do extra check to ensure each chunk has its
> >> corresponding block group.
> >>
> >> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847
> >> Reported-by: Xu Wen 
> >> Signed-off-by: Qu Wenruo 
> >> ---
> >>  fs/btrfs/extent-tree.c | 52 +-
> >>  1 file changed, 51 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> >> index 82b446f014b9..746095034ca2 100644
> >> --- a/fs/btrfs/extent-tree.c
> >> +++ b/fs/btrfs/extent-tree.c
> >> @@ -10038,6 +10038,56 @@ static int check_exist_chunk(struct btrfs_fs_info 
> >> *fs_info, u64 start, u64 len,
> >>return ret;
> >>  }
> >>  
> >> +/*
> >> + * Iterate all chunks and verify each of them has corresponding block 
> >> group
> >> + */
> >> +static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info)
> >> +{
> >> +  struct btrfs_mapping_tree *map_tree = _info->mapping_tree;
> >> +  struct extent_map *em;
> >> +  struct btrfs_block_group_cache *bg;
> >> +  u64 start = 0;
> >> +  int ret = 0;
> >> +
> >> +  while (1) {
> >> +  read_lock(_tree->map_tree.lock);
> >> +  em = lookup_extent_mapping(_tree->map_tree, start,
> >> + (u64)-1 - start);
> > 
> > This needs a comment.
> 
> For the @len part?

Yes, for the expression how it's calculated.

> > 
> >> +  read_unlock(_tree->map_tree.lock);
> >> +  if (!em)
> >> +  break;
> >> +
> >> +  bg = btrfs_lookup_block_group(fs_info, em->start);
> >> +  if (!bg) {
> >> +  btrfs_err_rl(fs_info,
> >> +  "chunk start=%llu len=%llu doesn't have corresponding block group",
> >> +   em->start, em->len);
> >> +  ret = -ENOENT;
> > 
> > -EUCLEAN ?
> 
> Either works for me.

That's not just a cosmetic change, there's a semantic difference between
the error codes, I maybe make that more explicit and not expect that this
is obvious.

ENOENT does not make much sense in this context, the caller (mount in
this case) cannot do anything about a code that says 'some internal
structure not found'.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 1/2] btrfs-progs: remove stale dir-test and quick-test

2018-07-16 Thread David Sterba
On Thu, Jul 12, 2018 at 04:11:18PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval 
> 
> These don't build anymore and don't appear to be used for anything.

I've fixed build of quick-test and will keep the code for now, it can be
used for some stress testing. The dir-test does not seem to be even
potentially useful in the fucute, so it's dropped now.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/7] btrfs: do device clone using the btrfs_scan_one_device

2018-07-16 Thread Anand Jain
When we add a device to the RO mounted seed device, it becomes a
RW sprout FS. The following steps are used to hold the seed and
sprout fs_devices.
 (first two steps are not mandatory for the sprouting, they are there
  to ensure the seed device remains in the scanned state)
  . Clone the (mounted) fs_devices, lets call it as old_devices
  . Now add old_devices to fs_uuids (yeah, there is duplicate fsid in the
list, as we are under uuid_mutex so its fine).

  . Alloc a new fs_devices, lets call it as seed_devices
  . Copy fs_devices into the seed_devices
  . Move fs_devices::devices into seed_devices::devices
  . Bring seed_devices to under fs_devices::seed
(fs_devices->seed = seed_devices)
  . Assign a new FSID to the fs_devices and add the new writable device
to the fs_devices.

This patch makes the following changes..
As we clone fs_devices to make sure the device remains scanned after the
sprouting. So use the btrfs_scan_one_device() code instead. And do it
at the end of the sprouting.

Signed-off-by: Anand Jain 
---
 fs/btrfs/volumes.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8450bcfed4cb..c6f3f0dfbabe 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2179,7 +2179,7 @@ int btrfs_find_device_by_devspec(struct btrfs_fs_info 
*fs_info, u64 devid,
 static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
 {
struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
-   struct btrfs_fs_devices *old_devices;
+   struct btrfs_fs_devices *old_fs_devices;
struct btrfs_fs_devices *seed_devices;
struct btrfs_super_block *disk_super = fs_info->super_copy;
struct btrfs_device *device;
@@ -2193,14 +2193,6 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info 
*fs_info)
if (IS_ERR(seed_devices))
return PTR_ERR(seed_devices);
 
-   old_devices = clone_fs_devices(fs_devices);
-   if (IS_ERR(old_devices)) {
-   kfree(seed_devices);
-   return PTR_ERR(old_devices);
-   }
-
-   list_add(_devices->fs_list, _uuids);
-
memcpy(seed_devices, fs_devices, sizeof(*seed_devices));
seed_devices->opened = 1;
INIT_LIST_HEAD(_devices->devices);
@@ -2233,6 +2225,17 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info 
*fs_info)
  ~BTRFS_SUPER_FLAG_SEEDING;
btrfs_set_super_flags(disk_super, super_flags);
 
+   /*
+* As the above code hijacked the original seed fs_devices, now
+* create a new one for the original seed FSID.
+*/
+   list_for_each_entry(device, _devices->seed->devices, dev_list) {
+   if (!device->name)
+   continue;
+   btrfs_scan_one_device(device->name->str, FMODE_READ,
+ fs_info->bdev_holder, _fs_devices);
+   }
+
return 0;
 }
 
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 1/7] btrfs: drop uuid_mutex in btrfs_free_extra_devids()

2018-07-16 Thread Anand Jain
btrfs_free_extra_devids() is called only in the mount context which
traverses through the fs_devices::devices and frees the orphan devices
devices in the given %fs_devices if any. As the search for the orphan
device is limited to fs_devices::devices so we don't need the global
uuid_mutex.

There can't be any mount-point based ioctl threads in this context as
the mount thread is not yet returned. But there can be the btrfs-control
based scan ioctls thread which calls device_list_add().

Here in the mount thread the fs_devices::opened is incremented way before
btrfs_free_extra_devids() is called and in the scan context the fs_devices
which are already opened neither be freed or alloc-able at
device_list_add().

But lets say you change the device-path and call the scan again, then scan
would update the new device path and this operation could race against the
btrfs_free_extra_devids() thread, which might be in the process of
free-ing the same device. So synchronize it by using the
device_list_mutex.

This scenario is a very corner case, and practically the scan and mount
are anyway serialized by the usage so unless the race is instrumented its
very difficult to achieve.

Signed-off-by: Anand Jain 
Reviewed-by: David Sterba 
---
v3->v4: As we traverse through the seed device, fs_device gets updated with
the child seed fs_devices, so make sure we use the same fs_devices
pointer for the mutex_unlock as used for the mutex_lock.
v2->v3: Update change log.
(Currently device_list_add() is very lean on its device_list_mutex 
usage,
a cleanup and fix is wip. Given the practicality of the above race
condition this patch is good to merge).
v1->v2: replace uuid_mutex with device_list_mutex instead of delete.
change log updated.

 fs/btrfs/volumes.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 88d37bfa99c8..870c9f69a6a4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -934,8 +934,9 @@ void btrfs_free_extra_devids(struct btrfs_fs_devices 
*fs_devices, int step)
 {
struct btrfs_device *device, *next;
struct btrfs_device *latest_dev = NULL;
+   struct btrfs_fs_devices *parent_fs_devices = fs_devices;
 
-   mutex_lock(_mutex);
+   mutex_lock(_fs_devices->device_list_mutex);
 again:
/* This is the initialized path, it is safe to release the devices. */
list_for_each_entry_safe(device, next, _devices->devices, dev_list) {
@@ -989,8 +990,7 @@ void btrfs_free_extra_devids(struct btrfs_fs_devices 
*fs_devices, int step)
}
 
fs_devices->latest_bdev = latest_dev->bdev;
-
-   mutex_unlock(_mutex);
+   mutex_unlock(_fs_devices->device_list_mutex);
 }
 
 static void free_device_rcu(struct rcu_head *head)
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/7] btrfs: fix race between free_stale_devices and close_fs_devices

2018-07-16 Thread Anand Jain
From: Anand Jain 

%fs_devices can be free-ed by btrfs_free_stale_devices() when the
close_fs_devices() drops fs_devices::opened to zero, but close_fs_devices
tries to access the %fs_devices again without the device_list_mutex.

Fix this by bringing the %fs_devices access with in the device_list_mutex.

Stack trace as below.

WARNING: CPU: 1 PID: 4499 at fs/btrfs/volumes.c:1071 
close_fs_devices+0xbc7/0xfa0 fs/btrfs/volumes.c:1071
Kernel panic - not syncing: panic_on_warn set ...
::
RIP: 0010:close_fs_devices+0xbc7/0xfa0 fs/btrfs/volumes.c:1071
::
 btrfs_close_devices+0x29/0x150 fs/btrfs/volumes.c:1085
 open_ctree+0x589/0x7898 fs/btrfs/disk-io.c:3358
 btrfs_fill_super fs/btrfs/super.c:1202 [inline]
 btrfs_mount_root+0x16df/0x1e70 fs/btrfs/super.c:1593
 mount_fs+0xae/0x328 fs/super.c:1277
 vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
 vfs_kern_mount+0x40/0x60 fs/namespace.c:1027
 btrfs_mount+0x4a1/0x213e fs/btrfs/super.c:1661
 mount_fs+0xae/0x328 fs/super.c:1277

Reported-by: syzbot+ceb2606025ec1cc34...@syzkaller.appspotmail.com
Signed-off-by: Anand Jain 
---
v1->v2: Commit log update. Satisfy checkpatch.pl. Remove HEAD..

 fs/btrfs/volumes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 870c9f69a6a4..8450bcfed4cb 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1062,13 +1062,13 @@ static int close_fs_devices(struct btrfs_fs_devices 
*fs_devices)
list_for_each_entry_safe(device, tmp, _devices->devices, dev_list) {
btrfs_close_one_device(device);
}
-   mutex_unlock(_devices->device_list_mutex);
-
WARN_ON(fs_devices->open_devices);
WARN_ON(fs_devices->rw_devices);
fs_devices->opened = 0;
fs_devices->seeding = 0;
 
+   mutex_unlock(_devices->device_list_mutex);
+
return 0;
 }
 
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/7] Misc volume patch set part2

2018-07-16 Thread Anand Jain
As the last set if the pull wasn't integrated due to its pending
review corrections. Here, I have done them (not much except for adding
comments and function renames) and so sending it again. Also
this set clubs other independent patches which are in the mailing list.
I am explaining the changes of the individual patches in its change list.
And most of it still remains at v1.

This can also be pulled from:

  g...@github.com:asj/btrfs-devel.git misc-next-for-kdave-16jul

Anand Jain (7):
  btrfs: drop uuid_mutex in btrfs_free_extra_devids()
  btrfs: fix race between free_stale_devices and close_fs_devices
  btrfs: do device clone using the btrfs_scan_one_device
  btrfs: use the assigned fs_devices instead of the dereference
  btrfs: warn for num_devices below 0
  btrfs: add helper btrfs_num_devices() to deduce num_devices
  btrfs: add helper function check device delete able

 fs/btrfs/volumes.c | 106 +++--
 1 file changed, 62 insertions(+), 44 deletions(-)

-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/7] btrfs: use the assigned fs_devices instead of the dereference

2018-07-16 Thread Anand Jain
We have assigned the %fs_info->fs_devices in %fs_devices as its not
modified just use it for the mutex_lock().

Signed-off-by: Anand Jain 
---
I don't see this in the ML at all, I remember sending this. Anyway
I am marking this as V1.

 fs/btrfs/volumes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index c6f3f0dfbabe..7f4973fc2b52 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2199,7 +2199,7 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info 
*fs_info)
INIT_LIST_HEAD(_devices->alloc_list);
mutex_init(_devices->device_list_mutex);
 
-   mutex_lock(_info->fs_devices->device_list_mutex);
+   mutex_lock(_devices->device_list_mutex);
list_splice_init_rcu(_devices->devices, _devices->devices,
  synchronize_rcu);
list_for_each_entry(device, _devices->devices, dev_list)
@@ -2219,7 +2219,7 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info 
*fs_info)
generate_random_uuid(fs_devices->fsid);
memcpy(fs_info->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
memcpy(disk_super->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
-   mutex_unlock(_info->fs_devices->device_list_mutex);
+   mutex_unlock(_devices->device_list_mutex);
 
super_flags = btrfs_super_flags(disk_super) &
  ~BTRFS_SUPER_FLAG_SEEDING;
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/7] btrfs: warn for num_devices below 0

2018-07-16 Thread Anand Jain
In preparation to de-duplicate a section of code where we deduce the
num_devices, use warn instead of bug.

Signed-off-by: Anand Jain 
---
 fs/btrfs/volumes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7f4973fc2b52..0f4c512aa6b4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3726,7 +3726,7 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
num_devices = fs_info->fs_devices->num_devices;
btrfs_dev_replace_read_lock(_info->dev_replace);
if (btrfs_dev_replace_is_ongoing(_info->dev_replace)) {
-   BUG_ON(num_devices < 1);
+   WARN_ON(num_devices < 1);
num_devices--;
}
btrfs_dev_replace_read_unlock(_info->dev_replace);
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/7] btrfs: add helper btrfs_num_devices() to deduce num_devices

2018-07-16 Thread Anand Jain
When the replace is running the fs_devices::num_devices also includes
the replace device, however in some operations like device delete and
balance it needs the actual num_devices without the repalce devices, so
now the function btrfs_num_devices() just provides that.

Signed-off-by: Anand Jain 
---
v2: add comments. Thanks Nikolay.

 fs/btrfs/volumes.c | 32 ++--
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 0f4c512aa6b4..1c0b56374992 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1844,6 +1844,21 @@ void btrfs_assign_next_active_device(struct 
btrfs_fs_info *fs_info,
fs_info->fs_devices->latest_bdev = next_device->bdev;
 }
 
+/* Returns btrfs_fs_devices::num_devices excluding replace device if any */
+static inline u64 btrfs_num_devices(struct btrfs_fs_info *fs_info)
+{
+   u64 num_devices = fs_info->fs_devices->num_devices;
+
+   btrfs_dev_replace_read_lock(_info->dev_replace);
+   if (btrfs_dev_replace_is_ongoing(_info->dev_replace)) {
+   WARN_ON(num_devices < 1);
+   num_devices--;
+   }
+   btrfs_dev_replace_read_unlock(_info->dev_replace);
+
+   return num_devices;
+}
+
 int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
u64 devid)
 {
@@ -1857,13 +1872,7 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const 
char *device_path,
 
mutex_lock(_mutex);
 
-   num_devices = fs_devices->num_devices;
-   btrfs_dev_replace_read_lock(_info->dev_replace);
-   if (btrfs_dev_replace_is_ongoing(_info->dev_replace)) {
-   WARN_ON(num_devices < 1);
-   num_devices--;
-   }
-   btrfs_dev_replace_read_unlock(_info->dev_replace);
+   num_devices = btrfs_num_devices(fs_info);
 
ret = btrfs_check_raid_min_devices(fs_info, num_devices - 1);
if (ret)
@@ -3723,13 +3732,8 @@ int btrfs_balance(struct btrfs_fs_info *fs_info,
}
}
 
-   num_devices = fs_info->fs_devices->num_devices;
-   btrfs_dev_replace_read_lock(_info->dev_replace);
-   if (btrfs_dev_replace_is_ongoing(_info->dev_replace)) {
-   WARN_ON(num_devices < 1);
-   num_devices--;
-   }
-   btrfs_dev_replace_read_unlock(_info->dev_replace);
+   num_devices = btrfs_num_devices(fs_info);
+
allowed = BTRFS_AVAIL_ALLOC_BIT_SINGLE | BTRFS_BLOCK_GROUP_DUP;
if (num_devices > 1)
allowed |= (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1);
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 7/7] btrfs: add helper function check device delete able

2018-07-16 Thread Anand Jain
Move the section of the code which performs the check if the device is
indelible, move that into a helper function.

Signed-off-by: Anand Jain 
---
v1->v2: Rename function to btrfs_get_device_for_delete(), thanks
Nikolay.

 fs/btrfs/volumes.c | 49 ++---
 1 file changed, 30 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1c0b56374992..0cefc24b028c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1859,6 +1859,33 @@ static inline u64 btrfs_num_devices(struct btrfs_fs_info 
*fs_info)
return num_devices;
 }
 
+static struct btrfs_device *btrfs_get_device_for_delete(
+   struct btrfs_fs_info *fs_info,
+   const char *device_path, u64 devid)
+{
+   int ret;
+   struct btrfs_device *device;
+
+   ret = btrfs_check_raid_min_devices(fs_info,
+  btrfs_num_devices(fs_info) - 1);
+   if (ret)
+   return ERR_PTR(ret);
+
+   ret = btrfs_find_device_by_devspec(fs_info, devid, device_path,
+  );
+   if (ret)
+   return ERR_PTR(ret);
+
+   if (test_bit(BTRFS_DEV_STATE_REPLACE_TGT, >dev_state))
+   return ERR_PTR(BTRFS_ERROR_DEV_TGT_REPLACE);
+
+   if (test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state) &&
+   fs_info->fs_devices->rw_devices == 1)
+   return ERR_PTR(BTRFS_ERROR_DEV_ONLY_WRITABLE);
+
+   return device;
+}
+
 int btrfs_rm_device(struct btrfs_fs_info *fs_info, const char *device_path,
u64 devid)
 {
@@ -1872,25 +1899,9 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info, const 
char *device_path,
 
mutex_lock(_mutex);
 
-   num_devices = btrfs_num_devices(fs_info);
-
-   ret = btrfs_check_raid_min_devices(fs_info, num_devices - 1);
-   if (ret)
-   goto out;
-
-   ret = btrfs_find_device_by_devspec(fs_info, devid, device_path,
-  );
-   if (ret)
-   goto out;
-
-   if (test_bit(BTRFS_DEV_STATE_REPLACE_TGT, >dev_state)) {
-   ret = BTRFS_ERROR_DEV_TGT_REPLACE;
-   goto out;
-   }
-
-   if (test_bit(BTRFS_DEV_STATE_WRITEABLE, >dev_state) &&
-   fs_info->fs_devices->rw_devices == 1) {
-   ret = BTRFS_ERROR_DEV_ONLY_WRITABLE;
+   device = btrfs_get_device_for_delete(fs_info, device_path, devid);
+   if (IS_ERR(device)) {
+   ret = PTR_ERR(device);
goto out;
}
 
-- 
2.7.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RESEND 2/2] btrfs-progs: make all programs and libraries optional

2018-07-16 Thread David Sterba
On Thu, Jul 12, 2018 at 04:11:19PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval 
> 
> We have a build system internally which only needs to build the
> libraries out of a repository, not any binaries. I looked at how this
> works with other projects, and the best example was util-linux, which
> makes it possible to enable or disable everything individually. This is
> nice and really flexible, so let's do the same. This way, if you only
> want to build and install libbtrfsutil, you can simply do
> 
>   ./configure --disable-documentation --disable-all-programs 
> --enable-libbtrfsutil
>   make
>   make install

I think this is an overkill and abusing the --enable-XXX options.  You
want to avoid building the tools by default, so adding an option for
that is fine. Selectively building only certain tools can utilize that
option too and just follow with 'make btrfs-image' etc.

The number of --enable-* will stay minimal and we don't even have to
discuss how to find a good naming scheme (that works for util-linux but
looks a bit confusing for btrfs-progs).
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: rename btrfs_parse_early_options

2018-07-16 Thread Anand Jain




On 07/16/2018 10:18 PM, Anand Jain wrote:

Rename btrfs_parse_early_options() to btrfs_parse_device_options(). As
btrfs_parse_early_options() parses the -o device options and scan the
device provided. So this rename specifies its action. Also the function
name is inline with btrfs_parse_subvol_options().
No functional changes.

Signed-off-by: Anand Jain 
---


Forgot to mention: This should be applied on top of
  Gu Jinxiang (2):
  btrfs: make fs_devices to be a local variable
  btrfs: get device pointer from btrfs_scan_one_device

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/15] Add delayed-refs support to btrfs-progs

2018-07-16 Thread David Sterba
On Fri, Jun 08, 2018 at 03:47:43PM +0300, Nikolay Borisov wrote:
> Hello,
>   
>   
>   
> Here is a series which adds support for delayed refs. This is needed to 
> enable  
> later work on adding freespace tree repair code. Additionally, it results in  
> more code sharing between kernel/user space.
> 
> Patches 1-9 are simple prep patches removing some arguments, causing problems
> later. They can go independently of the delayed refs work. They don't 
> introduce
> any functional changes. Next, patches 10-13 introduce the needed 
> infrastructure
> to for delayed refs without actually activating it. Patch 14 finally wires it
> up by adding the necessary call outs to btrfs_run_delayed refs and reworking 
> the
> extent addition/freeing functions. With all of this done, patch 15 finally
> removes the old code.
> 
> This series passes all btrfs progs fsck and misc tests + fuzz tests apart from
> fuzz-003/007/009 - but those fail without this series so it's unlikely it's
> caused by it.
> 
> Nikolay Borisov (15):
>   btrfs-progs: Remove root argument from pin_down_bytes
>   btrfs-progs: Remove root argument from btrfs_del_csums
>   btrfs-progs: Add functions to modify the used space by a root
>   btrfs-progs: Refactor the root used bytes are updated
>   btrfs-progs: Make update_block_group take fs_info instead of root
>   btrfs-progs: check: Drop trans/root arguments from free_extent_hook
>   btrfs-progs: Remove root argument from __free_extent
>   btrfs-progs: Remove root argument from alloc_reserved_tree_block
>   btrfs-progs: Always pass 0 for offset when calling btrfs_free_extent
> for btree blocks.
>   btrfs-progs: Add boolean to signal whether we are re-initing extent
> tree
>   btrfs-progs: Add delayed refs infrastructure
>   btrfs-progs: Add __free_extent2 function
>   btrfs-progs: Add alloc_reserved_tree_block2 function
>   btrfs-progs: Wire up delayed refs
>   btrfs-progs: Remove old delayed refs infrastructure

Added to devel. There were some patch-to-patch compilation issues,
function alloc_reserved_tree_block2 used earlier than defined so I
reordered the patches to fix that.

The CI fails at test check/020-extent-ref-cases but it works on my
machine so it's not caused by the patchset.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/7] Misc volume patch set part2

2018-07-16 Thread Anand Jain




On 07/16/2018 10:58 PM, Anand Jain wrote:

As the last set if the pull wasn't integrated due to its pending
review corrections. Here, I have done them (not much except for adding
comments and function renames) and so sending it again. Also
this set clubs other independent patches which are in the mailing list.
I am explaining the changes of the individual patches in its change list.
And most of it still remains at v1.

This can also be pulled from:

   g...@github.com:asj/btrfs-devel.git misc-next-for-kdave-16jul


Sorry please pull from use the branch 'misc-next-for-kdave-17jul' 
instead. I notice that there were some changes since my last

fast-forward. And also here, I have added the following patch
which is non function change patch.
   3696c3aaf81b btrfs: rename btrfs_parse_early_options

Thanks, Anand


Anand Jain (7):
   btrfs: drop uuid_mutex in btrfs_free_extra_devids()
   btrfs: fix race between free_stale_devices and close_fs_devices
   btrfs: do device clone using the btrfs_scan_one_device
   btrfs: use the assigned fs_devices instead of the dereference
   btrfs: warn for num_devices below 0
   btrfs: add helper btrfs_num_devices() to deduce num_devices
   btrfs: add helper function check device delete able

  fs/btrfs/volumes.c | 106 +++--
  1 file changed, 62 insertions(+), 44 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs-progs: Exit gracefully when overlap chunks are detected

2018-07-16 Thread David Sterba
On Mon, Jul 09, 2018 at 02:50:53PM +0800, Qu Wenruo wrote:
> BUG_ON() can be triggered if some image contains overlap chunks.
> volumes.c:1930: read_one_chunk: BUG_ON `ret` triggered, value -17
> btrfs(+0x2cf12)[0x5601efa17f12]
> btrfs(+0x2fd8b)[0x5601efa1ad8b]
> btrfs(btrfs_read_chunk_tree+0x2bf)[0x5601efa1b30f]
> btrfs(btrfs_setup_chunk_tree_and_device_map+0xe8)[0x5601efa07718]
> btrfs(+0x1c944)[0x5601efa07944]
> btrfs(open_ctree_fs_info+0x90)[0x5601efa07b90]
> btrfs(cmd_check+0x4d7)[0x5601efa4f8c7]
> btrfs(main+0x88)[0x5601ef9fd768]
> /usr/lib/libc.so.6(__libc_start_main+0xeb)[0x7f3c7787306b]
> btrfs(_start+0x2a)[0x5601ef9fd88a]
> 
> Extent cache code can already detect it without problem, we only need to
> remove the BUG_ON() and exit gracefully.
> 
> Reported-by: Xu Wen 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200409
> Signed-off-by: Qu Wenruo 

1 and 2 applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] btrfs-progs: inspect-internal: add option '-k u64,u8,u64' of dump-tree

2018-07-16 Thread David Sterba
On Mon, Jul 16, 2018 at 05:13:25PM +0800, Su Yue wrote:
> For big filesystems, there are many items in trees(extent tree
> specially).
> For example, to dump one extent, we usually dump extent tree then pipe
> result to grep. The time-consuming part is that dump tree traverses
> items. And it eats cpu and memory too.
> 
> This patch introduces an option '-k u64,u8,u64' for users(developers more
> likely) to appoint a specific key to search and dump, then the path
> from root node down the leaf will be dumped.
> The search of the key costs most time of this way.
> 
> Signed-off-by: Su Yue 
> ---
> Change log:
> v2:
>   Rename btrfs_search_spec_key() to btrfs_search_print_path().
>   Rename is_key_spec to is_spec_key_set.
>   Move btrfs_search_print_path() after open_ctree(), call it only once.
>   Remove duplicate code and replaced with btrfs_print_tree().
>   Modify the usage about '-k'.
>   All are suggested by Qu Wenruo.
>   
>  cmds-inspect-dump-tree.c | 41 -
>  print-tree.c | 77 
>  print-tree.h |  2 ++
>  3 files changed, 119 insertions(+), 1 deletion(-)
> 
> diff --git a/cmds-inspect-dump-tree.c b/cmds-inspect-dump-tree.c
> index c8acd55a0c3a..80df826af0fd 100644
> --- a/cmds-inspect-dump-tree.c
> +++ b/cmds-inspect-dump-tree.c
> @@ -184,6 +184,21 @@ static u64 treeid_from_string(const char *str, const 
> char **end)
>   return id;
>  }
>  
> +static int parse_key(struct btrfs_key *key)
> +{
> +

Extra newline.

> + int ret = sscanf(optarg, "%llu,%hhu,%llu", >objectid,
> +  >type, >offset);

Please don't do complex initializations in the declaration block.

And use of the global variable 'optarg' is not good, pass the scanned
string as a parameter.

> + if (ret != 3) {
> + error("error parsing key '%s'\n", optarg);

no "\n" at the end of the string in error(...)

> + ret = -EINVAL;
> + } else {
> + ret = 0;
> + }
> +
> + return ret;
> +}
> +
>  const char * const cmd_inspect_dump_tree_usage[] = {
>   "btrfs inspect-internal dump-tree [options] device",
>   "Dump tree structures from a given device",
> @@ -199,6 +214,7 @@ const char * const cmd_inspect_dump_tree_usage[] = {
>   "-u|--uuid  print only the uuid tree",
>   "-b|--block  print info from the specified block only",
>   "-t|--tree print only tree with the given id (string or 
> number)",
> + "-k|--key   search the specific key then print path, must 
> with -t(conflicts with others)",

As there are more conflicting options, it would be better to group them
and just put a note about that. The message as you write it does not
provide clear usage instructions.

>   "--follow   use with -b, to show all children tree blocks 
> of ",
>   NULL
>  };
> @@ -224,8 +240,10 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>   unsigned open_ctree_flags;
>   u64 block_only = 0;
>   struct btrfs_root *tree_root_scan;
> + struct btrfs_key spec_key = { 0 };
>   u64 tree_id = 0;
>   bool follow = false;
> + bool is_spec_key_set = false;
>  
>   /*
>* For debug-tree, we care nothing about extent tree (it's just backref
> @@ -248,10 +266,11 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>   { "block", required_argument, NULL, 'b'},
>   { "tree", required_argument, NULL, 't'},
>   { "follow", no_argument, NULL, GETOPT_VAL_FOLLOW },
> + { "key", required_argument, NULL, 'k'},
>   { NULL, 0, NULL, 0 }
>   };
>  
> - c = getopt_long(argc, argv, "deb:rRut:", long_options, NULL);
> + c = getopt_long(argc, argv, "deb:rRut:k:", long_options, NULL);
>   if (c < 0)
>   break;
>   switch (c) {
> @@ -300,6 +319,12 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>   }
>   break;
>   }
> + case 'k':
> + ret = parse_key(_key);
> + if (ret)
> + exit(1);
> + is_spec_key_set = true;

What if the key is already set?

> + break;
>   case GETOPT_VAL_FOLLOW:
>   follow = true;
>   break;
> @@ -308,6 +333,9 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>   }
>   }
>  
> + if (!tree_id && is_spec_key_set)
> + usage(cmd_inspect_dump_tree_usage);
> +
>   if (check_argc_exact(argc - optind, 1))
>   usage(cmd_inspect_dump_tree_usage);
>  
> @@ -325,6 +353,17 @@ int cmd_inspect_dump_tree(int argc, char **argv)
>   goto out;
>   }
>  
> + if (is_spec_key_set) {
> + root = info->tree_root;
> + if (IS_ERR(root) || !root) {

Re: [PATCH 2/2] btrfs-progs: check: enhanced progress indicator

2018-07-16 Thread Misono Tomohiro
On 2018/07/05 4:20, Stéphane Lesimple wrote:
> We reuse the task_position enum and task_ctx struct of the original progress
> indicator, adding more values and fields for our needs.
> 
> Then add hooks in all steps of the check to properly record progress.
> 
> Signed-off-by: Stéphane Lesimple 
> ---
>  check/main.c| 176 
> ++--
>  check/mode-common.h |  20 ++
>  check/mode-lowmem.c |   1 +
>  convert/main.c  |   2 +-
>  qgroup-verify.c |   7 +++
>  qgroup-verify.h |   2 +
>  task-utils.c|   8 ++-
>  task-utils.h|   3 +-
>  8 files changed, 154 insertions(+), 65 deletions(-)
> 
> diff --git a/check/main.c b/check/main.c
> index 3190b5d..bb3ebea 100644
> --- a/check/main.c
> +++ b/check/main.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "ctree.h"
>  #include "volumes.h"
>  #include "repair.h"
> @@ -47,20 +48,6 @@
>  #include "check/mode-original.h"
>  #include "check/mode-lowmem.h"
>  
> -enum task_position {
> - TASK_EXTENTS,
> - TASK_FREE_SPACE,
> - TASK_FS_ROOTS,
> - TASK_NOTHING, /* have to be the last element */
> -};
> -
> -struct task_ctx {
> - int progress_enabled;
> - enum task_position tp;
> -
> - struct task_info *info;
> -};
> -
>  u64 bytes_used = 0;
>  u64 total_csum_bytes = 0;
>  u64 total_btree_bytes = 0;
> @@ -72,6 +59,7 @@ u64 data_bytes_referenced = 0;
>  LIST_HEAD(duplicate_extents);
>  LIST_HEAD(delete_items);
>  int no_holes = 0;
> +static int is_free_space_tree = 0;
>  int init_extent_tree = 0;
>  int check_data_csum = 0;
>  struct btrfs_fs_info *global_info;
> @@ -173,28 +161,48 @@ static int compare_extent_backref(struct rb_node 
> *node1, struct rb_node *node2)
>   return compare_tree_backref(node1, node2);
>  }
>  
> +static void print_status_check_line(void *p)
> +{
> + struct task_ctx *priv = p;
> + char *task_position_string[] = {
> + "[1/7] checking root items ",
> + "[2/7] checking extents",
> + is_free_space_tree ?
> + "[3/7] checking free space tree":
> + "[3/7] checking free space cache   ",
> + "[4/7] checking fs roots   ",
> + check_data_csum ?
> + "[5/7] checking csums against data ":
> + "[5/7] checking csums (without verifying data) ",
> + "[6/7] checking root refs  ",
> + "[7/7] checking quota groups   ",
> + };
> +
> + time_t elapsed = time(NULL) - priv->start_time;
> + int hours   = elapsed / 3600;
> + elapsed-= hours   * 3600;
> + int minutes = elapsed / 60;
> + elapsed-= minutes * 60;
> + int seconds = elapsed;
> + printf("%s (%d:%02d:%02d elapsed", task_position_string[priv->tp], 
> hours, minutes, seconds);
> + if (priv->item_count > 0)
> + printf(", %llu items checked)\r", priv->item_count);
> + else
> + printf(")\r");
> + fflush(stdout);
> +}
>  
>  static void *print_status_check(void *p)
>  {
>   struct task_ctx *priv = p;
> - const char work_indicator[] = { '.', 'o', 'O', 'o' };
> - uint32_t count = 0;
> - static char *task_position_string[] = {
> - "checking extents",
> - "checking free space cache",
> - "checking fs roots",
> - };
>  
> - task_period_start(priv->info, 1000 /* 1s */);
> + task_period_start(priv->info, 50);
>  
>   if (priv->tp == TASK_NOTHING)
>   return NULL;
>  
>   while (1) {
> - printf("%s [%c]\r", task_position_string[priv->tp],
> - work_indicator[count % 4]);
> - count++;
> - fflush(stdout);
> + print_status_check_line(p);
>   task_period_wait(priv->info);
>   }
>   return NULL;
> @@ -202,6 +210,7 @@ static void *print_status_check(void *p)
>  
>  static int print_status_return(void *p)
>  {
> + print_status_check_line(p);
>   printf("\n");
>   fflush(stdout);
>  
> @@ -2942,6 +2951,7 @@ static int check_root_refs(struct btrfs_root *root,
>   loop = 0;
>   cache = search_cache_extent(root_cache, 0);
>   while (1) {
> + ctx.item_count++;
>   if (!cache)
>   break;
>   rec = container_of(cache, struct root_record, cache);
> @@ -3263,6 +3273,7 @@ static int check_fs_root(struct btrfs_root *root,
>   }
>  
>   while (1) {
> + ctx.item_count++;
>   wret = walk_down_tree(root, , wc, , );
>   if (wret < 0)
>   ret = wret;
> @@ -3340,11 +3351,6 @@ static int check_fs_roots(struct btrfs_fs_info 
> *fs_info,
>   int