Re: [PATCH 3/3] btrfs-progs: Add command to check if balance op is req

2016-10-28 Thread Graham Cobb
On 28/10/16 16:20, David Sterba wrote:
> I tend to agree with this approach. The usecase, with some random sample
> balance options:
> 
>  $ btrfs balance start --analyze -dusage=10 -musage=5 /path

Wouldn't a "balance analyze" command be better than "balance start
--analyze"? I would have guessed the latter started the balance but
printed some analysis as well (before or, probably more usefully,
afterwards).

There might, of course, be some point in a (future)

$ btrfs balance start --if-needed -dusage=10 -musage=5 /path

command.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: repair: Trickle down EEXISTS while freeing

2016-10-28 Thread David Sterba
On Fri, Oct 28, 2016 at 10:02:06AM -0500, Goldwyn Rodrigues wrote:
> 
> 
> On 10/28/2016 09:42 AM, David Sterba wrote:
> > On Thu, Oct 27, 2016 at 01:05:08PM -0500, Goldwyn Rodrigues wrote:
> >> While deleting pending extents, we insert extents to the
> >> extent_tree. However, because of corruption, this might already
> >> be freed. We trickle the EEXISTS error and let the caller
> >> decide what needs to be done with it. In this case, ignore, so
> >> the repair may proceed.
> >>
> >> The primary motivation of this is to resolve this BUG_ON:
> >> extent-tree.c:2731: alloc_reserved_tree_block: Assertion `ret` failed, 
> >> value 0
> >>
> >> #2  assert_trace (val=0, line=2728, func=0x489510 <__func__.11558> 
> >> "alloc_reserved_tree_block", 
> >>filename=0x489065 "extent-tree.c", assertion=0x48cfb5 "ret") at 
> >> kerncompat.h:102
> >> #3  alloc_reserved_tree_block (trans=trans@entry=0x7e7240, 
> >> root=root@entry=0x6a88a0, root_objectid=2, 
> >>generation=175654, flags=0, key=key@entry=0x2bc5c70, level=0, 
> >> ins=ins@entry=0x7fffcf30) at extent-tree.c:2728
> >> #4  finish_current_insert (trans=trans@entry=0x7e7240, 
> >> extent_root=extent_root@entry=0x6a88a0)
> >> at extent-tree.c:2108
> >> #5  __free_extent (trans=trans@entry=0x7e7240, root=root@entry=0x6a88a0, 
> >> bytenr=57127387136, 
> >>num_bytes=, parent=parent@entry=0, 
> >> root_objectid=, owner_objectid=0, 
> >>owner_offset=owner_offset@entry=0, refs_to_drop=refs_to_drop@entry=1) 
> >> at extent-tree.c:2410
> >> #6  del_pending_extents (trans=trans@entry=0x7e7240, extent_root=0x6a88a0) 
> >> at extent-tree.c:2448
> > 
> > Do you have a reproducer for that?
> 
> I have a corrupt btrfs-image, not a test case though. Should I share
> that? If yes, where should I upload it?

The image should not be large (compressed), if it's about to be stored
in git. Minimizing can take time, I haven't analyzed the trace if
there's an easier way to trigger it (eg corrupting a specific key).
> >> Signed-off-by: Goldwyn Rodrigues 
> > 
> > Applied, thanks.
> > 
> 
> Ahh, just realized. Spell check s/EEXISTS/EEXIST/

Fixed.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] btrfs-progs: Add a command to show bg info

2016-10-28 Thread David Sterba
On Mon, Oct 17, 2016 at 05:35:14PM -0700, Divya Indi wrote:
> Add a new subcommand to btrfs inspect-internal
> 
> btrfs inspect-internal bg_analysis 
> Gives information about all the block groups.

The sample output from the cover letter should also go here (or just
here).

Below are some comments, but overall, I think the block group dumping
should be more advanced than just what this patch does. It's a good
start, but given the rich structure of the blockgroups and chunks, I'd
rather see the basics done right from the beginning.

The blockgroups and chunks can be viewed in a logical or physical way.
I've sent a patch some time agou, that dumps the physical structure, but
got reminded that the balance analysis is more interesting on the
logical level.

Ideally I'd like to keep both ways to show the information. The physical
way is easier, just iterate the device extents and print start, lenght
etc. The logical is more tricky as it's a tree structure, when one
logical chunk could comprised of several physical chunks. And this must
reflect all current raid profiles, where we're mixing mirorring and
striping, and chunks of special kind (parity).

Now, I'm not asking you to implement all of that, but I want to make
sure that code that touches the area of interest does not block further
extensions. I understand that you want to add the ability to optimize
the balance.

> Signed-off-by: Divya Indi 
> Reviewed-by: Ashish Samant 
> Reviewed-by: Liu Bo 
> ---
>  cmds-inspect.c |  114 
> 
>  1 files changed, 114 insertions(+), 0 deletions(-)
> 
> diff --git a/cmds-inspect.c b/cmds-inspect.c
> index f435ea9..0e2f15a 100644
> --- a/cmds-inspect.c
> +++ b/cmds-inspect.c
> @@ -147,6 +147,118 @@ static int get_bg_info(int fd, struct 
> btrfs_ioctl_search_args *bg_args,
>   }
>   return ret;
>  }
> +
> +static const char * const cmd_inspect_bg_analysis_usage[] = {
> + "btrfs inspect-internal bg_analysis  ",
> + "Get information about all block groups",
> + "",
> + "",
> + NULL
> +};
> +
> +static int cmd_inspect_bg_analysis(int argc, char **argv)
> +{
> + struct btrfs_ioctl_search_args args;
> + struct btrfs_ioctl_search_args bg_args;
> + struct btrfs_ioctl_search_header *header;
> + struct btrfs_ioctl_search_header *bg_header;
> + struct btrfs_ioctl_search_key *sk;
> + struct btrfs_ioctl_search_key *bg_sk;
> + struct btrfs_block_group_item *bg;
> + struct btrfs_chunk *chunk;
> + unsigned long off = 0;
> + unsigned long bg_off = 0;
> + DIR *dirstream = NULL;
> + int fd;
> + int i;
> + int ret = 0;
> + u64 used;
> + u64 flags;
> + char bg_type[32] = {0};
> +
> + if (check_argc_exact(argc, 2))
> + usage(cmd_inspect_bg_analysis_usage);
> +
> + fd = btrfs_open_dir(argv[optind], , 1);
> + if (fd < 0)
> + return 1;
> +
> + memset(, 0, sizeof(args));
> + sk = 
> + sk->min_offset = sk->min_transid = 0;
> + sk->max_offset = sk->max_transid = (u64)-1;
> + printf("%20s%20s%20s%20s\n", "Type", "Start", "Len", "Used");
> + while (1) {
> +
> + /* Walk through the chunk tree and retrieve all the chunks */
> + ret = get_chunks(fd, );
> + if (ret < 0)
> + goto out;
> +
> + /*
> +  * it should not happen.
> +  */
> + if (sk->nr_items == 0)
> + break;

So is this an error condition? If yes, then it should be handled.

> +
> + off = 0;
> + memset(_args, 0, sizeof(bg_args));
> + bg_sk = _args.key;
> +
> + bg_sk->tree_id = BTRFS_EXTENT_TREE_OBJECTID;
> + bg_sk->min_type = BTRFS_BLOCK_GROUP_ITEM_KEY;
> + bg_sk->max_type = BTRFS_BLOCK_GROUP_ITEM_KEY;
> + bg_sk->min_transid =  0;
> + bg_sk->max_transid = (u64)-1;
> +
> + for (i = 0; i < sk->nr_items; i++) {
> + header = (struct btrfs_ioctl_search_header *)(args.buf
> +   + off);
> + off += sizeof(*header);
> + if (header->type == BTRFS_CHUNK_ITEM_KEY) {
> + chunk = (struct btrfs_chunk *)
> + (args.buf + off);
> +
> + /* For every chunk lookup an exact match(bg) in
> +  * the extent tree and read its used values */

Comment formatting

> + ret = get_bg_info(fd, _args, header->offset,
> +   chunk->length);
> + if (ret < 0)
> + goto out;
> +
> + /*
> +  * it 

Re: [PATCH 1/3] btrfs-progs: Generic functions to retrieve chunks and their bg info

2016-10-28 Thread David Sterba
On Mon, Oct 17, 2016 at 05:35:13PM -0700, Divya Indi wrote:
> An efficient alternative to retrieving block groups:
> get_chunks(): Walk the chunk tree to retrieve the chunks.
> get_bg_info(): For each retrieved chunk, lookup an exact match of block
> group in the extent tree.
> 
> Signed-off-by: Divya Indi 
> Reviewed-by: Ashish Samant 
> Reviewed-by: Liu Bo 
> ---
>  cmds-inspect.c |   66 
> 
>  1 files changed, 66 insertions(+), 0 deletions(-)
> 
> diff --git a/cmds-inspect.c b/cmds-inspect.c
> index 4b7cea0..f435ea9 100644
> --- a/cmds-inspect.c
> +++ b/cmds-inspect.c
> @@ -81,6 +81,72 @@ out:
>   return !!ret;
>  }
>  
> +static void bg_flags_to_str(u64 flags, char *ret)
> +{
> + int empty = 1;
> +
> + if (flags & BTRFS_BLOCK_GROUP_DATA) {
> + empty = 0;
> + strcpy(ret, "DATA");
> + }
> + if (flags & BTRFS_BLOCK_GROUP_METADATA) {
> + if (!empty)
> + strcat(ret, "|");
> + strcat(ret, "METADATA");
> + }
> + if (flags & BTRFS_BLOCK_GROUP_SYSTEM) {
> + if (!empty)
> + strcat(ret, "|");
> + strcat(ret, "SYSTEM");
> + }
> +}
> +
> +/* Walking through the chunk tree to retrieve chunks. */

No empty newline.

> +
> +static int get_chunks(int fd, struct btrfs_ioctl_search_args *chunk_args)
> +{
> + struct btrfs_ioctl_search_key *sk;
> + int ret;
> + int e;
> +
> + sk = _args->key;
> +
> + sk->tree_id = BTRFS_CHUNK_TREE_OBJECTID;
> + sk->min_objectid = sk->max_objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
> + sk->max_type = sk->min_type = BTRFS_CHUNK_ITEM_KEY;

Please don't do multiple asignments in one statement.

> + sk->nr_items = 4096;
> +
> + ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, chunk_args);
> + e = errno;

This is useless asignment, I've removed it from the code, please don't
reintrduce it.

> + if (ret < 0) {
> + fprintf(stderr, "ret %d error '%s'\n", ret,
> + strerror(e));
> + }
> + return ret;
> +}


> +
> +/* Given the objectid, find the block group item in the extent tree */
> +static int get_bg_info(int fd, struct btrfs_ioctl_search_args *bg_args,
> +u64 objectid, unsigned long length)
> +{
> + struct btrfs_ioctl_search_key *bg_sk;
> + int ret;
> + int e;
> +
> + bg_sk = _args->key;
> +
> + bg_sk->min_objectid = bg_sk->max_objectid = objectid;
> + bg_sk->nr_items = 1;
> + bg_sk->min_offset = bg_sk->max_offset = length;

Same here.

> +
> + ret = ioctl(fd, BTRFS_IOC_TREE_SEARCH, bg_args);
> + e = errno;
> + if (ret < 0) {
> + fprintf(stderr, "ret %d error '%s'\n", ret,
> + strerror(e));

Please take a look how the error messages are constructed when the tree
search ioctl fails, there are enough examples in the code.

> + }
> + return ret;
> +}
>  static const char * const cmd_inspect_inode_resolve_usage[] = {
>   "btrfs inspect-internal inode-resolve [-v]  ",
>   "Get file system paths for the given inode",

Actually, I'm not sure if such functions should exist at all, as they
only hide the search ioctl but don't do any validation of the returned
keys and data.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Compression and device replace on raid10 kernel panic on 4.4.6 and 4.6.x

2016-10-28 Thread Lionel Bouton
Hi,

as I don't have much time to handle a long backup recovery, I didn't try
the delete/add combination to avoid any risk.
What I tried though, was fatal_errors=bug. As I don't have any console I
thought it might at least help log the problem instead of the usual
kernel panic.

No luck : the problem still made the kernel panic. Unless someone comes
up with a somewhat safe way to recover from this situation I'll let the
filesystem as is (we are building a new platform where redundancy will
be handled by Ceph anyway).

Lionel

Le 27/10/2016 à 18:07, Lionel Bouton a écrit :
> Hi,
>
> Le 27/10/2016 à 02:50, Lionel Bouton a écrit :
>> [...]
>> I'll stop for tonight and see what happens during the day. I'd like to
>> try a device add / delete next but I'm worried I could end up with a
>> completely unusable filesystem if the device delete hits the same
>> problem than replace.
>> If the replace resuming on mount crashes the system I can cancel it but
>> there's no way to do so with a device delete. Or is by any chance the
>> skip_balance mount option a way to cancel a delete ?
> Can anyone just confirm (or infirm) that skip_balance will indeed
> effectively cancel a device delete before I try something that could
> force me to resort to backups ?
> Lionel
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] btrfs-progs: Add command to check if balance op is req

2016-10-28 Thread David Sterba
On Wed, Oct 19, 2016 at 10:08:23AM -0700, Divya Indi wrote:
> On 10/17/2016 06:42 PM, Qu Wenruo wrote:
> > At 10/18/2016 08:35 AM, Divya Indi wrote:
> >> Add new subcommand to btrfs inspect-internal
> >>
> >> btrfs inspect-internal balance_check 
> >> Checks whether 'btrfs balance' can help creating more space (Only
> >> considers data block groups).
> >
> > I didn't think it's good to add a new subcommand just for that.
> >
> > Why not output such relocation sugguestion for you previous bg-analyze 
> > subcommand?
> > (It's better to make it a parameter to trigger such output)
> Or maybe as an option to btrfs balance start?
> Eg: btrfs balance start --check-only 

I tend to agree with this approach. The usecase, with some random sample
balance options:

 $ btrfs balance start --analyze -dusage=10 -musage=5 /path

returns if the command would be able to make any progress according to
the given filters and then

 $ btrfs balance start -dusage=10 -musage=5 /path

would actually do that. This is inherently racy, as the analysis will
happen in userspace and the filesystem block group layout could change
in the meantime.

This also implies that the filters are implemented in the userspace,
duplicating the kernel code. This is not neccesarily bad, as this would
work even on older kernels (with new progs), compared to a new analysis
mode of balance implemented in kernel.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/6] Btrfs: incremental send, add generation for waiting_dir_move struct and check it in can_rmdir.

2016-10-28 Thread Filipe Manana
On Fri, Oct 28, 2016 at 2:40 AM, robbieko  wrote:
> From: Robbie Ko 
>
> Example scenario:
> Parent snapshot:
> | dir258/ (ino 258, gen 27)
> | dir257/ (ino 257, gen 27)
> | dir259/ (ino 259, gen 27)
>
> Send snapshot:
> | new_dir259/ (ino 259, gen 38)
> | new_dir258/ (ino 258, gen 38)
> | new_dir257/ (ino 257, gen 38)
>
> rmdir dir258/dir257
> mkdir o257-38-0
> mkdir o258-38-0
> chown o257-38-0 - uid=0, gid=0
> chmod o257-38-0 - mode=0755
> rename dir258 -> o258-27-0  -> dir is empty, but waiting o257-38-0
> mkdir o259-38-0
> chown o258-38-0 - uid=0, gid=0
> chmod o258-38-0 - mode=0755
> rmdir dir259
> rename o259-38-0 -> new_dir259
> utimes
> chown new_dir259 - uid=0, gid=0
> chmod new_dir259 - mode=0755
> rename o258-38-0 -> new_dir259/new_dir258
> utimes new_dir259/new_dir258
> utimes new_dir259
> rename o257-38-0 -> new_dir259/new_dir258/new_dir257
> rmdir o258-27-0 -> when o257-38-0 finish, delete
> utimes new_dir259/new_dir258/new_dir257
> utimes new_dir259/new_dir258
> utimes new_dir259
>
> While computing the send stream the following steps happen:
>
> 1) While processing inode 257 we delete dir258/dir257 immediately.
>After create o257-38-0, we then delay its rename operation
>because its new parent in the send snapshot, inode 258,
>was not yet processed and therefore not yet renamed.
>
> 2) On processing inode 258, we need to check if all under files are done
>before rmdir dir258. However, the waiting o257-38-0 (ino 257) will
>mislead the check making it think dir257 (ino 257) is still there.
>dir258 will be orphanized at first but in fact it has become empty.
>
> Fix this by adding generation to waiting_dir_move struct, so can_rmdir
> can use it to correct the result.

Fix what?
Your changelog does not say what is the problem. Does the send
operation fails? Or is it the receiver that fails for example due to
an operation using a wrong path name or attempting to rmdir a
non-empty directory? Or is the problem something else, like for
example are you optimizing something like avoiding unnecessary
renames?

Also the subject is not a summary of what is being fixed (e.g. "btrfs:
incremental send, fix invalid rmdir operations"). Instead you say what
code changes you do... and not what problem they are solving.

I haven't read all the other patches you sent along (and probably
won't have the time today), but by just looking at their subjects I
can infer the same comments apply.




>
> Signed-off-by: Robbie Ko 
> ---

After this line (---) you should mention what changed between patch
versions, see [1]

[1] 
https://btrfs.wiki.kernel.org/index.php/Writing_patch_for_btrfs#Updating_patches


thanks


>  fs/btrfs/send.c | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index 22eca86..eaf1c92 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -237,6 +237,7 @@ struct pending_dir_move {
>  struct waiting_dir_move {
> struct rb_node node;
> u64 ino;
> +   u64 gen;
> /*
>  * There might be some directory that could not be removed because it
>  * was waiting for this directory inode to be moved first. Therefore
> @@ -2930,6 +2931,7 @@ static int can_rmdir(struct send_ctx *sctx, u64 dir, 
> u64 dir_gen,
> struct btrfs_key found_key;
> struct btrfs_key loc;
> struct btrfs_dir_item *di;
> +   u64 gen;
>
> /*
>  * Don't try to rmdir the top/root subvolume dir.
> @@ -2969,8 +2971,13 @@ static int can_rmdir(struct send_ctx *sctx, u64 dir, 
> u64 dir_gen,
> struct btrfs_dir_item);
> btrfs_dir_item_key_to_cpu(path->nodes[0], di, );
>
> +   ret = get_inode_info(root, loc.objectid, NULL, , NULL,
> +NULL, NULL, NULL);
> +   if (ret < 0)
> +   goto out;
> +
> dm = get_waiting_dir_move(sctx, loc.objectid);
> -   if (dm) {
> +   if (dm && dm->gen == gen) {
> struct orphan_dir_info *odi;
>
> odi = add_orphan_dir_info(sctx, dir);
> @@ -3010,7 +3017,8 @@ static int is_waiting_for_move(struct send_ctx *sctx, 
> u64 ino)
> return entry != NULL;
>  }
>
> -static int add_waiting_dir_move(struct send_ctx *sctx, u64 ino, bool 
> orphanized)
> +static int add_waiting_dir_move(struct send_ctx *sctx, u64 ino, u64 gen,
> +bool orphanized)
>  {
> struct rb_node **p = >waiting_dir_moves.rb_node;
> struct rb_node *parent = NULL;
> @@ -3020,6 +3028,7 @@ static int add_waiting_dir_move(struct send_ctx *sctx, 
> u64 ino, bool orphanized)
> if (!dm)
> return -ENOMEM;
> dm->ino = ino;
> +   dm->gen = gen;
> dm->rmdir_ino = 0;

Re: [PATCH 2/3] btrfs-progs: send: fix handling of multiple snapshots (-p option)

2016-10-28 Thread David Sterba
On Wed, Oct 19, 2016 at 11:35:03AM +0900, Tsutomu Itoh wrote:
> We cannot send multiple snapshots at once by -p option.

We cannot like that it's broken, or we cannot because it's not supposed
to work that way. I guess it's the former, but the changelog text is a
bit confusing.
> 
> [before]
> # btrfs send -f /tmp/data0 -p Snap0 Snap[12]
> At subvol Snap1
> At subvol Snap2
> ERROR: parent determination failed for 0
> # 
> 
> [after]
> # btrfs send -f /tmp/data0 -p Snap0 Snap[12]
> At subvol Snap1
> At subvol Snap2
> # 

I'm not sure it's fixed, I wrote a simple test, attached, that triggers
the bug even with the patch applied.
#!/bin/bash
#
# minimal test for the following syntax: btrfs send -p parent subvol1 subvol2

source $TOP/tests/common

check_prereq mkfs.btrfs
check_prereq btrfs

setup_root_helper
prepare_test_dev 2g

run_check $TOP/mkfs.btrfs -f $IMAGE
run_check_mount_test_dev

here=`pwd`
cd "$TEST_MNT" || _fail "cannot chdir to TEST_MNT"

run_check $SUDO_HELPER btrfs subvolume create subv-parent
run_check $SUDO_HELPER dd if=/dev/urandom of=subv-parent/file bs=1M count=10
run_check $SUDO_HELPER btrfs subvolume snapshot -r subv-parent subv-snap1
run_check $SUDO_HELPER dd if=/dev/urandom of=subv-parent/file bs=1M count=10
run_check $SUDO_HELPER btrfs subvolume snapshot -r subv-parent subv-snap2
run_check $SUDO_HELPER dd if=/dev/urandom of=subv-parent/file bs=1M count=10
run_check $SUDO_HELPER btrfs subvolume snapshot -r subv-parent subv-snap3

run_check truncate -s0 "$here"/send.stream
run_check chmod a+w "$here"/send.stream
run_check $SUDO_HELPER btrfs send -f "$here"/send.stream -p subv-snap1 
subv-snap2 subv-snap3

cd "$here" || _fail "cannot chdir back to test directory"

run_check_umount_test_dev


Re: [PATCH 0/4] Qgroup comment enhance and balance fix

2016-10-28 Thread Goldwyn Rodrigues


On 10/27/2016 07:33 PM, Qu Wenruo wrote:
> Any comment?
> 
> Especially the final patch will fix a long standing bug.
> 

While I have tested it, and it works, I did not get time to review it.
So, you can have my

Tested-by: Goldwyn Rodrigues 

> Thanks,
> Qu
> 
> At 10/18/2016 09:31 AM, Qu Wenruo wrote:
>> The patchset does the following things:
>> 1) Enhance comment for qgroup, rename 2 functions
>>Explain the how qgroup works, so new developers won't waste too much
>>time digging into the boring codes.
>>
>>The qgroup work flow is split into 3 main phrases:
>>Reverse, Trace, Account.
>>And rename functions like btrfs_qgroup_insert_dirty_extent_record()
>>to btrfs_qgroup_trace_extent(), to follow the "Trace" phrase.
>>
>>Other function name already follows such schema before.
>>
>> 2) Move account_shared_subtree() and account_leaf_items() to qgroup.c
>>Such functions are only used by qgroup, so move them to qgroup.c and
>>rename them to follow "trace" schema.
>>
>> 3) Fix the long standing qgroup balance corruption
>>Commit 62b99540a1d91e4 doesn't fix the problem completely.
>>It can only handle case that merge_reloc_roots() are all done in one
>>transaction.
>>
>>If transaction commits during merge_reloc_roots(), the data extents
>>will leak again.
>>
>>The tree fix is to info qgroup to trace both subtree(tree reloc tree
>>and destination fs tree), at replace_path() time.
>>Inside  replace_path(), there is one transaction start and end, so we
>>must make qgroup to trace both subtrees.
>>
>>Thanks for previous work, now we can easily trace subtree, so the fix
>>is quite simple now.
>>
>>And the cause also makes it easier to create pinpoint test case for
>>this bug.
>>
>> Qu Wenruo (4):
>>   btrfs: qgroup: Add comments explaining how btrfs qgroup works
>>   btrfs: qgroup: Rename functions to make it follow
>> reserve,trace,account steps
>>   btrfs: Expoert and move leaf/subtree qgroup helpers to qgroup.c
>>   btrfs: qgroup: Fix qgroup data leaking by using subtree tracing
>>
>>  fs/btrfs/delayed-ref.c   |   2 +-
>>  fs/btrfs/extent-tree.c   | 220
>> +--
>>  fs/btrfs/qgroup.c| 219
>> +-
>>  fs/btrfs/qgroup.h|  64 +++--
>>  fs/btrfs/relocation.c| 119 +--
>>  fs/btrfs/tree-log.c  |   2 +-
>>  include/trace/events/btrfs.h |   2 +-
>>  7 files changed, 302 insertions(+), 326 deletions(-)
>>
> 
> 

-- 
Goldwyn
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: remove balance warning that does not reflect any problem

2016-10-28 Thread fdmanana
From: Filipe Manana 

On openSUSE/SLE systems where balance is triggered periodically in the
background, snapshotting happens when doing package installations and
upgrades, and (by default) the root system is organized with multiple
subvolumes, the following warning was triggered often:

[  630.773059] WARNING: CPU: 1 PID: 2549 at fs/btrfs/relocation.c:1848 
replace_path+0x3f0/0x940 [btrfs]
[  630.773060] Modules linked in: af_packet iscsi_ibft iscsi_boot_sysfs xfs 
libcrc32c acpi_cpufreq tpm_tis ppdev tpm parport_pc parport pcspkr e1000
qemu_fw_cfg joydev i2c_piix4 button btrfs xor raid6_pq sr_mod cdrom ata_generic 
virtio_scsi bochs_drm drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops ttm ata_piix virtio_pci virtio_ring virtio serio_raw floppy drm sg
[  630.773070] CPU: 1 PID: 2549 Comm: btrfs Tainted: GW   
4.7.7-2-btrfs+ #2
[  630.773071] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[  630.773072]   8801f704b8c8 813afd12 

[  630.773073]   8801f704b908 81081f8b 
0738
[  630.773075]  0001 8801e32eb8c0 1600 
8800
[  630.773076] Call Trace:
[  630.773078]  [] dump_stack+0x63/0x81
[  630.773079]  [] __warn+0xcb/0xf0
[  630.773080]  [] warn_slowpath_null+0x1d/0x20
[  630.773090]  [] replace_path+0x3f0/0x940 [btrfs]
[  630.773092]  [] ? ring_buffer_unlock_commit+0x3e/0x2a0
[  630.773102]  [] merge_reloc_root+0x2b4/0x600 [btrfs]
[  630.773111]  [] merge_reloc_roots+0x140/0x250 [btrfs]
[  630.773120]  [] relocate_block_group+0x317/0x680 [btrfs]
[  630.773129]  [] btrfs_relocate_block_group+0x1cc/0x2d0 
[btrfs]
[  630.773139]  [] btrfs_relocate_chunk.isra.40+0x56/0xf0 
[btrfs]
[  630.773149]  [] __btrfs_balance+0x8d5/0xbb0 [btrfs]
[  630.773159]  [] btrfs_balance+0x2d0/0x5e0 [btrfs]
[  630.773168]  [] btrfs_ioctl_balance+0x383/0x390 [btrfs]
[  630.773178]  [] btrfs_ioctl+0x90f/0x1fb0 [btrfs]
[  630.773180]  [] ? pte_alloc_one+0x33/0x40
[  630.773182]  [] do_vfs_ioctl+0x93/0x5a0
[  630.773183]  [] ? __do_page_fault+0x203/0x4e0
[  630.773185]  [] SyS_ioctl+0x79/0x90
[  630.773186]  [] entry_SYSCALL_64_fastpath+0x1e/0xa8
[  630.773187] ---[ end trace 2cd6167577bf3be7 ]---

It turned out that this warning does not reflect any problem and just
makes users/system administrators worry about something going wrong.
The warning happens because when we create a relocation root (which is
just a snapshot of a subvolume tree) we set its last_snapshot field (as
well as for the subvolume's tree root) to a value corresponding to the
generation of the current transaction minus 1 (we do this at
relocation.c:create_reloc_root()). This means that when we merge the
relocation tree with the corresponding subvolume tree, at
walk_down_reloc_tree() we can catch pointers (bytenr/generation pairs)
with a generation that matches the generation of the transaction where
we created the relocation root, so those pointers correspond to tree
blocks created either before or after the relocation root was created.
If they were created before the relocation root (and in the same
transaction) we hit the warning, which we can safely remove because it
means the tree blocks are accessible from both trees (the subvolume
tree and the relocation tree).

So fix this by removing the warning and adding a couple assertions that
verify the pointer generations match and that their generation matches
the value of the last_snapshot field from the relocation tree plus 1.

Signed-off-by: Filipe Manana 
---
 fs/btrfs/relocation.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0ec8ffa..cdc1a1c 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1848,7 +1848,26 @@ again:
new_ptr_gen = 0;
}
 
-   if (WARN_ON(new_bytenr > 0 && new_bytenr == old_bytenr)) {
+   /*
+* When we create the reloc root (which is a snapshot of the
+* subvolume tree) we set its last_snapshot field (as well as
+* for the subvolume's tree root) to the value of the current
+* transaction generation minus 1 (at create_reloc_root()).
+* This means that at walk_down_reloc_tree() we can catch
+* pointers (bytenr/generation pairs) with a generation
+* matching the generation of the transaction where we created
+* the reloc root, so those pointers correspond to tree blocks
+* that were either created before or after the reloc root was
+* created. If walk_down_reloc_tree() gave us a path that points
+* to a tree block that was created (or COWed) before the reloc
+* root was created and in the 

Re: [PATCH] btrfs-progs: repair: Trickle down EEXISTS while freeing

2016-10-28 Thread Goldwyn Rodrigues


On 10/28/2016 09:42 AM, David Sterba wrote:
> On Thu, Oct 27, 2016 at 01:05:08PM -0500, Goldwyn Rodrigues wrote:
>> While deleting pending extents, we insert extents to the
>> extent_tree. However, because of corruption, this might already
>> be freed. We trickle the EEXISTS error and let the caller
>> decide what needs to be done with it. In this case, ignore, so
>> the repair may proceed.
>>
>> The primary motivation of this is to resolve this BUG_ON:
>> extent-tree.c:2731: alloc_reserved_tree_block: Assertion `ret` failed, value >> 0
>>
>> #2  assert_trace (val=0, line=2728, func=0x489510 <__func__.11558> 
>> "alloc_reserved_tree_block", 
>>  filename=0x489065 "extent-tree.c", assertion=0x48cfb5 "ret") at 
>> kerncompat.h:102
>> #3  alloc_reserved_tree_block (trans=trans@entry=0x7e7240, 
>> root=root@entry=0x6a88a0, root_objectid=2, 
>>  generation=175654, flags=0, key=key@entry=0x2bc5c70, level=0, 
>> ins=ins@entry=0x7fffcf30) at extent-tree.c:2728
>> #4  finish_current_insert (trans=trans@entry=0x7e7240, 
>> extent_root=extent_root@entry=0x6a88a0)
>> at extent-tree.c:2108
>> #5  __free_extent (trans=trans@entry=0x7e7240, root=root@entry=0x6a88a0, 
>> bytenr=57127387136, 
>>  num_bytes=, parent=parent@entry=0, 
>> root_objectid=, owner_objectid=0, 
>>  owner_offset=owner_offset@entry=0, refs_to_drop=refs_to_drop@entry=1) 
>> at extent-tree.c:2410
>> #6  del_pending_extents (trans=trans@entry=0x7e7240, extent_root=0x6a88a0) 
>> at extent-tree.c:2448
> 
> Do you have a reproducer for that?

I have a corrupt btrfs-image, not a test case though. Should I share
that? If yes, where should I upload it?

> 
>> Signed-off-by: Goldwyn Rodrigues 
> 
> Applied, thanks.
> 

Ahh, just realized. Spell check s/EEXISTS/EEXIST/

-- 
Goldwyn
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: repair: Trickle down EEXISTS while freeing

2016-10-28 Thread David Sterba
On Thu, Oct 27, 2016 at 01:05:08PM -0500, Goldwyn Rodrigues wrote:
> While deleting pending extents, we insert extents to the
> extent_tree. However, because of corruption, this might already
> be freed. We trickle the EEXISTS error and let the caller
> decide what needs to be done with it. In this case, ignore, so
> the repair may proceed.
> 
> The primary motivation of this is to resolve this BUG_ON:
> extent-tree.c:2731: alloc_reserved_tree_block: Assertion `ret` failed, value 0
> 
> #2  assert_trace (val=0, line=2728, func=0x489510 <__func__.11558> 
> "alloc_reserved_tree_block", 
>   filename=0x489065 "extent-tree.c", assertion=0x48cfb5 "ret") at 
> kerncompat.h:102
> #3  alloc_reserved_tree_block (trans=trans@entry=0x7e7240, 
> root=root@entry=0x6a88a0, root_objectid=2, 
>   generation=175654, flags=0, key=key@entry=0x2bc5c70, level=0, 
> ins=ins@entry=0x7fffcf30) at extent-tree.c:2728
> #4  finish_current_insert (trans=trans@entry=0x7e7240, 
> extent_root=extent_root@entry=0x6a88a0)
> at extent-tree.c:2108
> #5  __free_extent (trans=trans@entry=0x7e7240, root=root@entry=0x6a88a0, 
> bytenr=57127387136, 
>   num_bytes=, parent=parent@entry=0, 
> root_objectid=, owner_objectid=0, 
>   owner_offset=owner_offset@entry=0, refs_to_drop=refs_to_drop@entry=1) 
> at extent-tree.c:2410
> #6  del_pending_extents (trans=trans@entry=0x7e7240, extent_root=0x6a88a0) at 
> extent-tree.c:2448

Do you have a reproducer for that?

> Signed-off-by: Goldwyn Rodrigues 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: don't abuse REQ_OP_* flags for btrfs_map_block

2016-10-28 Thread David Sterba
On Thu, Oct 27, 2016 at 09:27:36AM +0200, Christoph Hellwig wrote:
> btrfs_map_block supports different types of mappings, which to a large
> extent resemble block layer operations.  But they don't always do, and
> currently btrfs dangerously overlays it's own flag over the block layer
> flags.  This is just asking for a conflict, so introduce a different
> map flags enum inside of btrfs instead.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: David Sterba 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: Ignore clang complete file

2016-10-28 Thread David Sterba
On Thu, Oct 27, 2016 at 10:32:48AM +0800, Qu Wenruo wrote:
> While most guys are using ctags and cscope with vim, new completion tool
> like vim-clang_completion is gaining its popularity, due to its compiler
> level accuracy simpleness to use.
> 
> Since ctags and cscope are already in .gitignore, I see no reason to
> reject .clang_complete.

Me neither, applied. Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] Btrfs

2016-10-28 Thread Chris Mason
Hi Linus,

My for-linus-4.9 has two fixes in it:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git 
for-linus-4.9

My patch fixes the btrfs list_head abuse that we tracked down during
Dave Jones' memory corruption investigation.  With both Jens and my
patches in place, I'm no longer able to trigger problems.

Filipe is fixing a difficult old bug between snapshots, balance and
send.  Dave is cooking a few more for the next rc, but these are tested
and ready:

Chris Mason (1) commits (+6/-14):
btrfs: fix races on root_log_ctx lists

Filipe Manana (1) commits (+58/-0):
Btrfs: fix incremental send failure caused by balance

Total: (2) commits (+64/-14)

 fs/btrfs/send.c | 58 +
 fs/btrfs/tree-log.c | 20 ++
 2 files changed, 64 insertions(+), 14 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs-progs: fix build for programs including ioctl.h

2016-10-28 Thread Eric Sandeen
On 10/27/16 9:54 PM, Eric Sandeen wrote:
> On 10/13/16 12:36 PM, Eric Sandeen wrote:
>> This was reported when docker failed to build against
>> btrfs-progs v4.8.1.
>>
>> It includes ioctl.h which now calls BUILD_ASSERT(), which
>> is defined in kerncompat.h, which was not included in the
>> ioctl.h header file.
> 
> Ping?

Oh sorry I see it's in, I had searched by author and missed it.

Thanks,
-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: mount error raid6

2016-10-28 Thread Jan Koester
On Sonntag, 25. September 2016 11:25:40 CEST you wrote:
> Hi,
> 
> i got some errors in dmesg if i want to mount my Filesystem,
> 
> uname -a:
> Linux dibsi 4.8.0-040800rc7-generic #201609182130 SMP Mon Sep 19 01:32:13
> UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
> 
> dmesg:
> [   71.599464] BTRFS warning (device sde): 'recovery' is deprecated, use
> 'usebackuproot' instead
> [   71.599469] BTRFS info (device sde): trying to use backup root at mount
> time
> [   71.599477] BTRFS info (device sde): disabling disk space caching
> [   71.599487] BTRFS info (device sde): has skinny extents
> [   71.654586] BTRFS error (device sde): parent transid verify failed on
> 2280458645504 wanted 861168 found 860380
> [   71.703142] BTRFS info (device sde): bdev /dev/sda errs: wr 111, rd
> 124642, flush 0, corrupt 0, gen 0
> [   71.752921] BTRFS error (device sde): parent transid verify failed on
> 2280389693440 wanted 861168 found 860380
> [   71.772571] BTRFS error (device sde): parent transid verify failed on
> 2280389709824 wanted 861168 found 860380
> [   71.785304] BTRFS info (device sde): checking UUID tree
> [   71.853116] BTRFS error (device sde): parent transid verify failed on
> 2280458502144 wanted 861168 found 860380
> [   71.868238] BTRFS error (device sde): parent transid verify failed on
> 2280458506240 wanted 861168 found 860380
> [   71.868488] BTRFS error (device sde): parent transid verify failed on
> 2280458510336 wanted 861168 found 860380
> [   71.868977] BTRFS error (device sde): parent transid verify failed on
> 2280458620928 wanted 861168 found 860380
> [   71.869262] BTRFS error (device sde): parent transid verify failed on
> 2280458625024 wanted 861168 found 860380
> [   71.881341] BTRFS error (device sde): parent transid verify failed on
> 2280458629120 wanted 861168 found 860380
> [   71.926156] BTRFS error (device sde): parent transid verify failed on
> 2280458633216 wanted 861168 found 860380
> [  101.960210] [ cut here ]
> [  101.960277] WARNING: CPU: 0 PID: 2850 at /home/kernel/COD/linux/fs/btrfs/
> extent-tree.c:6948 __btrfs_free_extent.isra.71+0x85a/0xd40 [btrfs] [ 
> 101.960293] Modules linked in: binfmt_misc crc32c_generic btrfs xor
> raid6_pq kvm_amd kvm irqbypass snd_hda_codec_hdmi serio_raw snd_hda_intel
> snd_hda_codec k10temp snd_usb_audio snd_hda_core joydev snd_usbmidi_lib
> snd_seq_midi snd_seq_midi_event snd_hwdep snd_ctxfi snd_rawmidi snd_pcm
> snd_seq snd_seq_device i2c_piix4 snd_timer snd tpm_infineon sg soundcore
> evdev shpchp acpi_cpufreq tpm_tis tpm_tis_core tpm parport_pc ppdev sunrpc
> lp parport autofs4 ext4 crc16 jbd2 fscrypto mbcache sd_mod hid_generic
> usbhid hid ohci_pci amdkfd radeon i2c_algo_bit ttm ahci drm_kms_helper
> ohci_hcd xhci_pci libahci syscopyarea ehci_pci xhci_hcd sysfillrect libata
> ehci_hcd sysimgblt fb_sys_fops e1000e usbcore drm scsi_mod r8169 ptp mii
> pps_core usb_common wmi fjes button
> [  101.960395] CPU: 0 PID: 2850 Comm: btrfs-transacti Not tainted
> 4.8.0-040800rc7-generic #201609182130
> [  101.960397] Hardware name: Gigabyte Technology Co., Ltd. GA-970A-D3/
> GA-970A-D3, BIOS F12 09/03/2013
> [  101.960402]  00200286 198bae94 9b53fba4
> 
> [  101.960409]   9b27f6ae 00df39b62000
> 883781532d90
> [  101.960415]  fffe 88376bfd2140 8837c393a000
> 
> [  101.960421] Call Trace:
> [  101.960431]  [] ? dump_stack+0x5c/0x78
> [  101.960438]  [] ? __warn+0xbe/0xe0
> [  101.960484]  [] ?
> __btrfs_free_extent.isra.71+0x85a/0xd40 [btrfs]
> [  101.960527]  [] ? btrfs_previous_extent_item+0xe0/0x100
> [btrfs]
> [  101.960570]  [] ?
> update_block_group.isra.70+0x133/0x460 [btrfs]
> [  101.960577]  [] ? __set_page_dirty_nobuffers+0xec/0x140
> [  101.960621]  [] ?
> __btrfs_run_delayed_refs+0x4f3/0x13d0 [btrfs]
> [  101.960668]  [] ? btrfs_run_delayed_refs+0x9a/0x2b0
> [btrfs]
> [  101.960673]  [] ? sched_clock+0x5/0x10
> [  101.960723]  [] ? btrfs_commit_transaction+0x50/0xa60
> [btrfs]
> [  101.960771]  [] ? start_transaction+0x8f/0x4c0 [btrfs]
> [  101.960819]  [] ? transaction_kthread+0x1e9/0x200
> [btrfs] [  101.960866]  [] ?
> btrfs_cleanup_transaction+0x5c0/0x5c0 [btrfs]
> [  101.960872]  [] ? kthread+0xcd/0xf0
> [  101.960877]  [] ? __switch_to+0x2c1/0x7a0
> [  101.960884]  [] ? ret_from_fork+0x1f/0x40
> [  101.960890]  [] ? kthread_create_on_node+0x1a0/0x1a0
> [  101.960893] ---[ end trace 413d8788543e689c ]---
> [  101.960900] BTRFS info (device sde): leaf 950359654400 total ptrs 56 free
> space 945
> [  101.960904]  item 0 key (503649468416 168 4096) itemoff 3944 itemsize 51
> [  101.960908]  extent refs 1 gen 861173 flags 2
> [  101.960911]  tree block key (0 0 0) level 0
> [  101.960912]  tree block backref root 7
> [  101.960916]  item 1 key (503649468416 192 1107296256) itemoff 3920
> itemsize 24
> [  101.960918]  block group used 45056
> [  101.960921]  item 2 key 

Re: It takes me up to 20 reboots for the system to start sda

2016-10-28 Thread Kai Krakow
Am Fri, 28 Oct 2016 02:21:27 -0600
schrieb Bearcat Şándor :

> Thanks for the suggestions Kai.  I'm using dracut 044.
> 
> I tried using the rootdelay=2 kernel parameter to little effect other
> than slowing down my booting.
> 
> I found this, which may be related:
> https://github.com/dracutdevs/dracut/issues/149 (btrfs raid on rootdev
> is unreliably mounted).

Ah okay, then I maybe fixed it by setting grub not to use UUIDs for
mounting:

$ cat /etc/default/grub
...
GRUB_DISABLE_LINUX_UUID="true"
GRUB_DEVICE="LABEL=system"

Additionally I manually forced mounting by label (of course you need to
set a label for your filesystem). And I've set rootfstype so it won't
try all the other different filesystems first (which leads to a lot of
error messages in dmesg).

But I later switched to systemd-boot. But it also used labels for
mounting root. It looks like this:

title  Gentoo/Linux 13.0
version4.8.4-gentoo
machine-id 121b87ca633e8ac001665668001b
linux  /vmlinuz-4.8.4-gentoo
initrd /initramfs-4.8.4-gentoo.img
optionsroot=LABEL=system rootfstype=btrfs 
rootflags=compress=lzo,nossd,autodefrag,noatime zswap.enabled=1 splash 
loglevel=3

As you can see, I no longer need rootdelay or rootwait as parameters.


> On Thu, Oct 27, 2016 at 7:17 PM, Bearcat Şándor
>  wrote:
> > Folks,
> >
> > I have a btrfs raid 10 spread out among 4 ssds.  The main drive is
> > sda. When i reboot my system after the kernel loads i see "a start
> > job is running for dev-sda1.device.  I've let this sit for 3 hours
> > with no progress into the boot, and it seems to be a race condition
> > of some sort.
> >
> > If i restart around 20 times one of them will let the system
> > continue booting, though i've been lucky enough to have it happen
> > in 4. I have the following systemd unit:
> > /etc/systemd/system/local-fs-pre.target.wants/btrfs-dev-scan.service
> >
> > [Unit]
> > Description=Btrfs scan devices
> > Before=local-fs-pre.target
> > DefaultDependencies=false
> >
> > [Service]
> > Type=oneshot
> > ExecStart=/sbin/btrfs device scan
> >
> > [Install]
> > WantedBy=local-fs-pre.target
> >
> > These 4 disks were joined to /dev/sda
> >btrfs device add -f /dev/sdc /dev/sdd /dev/sde /
> >
> > followed by
> > btrfs balance start -dconvert=raid10 -mconvert=raid10 /
> >
> > my fstab contains:
> > /dev/sdf1 /boot vfat defaults,noatime,discard 0 2
> > /dev/sda / btrfs defaults,noatime,discard,compress=zlib,autodefrag
> > 0 1 /dev/sdb  /home/hometheater btrfs
> > defaults,noatime,discard,compress=zlib,autodefrag 0 2
> >
> >
> > Any ideas would be appreciated. For what it's worth i have smartd
> > running.
> >
> > Thank you,
> >
> > --
> > Bearcat M. Şándor
> > Feline Soul Systems LLC
> > Voice: 872.CAT.SOUL (872.228.7685)
> > Fax: 406.235.7070  
> 
> 
> 



-- 
Regards,
Kai

Replies to list-only preferred.


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: It takes me up to 20 reboots for the system to start sda

2016-10-28 Thread Bearcat Şándor
Thanks for the suggestions Kai.  I'm using dracut 044.

I tried using the rootdelay=2 kernel parameter to little effect other
than slowing down my booting.

I found this, which may be related:
https://github.com/dracutdevs/dracut/issues/149 (btrfs raid on rootdev
is unreliably mounted).

On Thu, Oct 27, 2016 at 7:17 PM, Bearcat Şándor  wrote:
> Folks,
>
> I have a btrfs raid 10 spread out among 4 ssds.  The main drive is
> sda. When i reboot my system after the kernel loads i see "a start job
> is running for dev-sda1.device.  I've let this sit for 3 hours with no
> progress into the boot, and it seems to be a race condition of some
> sort.
>
> If i restart around 20 times one of them will let the system continue
> booting, though i've been lucky enough to have it happen in 4. I have
> the following systemd unit:
> /etc/systemd/system/local-fs-pre.target.wants/btrfs-dev-scan.service
>
> [Unit]
> Description=Btrfs scan devices
> Before=local-fs-pre.target
> DefaultDependencies=false
>
> [Service]
> Type=oneshot
> ExecStart=/sbin/btrfs device scan
>
> [Install]
> WantedBy=local-fs-pre.target
>
> These 4 disks were joined to /dev/sda
>btrfs device add -f /dev/sdc /dev/sdd /dev/sde /
>
> followed by
> btrfs balance start -dconvert=raid10 -mconvert=raid10 /
>
> my fstab contains:
> /dev/sdf1 /boot vfat defaults,noatime,discard 0 2
> /dev/sda / btrfs defaults,noatime,discard,compress=zlib,autodefrag 0 1
> /dev/sdb  /home/hometheater btrfs
> defaults,noatime,discard,compress=zlib,autodefrag 0 2
>
>
> Any ideas would be appreciated. For what it's worth i have smartd running.
>
> Thank you,
>
> --
> Bearcat M. Şándor
> Feline Soul Systems LLC
> Voice: 872.CAT.SOUL (872.228.7685)
> Fax: 406.235.7070



-- 
Bearcat M. Şándor
Feline Soul Systems LLC
Voice: 872.CAT.SOUL (872.228.7685)
Fax: 406.235.7070
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] generic: make 17[1-4] work well when btrfs compression is enabled

2016-10-28 Thread Eryu Guan
On Fri, Oct 28, 2016 at 03:05:55PM +0800, Wang Xiaoguang wrote:
> hi,
> 
> On 10/27/2016 07:25 PM, Eryu Guan wrote:
> > On Wed, Oct 26, 2016 at 05:52:11PM +0800, Wang Xiaoguang wrote:
> > > When enabling btrfs compression, original codes can not fill fs
> > > correctly, here we introduce _fill_fs() in common/rc, which'll keep
> > > creating and writing files until enospc error occurs. Note _fill_fs
> > > is copied from tests/generic/256, but with some minor modifications.
> > > 
> > > Signed-off-by: Wang Xiaoguang 
> > Looks fine to me overall, generic/17[1-4] and generic/256 passed on xfs,
> > btrfs and btrfs with compress. But I'd like Darrick to review it as well :)
> Could you please give me your btrfs's kernel version?
> When enabling btrfs compression run these 4 test cases, I often got
> enospc error, it seems that you didn't run into enospc errors.

I'm using 4.9-rc1 kernel, with both test_dev and scratch_dev 15G in
size, btrfs-progs v4.6

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] generic: make 17[1-4] work well when btrfs compression is enabled

2016-10-28 Thread Eryu Guan
On Fri, Oct 28, 2016 at 03:00:29PM +0800, Wang Xiaoguang wrote:
> hi,
> 
> On 10/28/2016 01:13 AM, Darrick J. Wong wrote:
> > On Wed, Oct 26, 2016 at 05:52:11PM +0800, Wang Xiaoguang wrote:
> > > When enabling btrfs compression, original codes can not fill fs
> > > correctly, here we introduce _fill_fs() in common/rc, which'll keep
> > > creating and writing files until enospc error occurs. Note _fill_fs
> > > is copied from tests/generic/256, but with some minor modifications.
> > > 
> > > Signed-off-by: Wang Xiaoguang 
> > > ---
> > > V2: In common/, I did't find an existing function suitable for
> > >  these 4 test cases to fill fs, so I still use _pwrite_byte() with
> > >  a big enough file length fo fill fs. Note, for btrfs, metadata space
> > >  still is not full, only data space is full, but it's OK for these
> > >  4 test cases.
> > > 
> > >  All these 4 cases pass in xfs and btrfs(without compression), if
> > >  btrfs has compression enabled, these 4 cases will fail for false
> > >  enospc error, I have sent kernel patches to fix this bug.
> > > 
> > > V3: Introduce  _fill_fs in common/rc to fill fs.
> > > ---
> > >   common/rc | 50 ++
> > >   tests/generic/171 |  4 +---
> > >   tests/generic/172 |  4 ++--
> > >   tests/generic/173 |  4 +---
> > >   tests/generic/174 |  4 +---
> > >   tests/generic/256 | 65 
> > > +--
> > >   6 files changed, 60 insertions(+), 71 deletions(-)
> > > 
> > > diff --git a/common/rc b/common/rc
> > > index 7a9fc90..0e1ac5d 100644
> > > --- a/common/rc
> > > +++ b/common/rc
> > Would you mind moving this to common/populate, where I've been (slowly)
> > collecting all the "create stuff inside the filesystem" routines?
> OK, sure.
> > 
> > (It's part of a slow-moving effort to declutter common/rc.)
> > 
> > > @@ -4003,6 +4003,56 @@ _require_xfs_mkfs_without_validation()
> > >   fi
> > >   }
> > > +# Fill a file system by repeatedly creating files in the given folder
> > > +# starting with the given file size.  Files are reduced in size when
> > > +# they can no longer fit until no more files can be created.
> > > +_fill_fs()
> > > +{
> > > + local file_size=$1
> > > + local dir=$2
> > > + local block_size=$3
> > > + local switch_user=$4
> > > + local file_count=1
> > > + local bytes_written=0
> > > +
> > > + if [ $# -ne 4 ]; then
> > > + echo "Usage: _fill_fs filesize dir blocksize"
> > > + exit 1
> > > + fi
> > > +
> > > + if [ $switch_user -eq 0 ]; then
> > > + mkdir -p $dir
> > > + else
> > > + _user_do "mkdir -p $dir"
> > > + fi
> > > +
> > > + if [ $file_size -lt $block_size ]; then
> > > + $file_size = $block_size
> > > + fi
> > > +
> > > + while [ $file_size -ge $block_size ]; do
> > > + bytes_written=0
> > > + if [ $switch_user -eq 0 ]; then
> > > + $XFS_IO_PROG -f -c "pwrite 0 $file_size" \
> > > + $dir/$file_count > /dev/null
> > If you want to speed this up some more you could pass "-b 8388608" in
> > the pwrite string so that xfs_io will allocate an 8MB memory buffer
> > internally when writing out data.
> > 
> > $XFS_IO_PROG -f -c "pwrite -b 8388608 0 $file_size" $dir/$file_count
> Thanks for this info, I'll use it.
> > 
> > > + else
> > > + _user_do "$XFS_IO_PROG -f -c \"pwrite 0 $file_size\" \
> > > + $dir/$file_count > /dev/null"
> > Also, why redirect to /dev/null?  I think helper functions generally let
> > the individual test decide where the stdout & stderr output ultimately
> > end up.  Most tests seem to dump to $seqres.full or /dev/null, but we
> > should let the test decide where potentially useful diagnostic
> > information ends up.
> Agree :)
> 
> > /me also wonders if falloc would be a faster way to consume disk blocks
> > than pwrite, but ... 
> Oh, it is. Falloc is not affected by compression.

One problem with falloc is that not all filesystems support it, e.g.
ext3 and ext2. So you have to fall back to pwrite if falloc is not
supported.

Thanks,
Eryu
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] generic: make 17[1-4] work well when btrfs compression is enabled

2016-10-28 Thread Wang Xiaoguang

hi,

On 10/27/2016 07:25 PM, Eryu Guan wrote:

On Wed, Oct 26, 2016 at 05:52:11PM +0800, Wang Xiaoguang wrote:

When enabling btrfs compression, original codes can not fill fs
correctly, here we introduce _fill_fs() in common/rc, which'll keep
creating and writing files until enospc error occurs. Note _fill_fs
is copied from tests/generic/256, but with some minor modifications.

Signed-off-by: Wang Xiaoguang 

Looks fine to me overall, generic/17[1-4] and generic/256 passed on xfs,
btrfs and btrfs with compress. But I'd like Darrick to review it as well :)

Could you please give me your btrfs's kernel version?
When enabling btrfs compression run these 4 test cases, I often got
enospc error, it seems that you didn't run into enospc errors.




---
V2: In common/, I did't find an existing function suitable for
 these 4 test cases to fill fs, so I still use _pwrite_byte() with
 a big enough file length fo fill fs. Note, for btrfs, metadata space
 still is not full, only data space is full, but it's OK for these
 4 test cases.

 All these 4 cases pass in xfs and btrfs(without compression), if
 btrfs has compression enabled, these 4 cases will fail for false
 enospc error, I have sent kernel patches to fix this bug.

V3: Introduce  _fill_fs in common/rc to fill fs.
---
  common/rc | 50 ++
  tests/generic/171 |  4 +---
  tests/generic/172 |  4 ++--
  tests/generic/173 |  4 +---
  tests/generic/174 |  4 +---
  tests/generic/256 | 65 +--
  6 files changed, 60 insertions(+), 71 deletions(-)

diff --git a/common/rc b/common/rc
index 7a9fc90..0e1ac5d 100644
--- a/common/rc
+++ b/common/rc
@@ -4003,6 +4003,56 @@ _require_xfs_mkfs_without_validation()
fi
  }
  
+# Fill a file system by repeatedly creating files in the given folder

+# starting with the given file size.  Files are reduced in size when
+# they can no longer fit until no more files can be created.
+_fill_fs()
+{
+   local file_size=$1
+   local dir=$2
+   local block_size=$3
+   local switch_user=$4
+   local file_count=1
+   local bytes_written=0
+
+   if [ $# -ne 4 ]; then
+   echo "Usage: _fill_fs filesize dir blocksize"

The usage info here is wrong, missing the "switch user" argument.

Thanks for review, I'll fix it in v4.

Regards,
Xiaoguang Wang


Thanks,
Eryu






--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] generic: make 17[1-4] work well when btrfs compression is enabled

2016-10-28 Thread Wang Xiaoguang

hi,

On 10/28/2016 01:13 AM, Darrick J. Wong wrote:

On Wed, Oct 26, 2016 at 05:52:11PM +0800, Wang Xiaoguang wrote:

When enabling btrfs compression, original codes can not fill fs
correctly, here we introduce _fill_fs() in common/rc, which'll keep
creating and writing files until enospc error occurs. Note _fill_fs
is copied from tests/generic/256, but with some minor modifications.

Signed-off-by: Wang Xiaoguang 
---
V2: In common/, I did't find an existing function suitable for
 these 4 test cases to fill fs, so I still use _pwrite_byte() with
 a big enough file length fo fill fs. Note, for btrfs, metadata space
 still is not full, only data space is full, but it's OK for these
 4 test cases.

 All these 4 cases pass in xfs and btrfs(without compression), if
 btrfs has compression enabled, these 4 cases will fail for false
 enospc error, I have sent kernel patches to fix this bug.

V3: Introduce  _fill_fs in common/rc to fill fs.
---
  common/rc | 50 ++
  tests/generic/171 |  4 +---
  tests/generic/172 |  4 ++--
  tests/generic/173 |  4 +---
  tests/generic/174 |  4 +---
  tests/generic/256 | 65 +--
  6 files changed, 60 insertions(+), 71 deletions(-)

diff --git a/common/rc b/common/rc
index 7a9fc90..0e1ac5d 100644
--- a/common/rc
+++ b/common/rc

Would you mind moving this to common/populate, where I've been (slowly)
collecting all the "create stuff inside the filesystem" routines?

OK, sure.


(It's part of a slow-moving effort to declutter common/rc.)


@@ -4003,6 +4003,56 @@ _require_xfs_mkfs_without_validation()
fi
  }
  
+# Fill a file system by repeatedly creating files in the given folder

+# starting with the given file size.  Files are reduced in size when
+# they can no longer fit until no more files can be created.
+_fill_fs()
+{
+   local file_size=$1
+   local dir=$2
+   local block_size=$3
+   local switch_user=$4
+   local file_count=1
+   local bytes_written=0
+
+   if [ $# -ne 4 ]; then
+   echo "Usage: _fill_fs filesize dir blocksize"
+   exit 1
+   fi
+
+   if [ $switch_user -eq 0 ]; then
+   mkdir -p $dir
+   else
+   _user_do "mkdir -p $dir"
+   fi
+
+   if [ $file_size -lt $block_size ]; then
+   $file_size = $block_size
+   fi
+
+   while [ $file_size -ge $block_size ]; do
+   bytes_written=0
+   if [ $switch_user -eq 0 ]; then
+   $XFS_IO_PROG -f -c "pwrite 0 $file_size" \
+   $dir/$file_count > /dev/null

If you want to speed this up some more you could pass "-b 8388608" in
the pwrite string so that xfs_io will allocate an 8MB memory buffer
internally when writing out data.

$XFS_IO_PROG -f -c "pwrite -b 8388608 0 $file_size" $dir/$file_count

Thanks for this info, I'll use it.



+   else
+   _user_do "$XFS_IO_PROG -f -c \"pwrite 0 $file_size\" \
+   $dir/$file_count > /dev/null"

Also, why redirect to /dev/null?  I think helper functions generally let
the individual test decide where the stdout & stderr output ultimately
end up.  Most tests seem to dump to $seqres.full or /dev/null, but we
should let the test decide where potentially useful diagnostic
information ends up.

Agree :)


/me also wonders if falloc would be a faster way to consume disk blocks
than pwrite, but ... 

Oh, it is. Falloc is not affected by compression.


+   fi
+
+   if [ -f $dir/$file_count ]; then
+   bytes_written=$(stat -c '%s' $dir/$file_count)
+   fi
+
+   # If there was no room to make the file, then divide it in
+   # half, and keep going
+   if [ $bytes_written -lt $file_size ]; then
+   file_size=$((file_size / 2))
+   fi
+   file_count=$((file_count + 1))
+   done
+}
+
  init_rc
  
  

diff --git a/tests/generic/171 b/tests/generic/171
index a69f798..fde8ef2 100755
--- a/tests/generic/171
+++ b/tests/generic/171
@@ -75,9 +75,7 @@ _cp_reflink $testdir/bigfile $testdir/clonefile
  sync
  
  echo "Allocate the rest of the space"

-nr_free=$(stat -f -c '%f' $testdir)
-touch $testdir/file0 $testdir/file1
-_pwrite_byte 0x61 0 $((blksz * nr_free)) $testdir/eat_my_space >> $seqres.full 
2>&1
+_fill_fs $((32 * 1024 * 1024)) $testdir/space $blksz 0 >> $seqres.full 2>&1

Given that _fill_fs has exponential backoff to try to really fill the fs,
why not call it with $((blksz * nr_free)) instead of a fixed 32MB?

OK, I'll send v4 soon.
Thanks for all your comments.

Regards,
Xiaoguang Wang


--D


  sync
  
  echo "CoW the big file"

diff --git a/tests/generic/172