Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk

2016-02-16 Thread Qu Wenruo



Ángel González wrote on 2016/02/16 23:21 +0100:




Which should be my next steps?



Try btrfs-progs 4.4 to see if all these false alerts go away.

Thanks,
Qu


Thanks!
Those "errors" are indeed gone after updating btrfs-progs from 4.3.1 to
4.4. Sorry for the fuss.


It's strange though if it was supposed to only happen with non-skinny
metadata, since I didn't manually specify any flags, and supposedly
skinny is the default since 3.18 (the btrfs partition was created with
a newer version).


If you're really interested in whether your fs has skinny metadata
enabled, you can check the btrfs-show-super output.


Like the following output indicates skinny metadata:
--
incompat_flags  0x161
( MIXED_BACKREF |
  BIG_METADATA |
  EXTENDED_IREF |
  SKINNY_METADATA ) <
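The flag bits can also be decoded mechanically. A quick sketch (Python, illustrative; the bit values mirror the BTRFS_FEATURE_INCOMPAT_* constants in the kernel headers of this era, and the table is deliberately partial):

```python
# Decode a btrfs superblock incompat_flags value into feature names.
# Bit positions follow the BTRFS_FEATURE_INCOMPAT_* definitions in the
# kernel's btrfs headers (partial list; check your kernel's ctree.h
# for the authoritative set).
INCOMPAT_FLAGS = {
    1 << 0: "MIXED_BACKREF",
    1 << 1: "DEFAULT_SUBVOL",
    1 << 2: "MIXED_GROUPS",
    1 << 3: "COMPRESS_LZO",
    1 << 5: "BIG_METADATA",
    1 << 6: "EXTENDED_IREF",
    1 << 7: "RAID56",
    1 << 8: "SKINNY_METADATA",
}

def decode_incompat(flags):
    """Return the names of all known feature bits set in `flags`."""
    return [name for bit, name in sorted(INCOMPAT_FLAGS.items()) if flags & bit]

# 0x161 = bits 0, 5, 6 and 8 set
print(decode_incompat(0x161))
```

Running it on the 0x161 value shown above yields MIXED_BACKREF, BIG_METADATA, EXTENDED_IREF and SKINNY_METADATA, matching the btrfs-show-super output.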

Re: [PATCH] btrfs: Avoid BUG_ON()s because of ENOMEM caused by kmalloc() failure

2016-02-16 Thread Satoru Takeuchi

On 2016/02/16 2:53, David Sterba wrote:

On Mon, Feb 15, 2016 at 02:38:09PM +0900, Satoru Takeuchi wrote:

There are some BUG_ON()'s after kmalloc() as follows.

=
foo = kmalloc();
BUG_ON(!foo);   /* -ENOMEM case */
=

A Docker + memory cgroup user hit these BUG_ON()s.

https://bugzilla.kernel.org/show_bug.cgi?id=112101

Since it's very hard to handle these ENOMEMs properly,
preventing these kmalloc() failures (and hence these
BUG_ON()s) for now is a bit better than the current
implementation anyway.


Beware that the NOFAIL semantics can cause deadlocks if it's on the
critical writeback path or can be reentered from itself through
reclaim. Unless you're sure that this is not the case, please do not add
them just because it would seemingly fix the allocation failures.


In all the cases I changed, the kmalloc()s can block,
since gfpflags_allow_blocking() is true. No locks
are acquired there, so deadlocks don't happen.

Am I missing something?



In the docker example, the memory is limited by cgroups so the NOFAIL
mode can exhaust all reserves and just loop endlessly waiting for the
OOM killer to get some memory or just waiting without any chance to
progress.


I consider triggering the OOM killer and killing processes
in a cgroup better than killing the whole system.

As for the possibility of an endless loop, there are many
such problems throughout the kernel. Of course the same
can be said of Btrfs.

==
$ grep -rnH __GFP_NOFAIL fs/btrfs/
fs/btrfs/extent-tree.c:5970: GFP_NOFS | __GFP_NOFAIL);
fs/btrfs/extent-tree.c:6043: bytenr + num_bytes - 1, GFP_NOFS | __GFP_NOFAIL);
fs/btrfs/extent_io.c:4643: eb = kmem_cache_zalloc(extent_buffer_cache, 
GFP_NOFS|__GFP_NOFAIL);
fs/btrfs/extent_io.c:4909: p = find_or_create_page(mapping, index, 
GFP_NOFS|__GFP_NOFAIL);
==

I understand that fixing these problems in cooperation with
the memory cgroup people is best in the long run.
However, I consider bypassing this problem for now
better than the current implementation.

Thanks,
Satoru
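The trade-off being argued over can be sketched in a userspace mock (Python; alloc() is a stand-in for kmalloc(), and none of this is real kernel API):

```python
# Userspace sketch of the failure-handling strategies discussed above.
# alloc() stands in for kmalloc(); nothing here is kernel API.
ENOMEM = 12

def alloc(fail):
    """Pretend kmalloc(): returns None on allocation failure."""
    return None if fail else object()

def bug_on_style(fail):
    obj = alloc(fail)
    # BUG_ON(!foo): the kernel would oops here on failure.
    assert obj is not None, "BUG_ON(!foo)"
    return 0

def error_return_style(fail):
    obj = alloc(fail)
    if obj is None:
        return -ENOMEM        # propagate the error instead of crashing
    return 0

print(error_return_style(True))    # -12: caller sees the failure
print(error_return_style(False))   # 0: success path unchanged
```

The third option, __GFP_NOFAIL, corresponds to retrying alloc() forever, which is exactly what loops endlessly in the cgroup-limited case David describes.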


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html




Re: RAID 6 full, but there is still space left on some devices

2016-02-16 Thread Duncan
Dan Blazejewski posted on Tue, 16 Feb 2016 15:20:12 -0500 as excerpted:

> A little background: I started using BTRFS over a year ago, in RAID 1
> with mixed size drives. A few months ago, I started replacing the disks
> with 4 TB drives, and eventually switched over to RAID 6. I am currently
> running a 6x4TB RAID6 drive configuration, which should give me ~14.5 TB
> usable, but I'm only getting around 11.
> 
> The weird thing is that it seems to completely fill 4/6 of the disks,
> while leaving lots of space free on 2 of the disks. I've tried full
> filesystem balances, yet the problem continues.
> 
> # btrfs fi show
> 
> Label: none  uuid: 78733087-d597-4301-8efa-8e1df800b108
> Total devices 6 FS bytes used 11.59TiB
> devid1 size 3.64TiB used 3.64TiB path /dev/sdd
> devid2 size 3.64TiB used 3.64TiB path /dev/sdg
> devid3 size 3.64TiB used 3.64TiB path /dev/sdf
> devid5 size 3.64TiB used 2.92TiB path /dev/sda
> devid6 size 3.64TiB used 1.48TiB path /dev/sdb
> devid7 size 3.64TiB used 3.64TiB path /dev/sdc
> 
> btrfs-progs v4.2.3
> 
> 
> 
> # btrfs fi df /mnt/data
> 
> Data, RAID6: total=11.67TiB, used=11.58TiB
> System, RAID6: total=64.00MiB, used=1.70MiB
> Metadata, RAID6: total=15.58GiB, used=13.89GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> # btrfs fi usage /mnt/data
> 
> WARNING: RAID56 detected, not implemented

Your btrfs-progs is old and I don't see any indication of kernel version 
at all, but I'll guess it's old as well.  Particularly for raid56 mode, 
which still isn't to the maturity level of the rest of btrfs, using 
current kernel and btrfs-progs is *very* strongly recommended.

Among other things, current userspace (4.4) btrfs fi usage should now 
support raid56 mode properly.  Also, with newer userspace and kernel, btrfs 
balance supports the stripes= filter, which appears to be what you're 
looking for, to rebalance to full-width stripes anything that's not yet 
full width, thereby evening out your usage.

A full balance /should/ do it as well, I believe, but with raid56 support 
still not yet at the maturity level of btrfs in general, it's likely your 
version is old and buggy in that regard.
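The arithmetic behind the expected figure can be sketched as follows (Python, back-of-envelope only; per-device size taken from the fi show output above):

```python
# Back-of-envelope RAID6 capacity: a stripe across n devices stores
# (n - 2) devices' worth of data, the other two units being parity.
def raid6_usable(tib_per_device, n_devices):
    return (n_devices - 2) * tib_per_device

# Full-width stripes across all six 3.64 TiB devices:
print(round(raid6_usable(3.64, 6), 2))   # ~14.56 TiB, the expected number

# Chunks laid down when only four devices had free space are narrower
# and carry a worse data-to-parity ratio; usage only evens out once
# those chunks are rewritten at full width, which is what the stripes=
# balance filter is for:
print(round(raid6_usable(3.64, 4), 2))   # ~7.28 TiB if stripes span only four
```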

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Send receive errors

2016-02-16 Thread Kenny MacDermid
Hello,

I use snapshots as backups, and send them to other locations with a
parent. It's very hit or miss whether any one of them will actually work.

An example of the latest error:

ERROR: rmdir usr/lib/modules/4.3.3.201512282134-1-grsec/build/.tmp_versions 
failed: No such file or directory

Typically using an earlier snapshot as the parent will work.

Does anyone have any tips on how to make them more reliable?


Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk

2016-02-16 Thread Ángel González


> > Which should be my next steps?
> > 
> 
> Try btrfs-progs 4.4 to see if all these false alerts go away.
> 
> Thanks,
> Qu

Thanks!
Those "errors" are indeed gone after updating btrfs-progs from 4.3.1 to
4.4. Sorry for the fuss.


It's strange though if it was supposed to only happen with non-skinny
metadata, since I didn't manually specify any flags, and supposedly
skinny is the default since 3.18 (the btrfs partition was created with
a newer version).


Thanks for your support


RAID 6 full, but there is still space left on some devices

2016-02-16 Thread Dan Blazejewski
Hello,

I've searched high and low about my issue, but have been unable to
turn up anything like what I'm seeing right now.

A little background: I started using BTRFS over a year ago, in RAID 1
with mixed size drives. A few months ago, I started replacing the
disks with 4 TB drives, and eventually switched over to RAID 6. I am
currently running a 6x4TB RAID6 drive configuration, which should give
me ~14.5 TB
usable, but I'm only getting around 11.

The weird thing is that it seems to completely fill 4/6 of the disks,
while leaving lots of space free on 2 of the disks. I've tried full
filesystem balances, yet the problem continues.

# btrfs fi show

Label: none  uuid: 78733087-d597-4301-8efa-8e1df800b108
Total devices 6 FS bytes used 11.59TiB
devid1 size 3.64TiB used 3.64TiB path /dev/sdd
devid2 size 3.64TiB used 3.64TiB path /dev/sdg
devid3 size 3.64TiB used 3.64TiB path /dev/sdf
devid5 size 3.64TiB used 2.92TiB path /dev/sda
devid6 size 3.64TiB used 1.48TiB path /dev/sdb
devid7 size 3.64TiB used 3.64TiB path /dev/sdc

btrfs-progs v4.2.3



# btrfs fi df /mnt/data

Data, RAID6: total=11.67TiB, used=11.58TiB
System, RAID6: total=64.00MiB, used=1.70MiB
Metadata, RAID6: total=15.58GiB, used=13.89GiB
GlobalReserve, single: total=512.00MiB, used=0.00B



# btrfs fi usage /mnt/data

WARNING: RAID56 detected, not implemented
WARNING: RAID56 detected, not implemented
WARNING: RAID56 detected, not implemented
Overall:
Device size:  21.83TiB
Device allocated:0.00B
Device unallocated:   21.83TiB
Device missing:  0.00B
Used:0.00B
Free (estimated):0.00B  (min: 8.00EiB)
Data ratio:   0.00
Metadata ratio:   0.00
Global reserve:  512.00MiB  (used: 0.00B)

Data,RAID6: Size:11.67TiB, Used:11.58TiB
   /dev/sda2.92TiB
   /dev/sdb1.48TiB
   /dev/sdc3.63TiB
   /dev/sdd3.63TiB
   /dev/sdf3.63TiB
   /dev/sdg3.63TiB

Metadata,RAID6: Size:15.58GiB, Used:13.89GiB
   /dev/sda4.05GiB
   /dev/sdb1.50GiB
   /dev/sdc5.01GiB
   /dev/sdd5.01GiB
   /dev/sdf5.01GiB
   /dev/sdg5.01GiB

System,RAID6: Size:64.00MiB, Used:1.70MiB
   /dev/sda   16.00MiB
   /dev/sdb   16.00MiB
   /dev/sdc   16.00MiB
   /dev/sdd   16.00MiB
   /dev/sdf   16.00MiB
   /dev/sdg   16.00MiB

Unallocated:
   /dev/sda  733.65GiB
   /dev/sdb2.15TiB
   /dev/sdc1.02MiB
   /dev/sdd1.02MiB
   /dev/sdf1.02MiB
   /dev/sdg1.02MiB




Can anyone shed some light on why a full balance (sudo btrfs balance
start /mnt/data) doesn't seem to straighten this out? Any and all help
is appreciated.


Thanks!


Re: [Docs]? Only one Subvolume with DUP (or different parameters)?

2016-02-16 Thread Hugo Mills
On Tue, Feb 16, 2016 at 08:25:47PM +0100, Christian Völker wrote:
> Hi Guys,
> 
> sorry for the simple question and I assume every developer here laughs
> about this question.
> 
> Anyway:
> 
> I have read loads of documents but did not find an answer for sure. Even
> though I assume I am right.
> 
> On a btrfs filesystem created; is it possible to have subvolumes with
> data duplication and another subvolume without (resp. with just metadata
> duplication)?

   No.

   It may happen at some point, but it's not possible right now.

   Hugo.

> I have some large filesystems currently with ext4 and I am thinking of
> changing to btrfs. Some of the data is more important than others. So I
> want to have data duplication on the important files (sorted in a mount
> point) and without for the other subvolume.
> 
> So I want to have the advantage of redundancy of important files
> combined with the flexibility of the volume manager and shared disk space.
> 
> Possible?
> 
> 
> Greetings
> 
> Christian
> 

-- 
Hugo Mills | "I don't like the look of it, I tell you."
hugo@... carfax.org.uk | "Well, stop looking at it, then."
http://carfax.org.uk/  |
PGP: E2AB1DE4  | The Goons




[Docs]? Only one Subvolume with DUP (or different parameters)?

2016-02-16 Thread Christian Völker
Hi Guys,

sorry for the simple question and I assume every developer here laughs
about this question.

Anyway:

I have read loads of documents but did not find an answer for sure. Even
though I assume I am right.

On a btrfs filesystem created; is it possible to have subvolumes with
data duplication and another subvolume without (resp. with just metadata
duplication)?

I have some large filesystems currently with ext4 and I am thinking of
changing to btrfs. Some of the data is more important than others. So I
want to have data duplication on the important files (sorted in a mount
point) and without for the other subvolume.

So I want to have the advantage of redundancy of important files
combined with the flexibility of the volume manager and shared disk space.

Possible?


Greetings

Christian



Re: BZ#101951, Overlayfs on top of btrfs causes kernel oops + freeze

2016-02-16 Thread Filipe Manana
On Tue, Feb 16, 2016 at 4:08 PM, Colin Ian King
 wrote:
> On 16/02/16 15:51, Filipe Manana wrote:
>> On Tue, Feb 16, 2016 at 3:38 PM, Colin Ian King
>>  wrote:
>>> Hi there,
>>>
>>> bug: https://bugzilla.kernel.org/show_bug.cgi?id=101951 and also
>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1532145
>>>
>>> Commit 4bacc9c9234c7c8eec44f5ed4e960d9f96fa0f01 ("overlayfs: Make f_path
>>> always point to the overlay and f_inode to the underlay") resulted in an
>>> issue when using a combination of btrfs and overlayfs.  This is
>>> noticeable when doing a fsync() on a file in a chroot with overlayfs on
>>> top of btrfs; we hit a kernel oops in btrfs_sync_file() on
>>> atomic_inc(&root->log_batch) because root is NULL.
>>>
>>> I've debugged this further and found that in btrfs_sync_file():
>>>
>>> struct inode *inode = d_inode(dentry);
>>>
>>> does not return the inode I expected when using the stacked overlay fs,
>>> whereas:
>>>
>>> struct inode *inode = file_inode(file);
>>>
>>> does.
>>
>> See the discussion at
>> https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg48131.html
>>
>> You can get along with file_inode() in btrfs_sync_file(), but not
>> later in the fsync code path, where we traverse the hierarchy upwards
>> using dentries.
>> More details on that thread.
>
> Ah, good. So was there any resolution on a way forward for a fix?

Nope.

>
>>
>>>
>>> However, I'm not at all well versed in btrfs, so I am not confident
>>> this is actually correct.  Any comments?
>>>
>>> Colin
>>
>>
>>
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


Re: BZ#101951, Overlayfs on top of btrfs causes kernel oops + freeze

2016-02-16 Thread Colin Ian King
On 16/02/16 16:11, Filipe Manana wrote:
> On Tue, Feb 16, 2016 at 4:08 PM, Colin Ian King
>  wrote:
>> On 16/02/16 15:51, Filipe Manana wrote:
>>> On Tue, Feb 16, 2016 at 3:38 PM, Colin Ian King
>>>  wrote:
 Hi there,

 bug: https://bugzilla.kernel.org/show_bug.cgi?id=101951 and also
 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1532145

 Commit 4bacc9c9234c7c8eec44f5ed4e960d9f96fa0f01 ("overlayfs: Make f_path
 always point to the overlay and f_inode to the underlay") resulted in an
 issue when using a combination of btrfs and overlayfs.  This is
 noticeable when doing a fsync() on a file in a chroot with overlayfs on
 top of btrfs; we hit a kernel oops in btrfs_sync_file() on
 atomic_inc(&root->log_batch) because root is NULL.

 I've debugged this further and found that in btrfs_sync_file():

 struct inode *inode = d_inode(dentry);

 does not return the inode I expected when using the stacked overlay fs,
 whereas:

 struct inode *inode = file_inode(file);

 does.
>>>
>>> See the discussion at
>>> https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg48131.html
>>>
>>> You can get along with file_inode() in btrfs_sync_file(), but not
>>> later in the fsync code path, where we traverse the hierarchy upwards
>>> using dentries.
>>> More details on that thread.
>>
>> Ah, good. So was there any resolution on a way forward for a fix?
> 
> Nope.
> 
OK, so chroots don't work, that's a bit of a show stopper :-/

>>
>>>

 However, I'm not at all well versed in btrfs, so I am not confident
 this is actually correct.  Any comments?

 Colin
>>>
>>>
>>>
>>
> 
> 
> 



Re: BZ#101951, Overlayfs on top of btrfs causes kernel oops + freeze

2016-02-16 Thread Colin Ian King
On 16/02/16 15:51, Filipe Manana wrote:
> On Tue, Feb 16, 2016 at 3:38 PM, Colin Ian King
>  wrote:
>> Hi there,
>>
>> bug: https://bugzilla.kernel.org/show_bug.cgi?id=101951 and also
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1532145
>>
>> Commit 4bacc9c9234c7c8eec44f5ed4e960d9f96fa0f01 ("overlayfs: Make f_path
>> always point to the overlay and f_inode to the underlay") resulted in an
>> issue when using a combination of btrfs and overlayfs.  This is
>> noticeable when doing a fsync() on a file in a chroot with overlayfs on
>> top of btrfs; we hit a kernel oops in btrfs_sync_file() on
>> atomic_inc(&root->log_batch) because root is NULL.
>>
>> I've debugged this further and found that in btrfs_sync_file():
>>
>> struct inode *inode = d_inode(dentry);
>>
>> does not return the inode I expected when using the stacked overlay fs,
>> whereas:
>>
>> struct inode *inode = file_inode(file);
>>
>> does.
> 
> See the discussion at
> https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg48131.html
> 
> You can get along with file_inode() in btrfs_sync_file(), but not
> later in the fsync code path, where we traverse the hierarchy upwards
> using dentries.
> More details on that thread.

Ah, good. So was there any resolution on a way forward for a fix?

> 
>>
>> However, I'm not at all well versed in btrfs, so I am not confident
>> this is actually correct.  Any comments?
>>
>> Colin
> 
> 
> 



Re: BZ#101951, Overlayfs on top of btrfs causes kernel oops + freeze

2016-02-16 Thread Filipe Manana
On Tue, Feb 16, 2016 at 3:38 PM, Colin Ian King
 wrote:
> Hi there,
>
> bug: https://bugzilla.kernel.org/show_bug.cgi?id=101951 and also
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1532145
>
> Commit 4bacc9c9234c7c8eec44f5ed4e960d9f96fa0f01 ("overlayfs: Make f_path
> always point to the overlay and f_inode to the underlay") resulted in an
> issue when using a combination of btrfs and overlayfs.  This is
> noticeable when doing a fsync() on a file in a chroot with overlayfs on
> top of btrfs; we hit a kernel oops in btrfs_sync_file() on
> atomic_inc(&root->log_batch) because root is NULL.
>
> I've debugged this further and found that in btrfs_sync_file():
>
> struct inode *inode = d_inode(dentry);
>
> does not return the inode I expected when using the stacked overlay fs,
> whereas:
>
> struct inode *inode = file_inode(file);
>
> does.

See the discussion at
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg48131.html

You can get along with file_inode() in btrfs_sync_file(), but not
later in the fsync code path, where we traverse the hierarchy upwards
using dentries.
More details on that thread.

>
> However, I'm not at all well versed in btrfs, so I am not confident
> this is actually correct.  Any comments?
>
> Colin



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


BZ#101951, Overlayfs on top of btrfs causes kernel oops + freeze

2016-02-16 Thread Colin Ian King
Hi there,

bug: https://bugzilla.kernel.org/show_bug.cgi?id=101951 and also
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1532145

Commit 4bacc9c9234c7c8eec44f5ed4e960d9f96fa0f01 ("overlayfs: Make f_path
always point to the overlay and f_inode to the underlay") resulted in an
issue when using a combination of btrfs and overlayfs.  This is
noticeable when doing a fsync() on a file in a chroot with overlayfs on
top of btrfs; we hit a kernel oops in btrfs_sync_file() on
atomic_inc(&root->log_batch) because root is NULL.

I've debugged this further and found that in btrfs_sync_file():

struct inode *inode = d_inode(dentry);

does not return the inode I expected when using the stacked overlay fs,
whereas:

struct inode *inode = file_inode(file);

does.

However, I'm not at all well versed in btrfs, so I am not confident
this is actually correct.  Any comments?

Colin
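The mismatch between the two accessors can be illustrated with a userspace mock (Python; the classes are illustrative stand-ins for the kernel structures, not the kernel API):

```python
# Userspace mock of the overlayfs situation after commit 4bacc9c92:
# f_path points at the overlay dentry, f_inode at the underlay inode.
# These are illustrative stand-ins, not the kernel structures.
class Inode:
    def __init__(self, fs):
        self.fs = fs

class Dentry:
    def __init__(self, inode):
        self.d_inode = inode

class File:
    def __init__(self, overlay_dentry, underlay_inode):
        self.f_path_dentry = overlay_dentry
        self.f_inode = underlay_inode

def d_inode(dentry):
    return dentry.d_inode

def file_inode(f):
    return f.f_inode

btrfs_inode = Inode("btrfs")          # the underlay inode (has a btrfs root)
overlay_inode = Inode("overlayfs")    # the overlay inode (no btrfs root)
f = File(Dentry(overlay_inode), btrfs_inode)

print(d_inode(f.f_path_dentry).fs)   # "overlayfs": not a btrfs inode -> oops
print(file_inode(f).fs)              # "btrfs": the inode that was expected
```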


[GIT PULL] Btrfs fix for direct IO error reporting

2016-02-16 Thread fdmanana
From: Filipe Manana 

Hi Chris,

Please consider the following fix for an upcoming 4.5 release candidate.
It fixes a problem where, if the bio for a direct IO request fails, we end
up reporting success to userspace. For example, for a direct IO write of 64K,
if the block layer notifies us that an IO error happened with our bio, we
end up returning the value 64K to user space instead of -EIO, making the
application think that its write request succeeded.

This is reproducible with the new generic test cases in xfstests: 271,
272, 276 and 278 (which use dm's error target to simulate IO errors).

Thanks.

The following changes since commit bc4ef7592f657ae81b017207a1098817126ad4cb:

  btrfs: properly set the termination value of ctx->pos in readdir (2016-02-11 
07:01:59 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git for-chris-4.5

for you to fetch changes up to 1636d1d77ef4e01e57f706a4cae3371463896136:

  Btrfs: fix direct IO requests not reporting IO error to user space 
(2016-02-16 03:41:26 +)


Filipe Manana (1):
  Btrfs: fix direct IO requests not reporting IO error to user space

 fs/btrfs/inode.c | 2 ++
 1 file changed, 2 insertions(+)

-- 
2.7.0.rc3



[PATCH v2] Btrfs: fix direct IO requests not reporting IO error to user space

2016-02-16 Thread fdmanana
From: Filipe Manana 

If a bio for a direct IO request fails, we were not setting the error in
the parent bio (the main DIO bio), making us not return the error to
user space in btrfs_direct_IO(), that is, it made __blockdev_direct_IO()
return the number of bytes issued for IO and not the error a bio created
and submitted by btrfs_submit_direct() got from the block layer.
This essentially happens because when we call:

   dio_end_io(dio_bio, bio->bi_error);

It does not set dio_bio->bi_error to the value of the second argument.
So just add this missing assignment in endio callbacks, just as we do in
the error path at btrfs_submit_direct() when we fail to clone the dio bio
or allocate its private object. This follows the convention of what is
done with other similar APIs such as bio_endio() where the caller is
responsible for setting the bi_error field in the bio it passes as an
argument to bio_endio().

This was detected by the new generic test cases in xfstests: 271, 272,
276 and 278. Which essentially setup a dm error target, then load the
error table, do a direct IO write and unload the error table. They
expect the write to fail with -EIO, which was not getting reported
when testing against btrfs.

Cc: sta...@vger.kernel.org  # 4.3+
Fixes: 4246a0b63bd8 ("block: add a bi_error field to struct bio")
Signed-off-by: Filipe Manana 
---

V2: Updated commit message to reflect affected kernel versions.

 fs/btrfs/inode.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 600bf0d..e0ad8b2 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7985,6 +7985,7 @@ static void btrfs_endio_direct_read(struct bio *bio)
 
kfree(dip);
 
+   dio_bio->bi_error = bio->bi_error;
dio_end_io(dio_bio, bio->bi_error);
 
if (io_bio->end_io)
@@ -8039,6 +8040,7 @@ static void btrfs_endio_direct_write(struct bio *bio)
 
kfree(dip);
 
+   dio_bio->bi_error = bio->bi_error;
dio_end_io(dio_bio, bio->bi_error);
bio_put(bio);
 }
-- 
2.7.0.rc3
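The convention the fix restores can be sketched with a userspace mock (Python; Bio and dio_end_io() are illustrative stand-ins, not the kernel API):

```python
# Userspace sketch of the bi_error convention: the caller of dio_end_io()
# is responsible for setting bi_error on the bio it passes in; the second
# argument alone does not update the parent bio.
class Bio:
    def __init__(self):
        self.bi_error = 0      # 0 = success, otherwise a negative errno

EIO = 5
REQUEST_BYTES = 64 * 1024

def dio_end_io(dio_bio, error):
    # Mimics the behaviour described above: completion looks only at
    # dio_bio.bi_error and ignores the `error` argument.
    return dio_bio.bi_error if dio_bio.bi_error else REQUEST_BYTES

def endio_without_fix(bio, dio_bio):
    return dio_end_io(dio_bio, bio.bi_error)   # child error silently dropped

def endio_with_fix(bio, dio_bio):
    dio_bio.bi_error = bio.bi_error            # the one-line assignment above
    return dio_end_io(dio_bio, bio.bi_error)

child = Bio()
child.bi_error = -EIO                          # block layer reported -EIO

print(endio_without_fix(child, Bio()))   # 65536: looks like full success
print(endio_with_fix(child, Bio()))      # -5: -EIO reaches user space
```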



Deadlock while removing device, kernel 4.4.1

2016-02-16 Thread Psalle
This is a test system, so I'm reporting in case this is unknown; no
data is at risk.


This filesystem was created with a device (well, actually partition) 
/dev/sdb3, then /dev/sdc{2,3,4} were added, and finally I attempted to 
remove /dev/sdb3. No profiles were passed at any point.


Briefly after starting the remove, which seemed to proceed fine 
according to fi show, I started a rsync involving around 8GB from 
another fs into the one being reshaped. Not sure if this could have been 
related; rsync never transferred anything. Source was a degraded raid5 
with six devices, one of them missing.


Soon everything requiring disk access froze. This was with latest ubuntu 
stable upstream, i.e. 4.4.1-040401-generic


I rebooted without problems to mount the filesystems. As I write, I'm 
doing the same process with the latest 15.10 kernel 4.2.0-27-generic; for 
the moment things are going smoothly.


Logged in as root, I captured the dmesg. Here is the final bit:

[  600.114436] INFO: task D-Bus thread:7692 blocked for more than 120 
seconds.
[  600.114438]   Tainted: P   OE   4.4.1-040401-generic 
#201601311534
[  600.114440] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  600.114442] D-Bus threadD 88007e4bfde8 0  7692   2842 
0x
[  600.114446]  88007e4bfde8  81e11500 
8800b0b65940
[  600.114450]  88007e4c 8800bf509e68 8800bf509e80 
88007e4bff58
[  600.114454]  8800b0b65940 88007e4bfe00 817f9b15 
8800b0b65940

[  600.114458] Call Trace:
[  600.114461]  [] schedule+0x35/0x80
[  600.114464]  [] rwsem_down_read_failed+0xe0/0x140
[  600.114467]  [] ? 
schedule_hrtimeout_range_clock+0x19/0x40

[  600.114471]  [] call_rwsem_down_read_failed+0x14/0x30
[  600.114474]  [] ? down_read+0x20/0x30
[  600.114477]  [] __do_page_fault+0x375/0x400
[  600.114480]  [] do_page_fault+0x22/0x30
[  600.114483]  [] page_fault+0x28/0x30
[  600.114487] INFO: task BrowserBlocking:7697 blocked for more than 120 
seconds.
[  600.114489]   Tainted: P   OE   4.4.1-040401-generic 
#201601311534
[  600.114491] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  600.114493] BrowserBlocking D 88003565fbe0 0  7697   2842 
0x
[  600.114497]  88003565fbe0 0058c6dff62b 88011abf8000 
8800ae816600
[  600.114501]  88003566 ff00 8800ae816600 
8800ae816600
[  600.114505]  8800a35c1c70 88003565fbf8 817f9b15 
8800a35c1cd8

[  600.114509] Call Trace:
[  600.114512]  [] schedule+0x35/0x80
[  600.114534]  [] btrfs_tree_read_lock+0xe6/0x140 [btrfs]
[  600.114538]  [] ? wake_atomic_t_function+0x60/0x60
[  600.114554]  [] btrfs_read_lock_root_node+0x34/0x50 
[btrfs]

[  600.114569]  [] btrfs_search_slot+0x73f/0x9f0 [btrfs]
[  600.114574]  [] ? crypto_shash_update+0x30/0xe0
[  600.114593]  [] 
btrfs_check_dir_item_collision+0x77/0x120 [btrfs]

[  600.114614]  [] btrfs_rename2+0x130/0x7b0 [btrfs]
[  600.114618]  [] ? generic_permission+0x110/0x190
[  600.114622]  [] vfs_rename+0x54a/0x870
[  600.114626]  [] ? security_path_rename+0x20/0xd0
[  600.114630]  [] SyS_rename+0x38b/0x3d0
[  600.114634]  [] entry_SYSCALL_64_fastpath+0x16/0x75

There's more before this but it looks similar.

Known issue?


Re: Unable to mount BTRFS home dir anymore

2016-02-16 Thread Duncan
Bhasker C V posted on Tue, 16 Feb 2016 08:24:24 +0100 as excerpted:

>  Help with recovery of BTRFS home directory data.
>  I have been using BTRFS happily for a year now. It has worked across
> power failures and many such situations.
> 
>  Last week, however, the filesystem could not be mounted after a power
>  failure.
> None of the following methods were helpful
> 
> 1) I tried the ro,recovery,nospace_cache,nospace_cache option of mount
> 2) I tried btrfs-zero-log -y -v
> 3) btrfsck --repair --init-csum-tree
> 
> btrfsck does a SIGSEGV in the end.
> 
> Please can someone help me by telling me how to proceed ?
> 
> Kernel: 4.2.0

First, kernel 4.2 series is not an LTS and is already out of mainline 
current-kernel stable support, so the recommendation would be to either 
upgrade to 4.4 current, which is also an LTS series, or downgrade to the 
previous 4.1 LTS series.  An alternative, of course, would be to continue 
to use your distro kernel if they support it, but in that case, you 
should probably look to your distro for btrfs support as well, since this 
list tends to track the mainline kernel series and we really don't track 
what stability patches random distros might or might not backport to 
their random non-mainline-LTS-series kernels, and thus don't really know 
their stability status.

Similarly for btrfs userspace (btrfs-progs), where the rule of thumb is 
to use at least the latest release matching your kernel series, which if 
you follow the kernel recommendations, will avoid getting too far behind 
on userspace as well.

Tho once you're actually needing to work with an offline filesystem, 
using btrfs check, etc to try to fix it, or btrfs restore to restore 
files from it, the latest userspace is recommended, as it will have the 
latest patches and thus be most likely to successfully recover your data.


Second, as the sysadmin's first rule of backups states in its simplest 
form, if you don't have at least one backup, you're defining the data 
without that backup as worth less than the time and resources necessary 
to do that backup.

And of course, that's the general rule, as it applies to fully mature and 
stable filesystems.  Btrfs however, is still stabilizing and maturing, 
and while stable enough for daily use if you have backups, it's not yet 
fully stable and mature, so the sysadmin's first rule of backups applies 
even more strongly on btrfs than it does in the normal, fully stable and 
mature filesystem, case.

As such, you can be happy, as you either have a backup to restore your 
files from, or you had already by your actions defined those files as of 
only trivial value, with your time and the resources necessary to do that 
backup of more value than the data you weren't backing up.  So even if 
you lose the data, you saved what was self-evidently more valuable to 
you, the time and resources that you'd have otherwise spent doing that 
backup, and thus can be happy that you saved the real important stuff. 
=:^)

So no sweat.  Just restore from your backups, or if you didn't have them, 
the data was self-evidently of only trivial, throw-away value, in any 
case. =:^)

Meanwhile, third, now assuming your data was valuable enough to be backed 
up, and you do have backups to use if you have to, but they weren't 
necessarily current, and you'd prefer to avoid redoing the lost work 
between the time of your backup and the time the filesystem crashed on 
you, if at all possible...

In this case, btrfs restore is the tool most likely to help you recover 
the data in as close a state to current as possible.  Btrfs restore works 
with the /unmounted/ filesystem, attempting to find and pull files off 
it, saving them to some other mounted filesystem (which doesn't have to 
be btrfs) as it recovers them.

Again, you'll want to be using current btrfs-tools (4.4 as I type this) 
if possible, as it has the best chance at restoring files.  For btrfs 
restore, the kernel version doesn't matter, as userspace is doing all the 
work using its own code.

Ideally, you'll simply be able to point btrfs restore at the bad device(s) 
and tell it where to put the files as it restores them, and it'll go 
from there.  However, if that doesn't work, there's still a chance to 
make it work manually, by finding suitable old root nodes (which btrfs 
keeps around due to copy-on-write) using btrfs-find-root, and pointing 
btrfs restore at them using its -t  option.  This does tend to 
get a bit down and dirty technical, however, and some potential users 
find they can't handle it at that technical level, and have to give up.
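In command form, the manual procedure described above looks roughly like 
this (the device path, mount point, and root byte number are example 
values; the bytenrs actually worth trying are whatever btrfs-find-root 
reports on your filesystem):

```shell
# Dry run first: -D lists what restore *would* recover, writing nothing.
btrfs restore -D /dev/sdb1 /mnt/recovery

# If the default root is too damaged, list candidate old tree roots
# that copy-on-write left behind:
btrfs-find-root /dev/sdb1

# Then point restore at one of the reported root bytenrs with -t:
btrfs restore -t 123456789 /dev/sdb1 /mnt/recovery
```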

There's a page on the wiki covering the procedure, tho it's not 
necessarily current and you may have to read between the lines a bit.

https://btrfs.wiki.kernel.org/index.php/Restore

[Unfortunately, I'm getting sec_error_ocsp_bad_signature for the OCSP 
response for the wiki ATM.  Here's a couple archive links to it, courtesy 
of the resurrect-page plugin I run.]

https://archive.is/PPkKP

Re: [GIT PULL] Fujitsu for 4.5

2016-02-16 Thread David Sterba
On Mon, Feb 15, 2016 at 12:07:01PM +0800, Zhao Lei wrote:
> Hi, David Sterba
> 
> Thanks for letting me know, and sorry for the late reply.
> 
> > From: David Sterba [mailto:dste...@suse.cz]
> > Sent: Wednesday, February 10, 2016 6:14 PM
> > To: Zhao Lei 
> > Cc: 'Chris Mason' ; 'btrfs' 
> > Subject: Re: [GIT PULL] Fujitsu for 4.5
> > 
> > On Wed, Jan 13, 2016 at 05:28:12PM +0800, Zhao Lei wrote:
> > > This is a collection of bug fixes, enhancements and cleanups from Fujitsu
> > > against btrfs for v4.5, mainly for reada, plus some small fixes and
> > > cleanups for scrub and raid56.
> > >
> > > All patches are in the btrfs mailing list, rebased on top of integration-4.5.
> > 
> > I was trying to isolate safe fixes for 4.5 but saw hangs (same as Chris
> > reported) and was not able to find the right followups.
> > 
> The problem is discussed in btrfs maillist from:
> http://www.spinics.net/lists/linux-btrfs/msg51275.html
> to
> http://www.spinics.net/lists/linux-btrfs/msg51538.html
> 
> It is fixed now.
> 
> > Can you please collect all your readahead patches sent recently? I got
> > lost. Make it a git branch and let me know, I'll add it to for-next and
> > send pull request for 4.6 later.
> I collected all reada patches in:
> https://github.com/zhaoleidd/btrfs.git integration-4.5

There's patch

btrfs: Continue write in case of can_not_nocow

that does not count as a readahead fix, so I'll pull it via a different
branch.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 15/15] btrfs: rename flags for vol args v2

2016-02-16 Thread David Sterba
On Tue, Feb 16, 2016 at 05:18:12PM +0800, Anand Jain wrote:
> 
>   Just checked next/delete-by-id-v3.
>   You may consider to update progs as well.
> 
> Reviewed-by: Anand Jain 

Thanks for the reviews, I'll update the patches and push to next. Progs
update will follow.


Re: [PATCH 15/15] btrfs: rename flags for vol args v2

2016-02-16 Thread Anand Jain




On 02/16/2016 01:34 AM, David Sterba wrote:

Rename BTRFS_DEVICE_BY_ID so it's more descriptive of the fact that we
specify the device by id; it'll be part of the public API. The mask of
supported flags is also renamed, but only for internal use.

The error code for unknown flags is EOPNOTSUPP, fixed.

Signed-off-by: David Sterba 
---
  fs/btrfs/ioctl.c   | 8 
  include/uapi/linux/btrfs.h | 7 ---
  2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 57fb05960435..6bcd7700b9fd 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2689,8 +2689,8 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void 
__user *arg)
}

/* Check for compatibility reject unknown flags */
-   if (vol_args->flags & ~BTRFS_VOL_ARG_V2_FLAGS)
-   return -ENOTTY;
+   if (vol_args->flags & ~BTRFS_VOL_ARG_V2_FLAGS_SUPPORTED)
+   return -EOPNOTSUPP;

if (atomic_xchg(>fs_info->mutually_exclusive_operation_running,
1)) {
@@ -2699,7 +2699,7 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void 
__user *arg)
}

mutex_lock(>fs_info->volume_mutex);
-   if (vol_args->flags & BTRFS_DEVICE_BY_ID) {
+   if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID) {
ret = btrfs_rm_device(root, NULL, vol_args->devid);
} else {
vol_args->name[BTRFS_PATH_NAME_MAX] = '\0';
@@ -2709,7 +2709,7 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void 
__user *arg)
atomic_set(>fs_info->mutually_exclusive_operation_running, 0);

if (!ret) {
-   if (vol_args->flags & BTRFS_DEVICE_BY_ID)
+   if (vol_args->flags & BTRFS_DEVICE_SPEC_BY_ID)
btrfs_info(root->fs_info, "device deleted: id %llu",
vol_args->devid);
else
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 396a4efca775..3975e683af72 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -36,12 +36,13 @@ struct btrfs_ioctl_vol_args {
  #define BTRFS_SUBVOL_CREATE_ASYNC (1ULL << 0)
  #define BTRFS_SUBVOL_RDONLY   (1ULL << 1)
  #define BTRFS_SUBVOL_QGROUP_INHERIT   (1ULL << 2)
-#define BTRFS_DEVICE_BY_ID (1ULL << 3)
-#define BTRFS_VOL_ARG_V2_FLAGS \
+#define BTRFS_DEVICE_SPEC_BY_ID(1ULL << 3)
+


 Just checked next/delete-by-id-v3.
 You may consider to update progs as well.

Reviewed-by: Anand Jain 

Thanks, Anand




+#define BTRFS_VOL_ARG_V2_FLAGS_SUPPORTED   \
(BTRFS_SUBVOL_CREATE_ASYNC |\
BTRFS_SUBVOL_RDONLY |   \
BTRFS_SUBVOL_QGROUP_INHERIT |   \
-   BTRFS_DEVICE_BY_ID)
+   BTRFS_DEVICE_SPEC_BY_ID)

  #define BTRFS_FSID_SIZE 16
  #define BTRFS_UUID_SIZE 16




Re: [PATCH 14/15] btrfs: rename btrfs_find_device_by_user_input

2016-02-16 Thread Anand Jain


Reviewed-by: Anand Jain 

Thanks, Anand


On 02/16/2016 01:34 AM, David Sterba wrote:

To make it clear how we are going to find the device, let's call it a
device specifier, devspec for short. Also rename the arguments that are a
leftover from the function's previous purpose.

Signed-off-by: David Sterba 
---
  fs/btrfs/dev-replace.c |  2 +-
  fs/btrfs/volumes.c | 17 ++---
  fs/btrfs/volumes.h |  4 ++--
  3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 3e2616d151d9..1731f92b6247 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -322,7 +322,7 @@ int btrfs_dev_replace_start(struct btrfs_root *root,

/* the disk copy procedure reuses the scrub code */
mutex_lock(_info->volume_mutex);
-   ret = btrfs_find_device_by_user_input(root, args->start.srcdevid,
+   ret = btrfs_find_device_by_devspec(root, args->start.srcdevid,
args->start.srcdev_name,
_device);
if (ret) {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8fee24f92574..05d9bc0cdd49 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1779,7 +1779,7 @@ int btrfs_rm_device(struct btrfs_root *root, char 
*device_path, u64 devid)
if (ret)
goto out;

-   ret = btrfs_find_device_by_user_input(root, devid, device_path,
+   ret = btrfs_find_device_by_devspec(root, devid, device_path,
);
if (ret)
goto out;
@@ -2065,23 +2065,26 @@ int btrfs_find_device_missing_or_by_path(struct 
btrfs_root *root,
}
  }

-int btrfs_find_device_by_user_input(struct btrfs_root *root, u64 srcdevid,
-char *srcdev_name,
+/*
+ * Lookup a device given by device id, or the path if the id is 0.
+ */
+int btrfs_find_device_by_devspec(struct btrfs_root *root, u64 devid,
+char *devpath,
 struct btrfs_device **device)
  {
int ret;

-   if (srcdevid) {
+   if (devid) {
ret = 0;
-   *device = btrfs_find_device(root->fs_info, srcdevid, NULL,
+   *device = btrfs_find_device(root->fs_info, devid, NULL,
NULL);
if (!*device)
ret = -ENOENT;
} else {
-   if (!srcdev_name || !srcdev_name[0])
+   if (!devpath || !devpath[0])
return -EINVAL;

-   ret = btrfs_find_device_missing_or_by_path(root, srcdev_name,
+   ret = btrfs_find_device_missing_or_by_path(root, devpath,
   device);
}
return ret;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index a13a538cb01e..febdb7bc9370 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -448,8 +448,8 @@ void btrfs_close_extra_devices(struct btrfs_fs_devices 
*fs_devices, int step);
  int btrfs_find_device_missing_or_by_path(struct btrfs_root *root,
 char *device_path,
 struct btrfs_device **device);
-int btrfs_find_device_by_user_input(struct btrfs_root *root, u64 srcdevid,
-char *srcdev_name,
+int btrfs_find_device_by_devspec(struct btrfs_root *root, u64 devid,
+char *devpath,
 struct btrfs_device **device);
  struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
const u64 *devid,




Re: [PATCH 13/15] btrfs: use existing device constraints table btrfs_raid_array

2016-02-16 Thread Anand Jain



yep required optimization. Deleting from my todo list.

Reviewed-by: Anand Jain 


On 02/16/2016 01:34 AM, David Sterba wrote:

We should avoid duplicating the device constraints, let's use the
btrfs_raid_array in btrfs_check_raid_min_devices.

Signed-off-by: David Sterba 
---
  fs/btrfs/volumes.c | 23 +--
  1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a67249582a6f..8fee24f92574 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1730,6 +1730,7 @@ static int btrfs_check_raid_min_devices(struct 
btrfs_fs_info *fs_info,
  {
u64 all_avail;
unsigned seq;
+   int i;

do {
seq = read_seqbegin(_info->profiles_lock);
@@ -1739,22 +1740,16 @@ static int btrfs_check_raid_min_devices(struct 
btrfs_fs_info *fs_info,
fs_info->avail_metadata_alloc_bits;
} while (read_seqretry(_info->profiles_lock, seq));

-   if ((all_avail & BTRFS_BLOCK_GROUP_RAID10) && num_devices < 4) {
-   return BTRFS_ERROR_DEV_RAID10_MIN_NOT_MET;
-   }
-
-   if ((all_avail & BTRFS_BLOCK_GROUP_RAID1) && num_devices < 2) {
-   return BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET;
-   }
+   for (i = 0; i < BTRFS_NR_RAID_TYPES; i++) {
+   if (!(all_avail & btrfs_raid_group[i]))
+   continue;

-   if ((all_avail & BTRFS_BLOCK_GROUP_RAID5) &&
-   fs_info->fs_devices->rw_devices < 2) {
-   return BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET;
-   }
+   if (num_devices < btrfs_raid_array[i].devs_min) {
+   int ret = btrfs_raid_mindev_error[i];

-   if ((all_avail & BTRFS_BLOCK_GROUP_RAID6) &&
-   fs_info->fs_devices->rw_devices < 3) {
-   return BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET;
+   if (ret)
+   return ret;
+   }
}

return 0;




Re: Major HDD performance degradation on btrfs receive

2016-02-16 Thread Duncan
Nazar Mokrynskyi posted on Tue, 16 Feb 2016 05:44:30 +0100 as excerpted:

> I have 2 SSD with BTRFS filesystem (RAID) on them and several
> subvolumes. Each 15 minutes I'm creating read-only snapshot of
> subvolumes /root, /home and /web inside /backup.
> After this I'm searching for last common subvolume on /backup_hdd,
> sending difference between latest common snapshot and simply latest
> snapshot to /backup_hdd.
> On top of all above there is snapshots rotation, so that /backup
> contains much less snapshots than /backup_hdd.

One thing that you imply, but don't actually make explicit except 
in the btrfs command output and mount options listing, is that /backup_hdd 
is a mountpoint for a second entirely independent btrfs (LABEL=Backup), 
while /backup is a subvolume on the primary / btrfs.  Knowing that is 
quite helpful in figuring out exactly what you're doing. =:^)

Further, implied, but not explicit since some folks use hdd when 
referring to ssds as well, is that the /backup_hdd hdd is spinning rust, 
tho you do make it explicit that the primary btrfs is on ssds.

> I'm using this setup for last 7 months or so and this is luckily the
> longest period when I had no problems with BTRFS at all.
> However, last 2+ months btrfs receive command loads HDD so much that I
> can't even get list of directories in it.
> This happens even if diff between snapshots is really small.
> HDD contains 2 filesystems - mentioned BTRFS and ext4 for other files,
> so I can't even play mp3 file from ext4 filesystem while btrfs receive
> is running.
> Since I'm running everything each 15 minutes this is a real headache.

The *big* question is how many snapshots you have on LABEL=Backup, since 
you mention rotating backups in /backup, but don't mention rotating/
thinning backups on LABEL=Backup, and do explicitly state that it has far 
more snapshots, and with four snapshots an hour, they'll build up rather 
fast if you aren't thinning them.

The rest of this post assumes that's the issue, since you didn't mention 
thinning out the snapshots on LABEL=Backup.  If you're already familiar 
with the snapshot scaling issue and snapshot caps and thinning 
recommendations regularly posted here, feel free to skip the below as 
it'll simply be review. =:^)

Btrfs has scaling issues when there's too many snapshots.  The 
recommendation I've been using is a target of no more than 250 snapshots 
per subvolume, with a target of no more than eight subvolumes and ideally 
no more than four subvolumes being snapshotted per filesystem, which 
doing the math leads to an overall filesystem target snapshot cap of 
1000-2000, and definitely no more than 3000, tho by that point the 
scaling issues are beginning to kick in and you'll feel it in lost 
performance, particularly on spinning rust, when doing btrfs maintenance 
such as snapshotting, send/receive, balance, check, etc.

Unfortunately, many people post here complaining about performance issues 
when they're running 10K+ or even 100K+ snapshots per filesystem and the 
various btrfs maintenance commands have almost ground to a halt. =:^(

You say you're snapshotting three subvolumes, / /home and /web, at 15 
minute intervals.  That's 3*4=12 snapshots per hour, 12*24=288 snapshots 
per day.  If all those are on LABEL=Backup, you're hitting the 250 
snapshots per subvolume target in 250/4/24 = ... just over 2 and a half 
days.  And you're hitting the total per-filesystem snapshots target cap 
in 2000/288= ... just under seven days.

If you've been doing that for 7 months with no thinning, that's 
7*30*288= ... over 60K snapshots!  No *WONDER* you're seeing performance 
issues!

Meanwhile, say you need a file from a snapshot from six months ago.  Are 
you *REALLY* going to care, or even _know_, exactly what 15 minute 
snapshot it was?  And even if you do, just digging thru 60K+ snapshots... 
OK, so we'll assume you sort them by snapshotted subvolume so you only 
to dig thru 20K+ snapshots... just digging thru 20K snapshots to find the 
exact 15-minute snapshot you need... is quite a bit of work!

Instead, suppose you have a "reasonable" thinning program.  First, do you 
really need _FOUR_ snapshots an hour to LABEL=Backup?  Say you make it 
every 20 minutes, three an hour instead of four.  That already kills a 
third of them.  Then, say you take them every 15 or 20 minutes, but only 
send one per hour to LABEL=Backup.  (Or if you want, do them every 15 
minutes and send only every other one, half-hourly to LABEL=Backup.  The 
point is to keep it both something you're comfortable with but also more 
reasonable.)

For illustration, I'll say you send once an hour.  That's 3*24=72 
snapshots per day, 24/day per subvolume, already a great improvement over 
the 96/day/subvolume and 288/day total you're doing now.

If then once a day, you thin down the third day back to every other hour, 
you'll have 2-3 days worth of hourly snapshots on LABEL=Backup, so up to 
72 hourly snapshots per 

Re: [PATCH 12/15] btrfs: introduce raid-type to error-code table for minimum device constraint

2016-02-16 Thread Anand Jain



Nice fix. thanks

Reviewed-by: Anand Jain 


On 02/16/2016 01:34 AM, David Sterba wrote:

Signed-off-by: David Sterba 
---
  fs/btrfs/volumes.c | 15 +++
  fs/btrfs/volumes.h |  2 +-
  2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ae94e06f3e61..a67249582a6f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -118,6 +118,21 @@ const u64 btrfs_raid_group[BTRFS_NR_RAID_TYPES] = {
[BTRFS_RAID_RAID6]  = BTRFS_BLOCK_GROUP_RAID6,
  };

+/*
+ * Table to convert BTRFS_RAID_* to the error code if minimum number of devices
+ * condition is not met. Zero means there's no corresponding
+ * BTRFS_ERROR_DEV_*_NOT_MET value.
+ */
+const int btrfs_raid_mindev_error[BTRFS_NR_RAID_TYPES] = {
+   [BTRFS_RAID_RAID10] = BTRFS_ERROR_DEV_RAID10_MIN_NOT_MET,
+   [BTRFS_RAID_RAID1]  = BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET,
+   [BTRFS_RAID_DUP]= 0,
+   [BTRFS_RAID_RAID0]  = 0,
+   [BTRFS_RAID_SINGLE] = 0,
+   [BTRFS_RAID_RAID5]  = BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET,
+   [BTRFS_RAID_RAID6]  = BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET,
+};
+
  static int init_first_rw_device(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct btrfs_device *device);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index c73d027e2f8b..a13a538cb01e 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -340,7 +340,7 @@ struct btrfs_raid_attr {
  };

  extern const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES];
-
+extern const int btrfs_raid_mindev_error[BTRFS_NR_RAID_TYPES];
  extern const u64 btrfs_raid_group[BTRFS_NR_RAID_TYPES];

  struct map_lookup {




Re: [PATCH 11/15] btrfs: pass number of devices to btrfs_check_raid_min_devices

2016-02-16 Thread Anand Jain



looks good.

Reviewed-by: Anand Jain 
Tested-by: Anand Jain 

Thanks.

On 02/16/2016 01:34 AM, David Sterba wrote:

Before this patch, btrfs_check_raid_min_devices would do an off-by-one
check of the constraints and not the minimum check, as its name
suggests. This is not a problem if the only caller is device remove, but
would be confusing for others.

Add an argument with the exact number and let the caller(s) decide if
this needs any adjustments, like when device replace is running.

Signed-off-by: David Sterba 
---
  fs/btrfs/volumes.c | 35 ---
  1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4fa4a836a072..ae94e06f3e61 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1705,20 +1705,17 @@ static int btrfs_rm_dev_item(struct btrfs_root *root,
return ret;
  }

-static int btrfs_check_raid_min_devices(struct btrfs_fs_info *fs_info)
+/*
+ * Verify that @num_devices satisfies the RAID profile constraints in the whole
+ * filesystem. It's up to the caller to adjust that number regarding eg. device
+ * replace.
+ */
+static int btrfs_check_raid_min_devices(struct btrfs_fs_info *fs_info,
+   u64 num_devices)
  {
u64 all_avail;
-   u64 num_devices;
unsigned seq;

-   num_devices = fs_info->fs_devices->num_devices;
-   btrfs_dev_replace_lock(_info->dev_replace);
-   if (btrfs_dev_replace_is_ongoing(_info->dev_replace)) {
-   WARN_ON(num_devices < 1);
-   num_devices--;
-   }
-   btrfs_dev_replace_unlock(_info->dev_replace);
-
do {
seq = read_seqbegin(_info->profiles_lock);

@@ -1727,21 +1724,21 @@ static int btrfs_check_raid_min_devices(struct 
btrfs_fs_info *fs_info)
fs_info->avail_metadata_alloc_bits;
} while (read_seqretry(_info->profiles_lock, seq));

-   if ((all_avail & BTRFS_BLOCK_GROUP_RAID10) && num_devices <= 4) {
+   if ((all_avail & BTRFS_BLOCK_GROUP_RAID10) && num_devices < 4) {
return BTRFS_ERROR_DEV_RAID10_MIN_NOT_MET;
}

-   if ((all_avail & BTRFS_BLOCK_GROUP_RAID1) && num_devices <= 2) {
+   if ((all_avail & BTRFS_BLOCK_GROUP_RAID1) && num_devices < 2) {
return BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET;
}

if ((all_avail & BTRFS_BLOCK_GROUP_RAID5) &&
-   fs_info->fs_devices->rw_devices <= 2) {
+   fs_info->fs_devices->rw_devices < 2) {
return BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET;
}

if ((all_avail & BTRFS_BLOCK_GROUP_RAID6) &&
-   fs_info->fs_devices->rw_devices <= 3) {
+   fs_info->fs_devices->rw_devices < 3) {
return BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET;
}

@@ -1760,7 +1757,15 @@ int btrfs_rm_device(struct btrfs_root *root, char 
*device_path, u64 devid)

mutex_lock(_mutex);

-   ret = btrfs_check_raid_min_devices(root->fs_info);
+   num_devices = root->fs_info->fs_devices->num_devices;
+   btrfs_dev_replace_lock(>fs_info->dev_replace);
+   if (btrfs_dev_replace_is_ongoing(>fs_info->dev_replace)) {
+   WARN_ON(num_devices < 1);
+   num_devices--;
+   }
+   btrfs_dev_replace_unlock(>fs_info->dev_replace);
+
+   ret = btrfs_check_raid_min_devices(root->fs_info, num_devices - 1);
if (ret)
goto out;





Re: [PATCH 10/15] btrfs: rename __check_raid_min_devices

2016-02-16 Thread Anand Jain


 thanks.

Reviewed-by: Anand Jain 


On 02/16/2016 01:34 AM, David Sterba wrote:

Underscores are for special functions, use the full prefix for better
stacktrace recognition.

Signed-off-by: David Sterba 
---
  fs/btrfs/volumes.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 20af20b0eaee..4fa4a836a072 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1705,7 +1705,7 @@ static int btrfs_rm_dev_item(struct btrfs_root *root,
return ret;
  }

-static int __check_raid_min_devices(struct btrfs_fs_info *fs_info)
+static int btrfs_check_raid_min_devices(struct btrfs_fs_info *fs_info)
  {
u64 all_avail;
u64 num_devices;
@@ -1760,7 +1760,7 @@ int btrfs_rm_device(struct btrfs_root *root, char 
*device_path, u64 devid)

mutex_lock(_mutex);

-   ret = __check_raid_min_devices(root->fs_info);
+   ret = btrfs_check_raid_min_devices(root->fs_info);
if (ret)
goto out;





[PATCH] btrfs: fix build warning

2016-02-16 Thread Sudip Mukherjee
We were getting build warning about:
fs/btrfs/extent-tree.c:7021:34: warning: ‘used_bg’ may be used
uninitialized in this function

It is not a valid warning, as used_bg is never used uninitialized:
locked is initially false, so we can never reach the section where
'used_bg' is read before it has been assigned. But gcc is not able to
prove that, so we can initialize it at declaration to silence the warning.

Signed-off-by: Sudip Mukherjee 
---
 fs/btrfs/extent-tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e2287c7..f24e4c3 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7018,7 +7018,7 @@ btrfs_lock_cluster(struct btrfs_block_group_cache 
*block_group,
   struct btrfs_free_cluster *cluster,
   int delalloc)
 {
-   struct btrfs_block_group_cache *used_bg;
+   struct btrfs_block_group_cache *used_bg = NULL;
bool locked = false;
 again:
spin_lock(>refill_lock);
-- 
1.9.1
