[PATCH] btrfs: add regression test for setxattr on subvolume directory

2017-01-24 Thread Omar Sandoval
From: Omar Sandoval 

This is a regression test for "Btrfs: disable xattr operations on
subvolume directories". On v4.9, it will result in an aborted
transaction.

Signed-off-by: Omar Sandoval 
---
 tests/btrfs/047 | 69 +
 tests/btrfs/047.out |  2 ++
 tests/btrfs/group   |  1 +
 3 files changed, 72 insertions(+)
 create mode 100644 tests/btrfs/047
 create mode 100644 tests/btrfs/047.out

diff --git a/tests/btrfs/047 b/tests/btrfs/047
new file mode 100644
index ..c3222f8b
--- /dev/null
+++ b/tests/btrfs/047
@@ -0,0 +1,69 @@
+#! /bin/bash
+# FS QA Test 047
+#
+# Test that we can't set xattrs on read-only subvolume placeholder directories.
+# Regression test for Btrfs: disable xattr operations on subvolume directories.
+#
+#-----------------------------------------------------------------------
+# Copyright (c) 2017 Facebook.  All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#-----------------------------------------------------------------------
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/attr
+
+# remove previous $seqres.full before test
+rm -f $seqres.full
+
+# real QA test starts here
+
+_supported_fs btrfs
+_supported_os Linux
+_require_attrs
+_require_scratch
+
+_scratch_mkfs >/dev/null 2>&1
+_scratch_mount
+
+_run_btrfs_util_prog subvolume create "$SCRATCH_MNT/parent"
+_run_btrfs_util_prog subvolume create "$SCRATCH_MNT/parent/child"
+_run_btrfs_util_prog subvolume snapshot "$SCRATCH_MNT/parent" "$SCRATCH_MNT/snapshot"
+
+$SETFATTR_PROG -n user.test -v foo "$SCRATCH_MNT/snapshot/child" |& _filter_scratch
+
+# The original bug resulted in bogus delayed inodes being inserted, so run the
+# delayed inodes by doing a commit.
+_run_btrfs_util_prog filesystem sync "$SCRATCH_MNT"
+
+status=0
+exit
diff --git a/tests/btrfs/047.out b/tests/btrfs/047.out
new file mode 100644
index ..48555e90
--- /dev/null
+++ b/tests/btrfs/047.out
@@ -0,0 +1,2 @@
+QA output created by 047
+setfattr: SCRATCH_MNT/snapshot/child: Operation not supported
diff --git a/tests/btrfs/group b/tests/btrfs/group
index 88fb8db4..69451c6b 100644
--- a/tests/btrfs/group
+++ b/tests/btrfs/group
@@ -49,6 +49,7 @@
 044 auto quick send
 045 auto quick send
 046 auto quick send
+047 auto quick snapshot attr
 048 auto quick
 049 auto quick
 050 auto quick send
-- 
2.11.0
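For reference, running the new test from an xfstests checkout looks roughly
like this, assuming SCRATCH_DEV and SCRATCH_MNT are already configured in
local.config to point at a disposable btrfs device:

  # cd xfstests
  # ./check btrfs/047

On a kernel with the fix the test passes; on plain v4.9 the setfattr is
accepted and the transaction later aborts.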



Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Omar Sandoval
On Tue, Jan 24, 2017 at 07:53:06PM -0700, Chris Murphy wrote:
> On Tue, Jan 24, 2017 at 3:50 PM, Omar Sandoval  wrote:
> 
> > Got this to repro after installing systemd-container. It's happening on
> > lsetxattr() to set the SELinux context on /var/lib/machines, which is a
> > subvolume. Looking into it now. Thanks for all of the help, Chris.
> 
> Aha! So the snapshot part was a goose chase, it has nothing to do with
> that, really. Because I'm taking a snapshot of root, the nested
> /var/lib/machines subvolume is not in that snapshot, so now it has to
> be created by systemd at next boot and the proper selinux label set on
> it.
> 
> It means there's something different about subvolumes and directories
> when it comes to xattrs, and the xattr patch I found in bisect is
> exposing the difference, hence things getting tripped up.

The snapshots were actually the key -- the error was because setxattr()
was being allowed on the read-only directory created in the place of the
/var/lib/machines subvolume. Not sure if you saw the patch I sent up,
but this should fix it: https://patchwork.kernel.org/patch/9536307/
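
A minimal reproduction outside of systemd, assuming a scratch btrfs
filesystem mounted at /mnt (this is essentially what the xfstest above
scripts):

  # btrfs subvolume create /mnt/parent
  # btrfs subvolume create /mnt/parent/child
  # btrfs subvolume snapshot /mnt/parent /mnt/snapshot
  # setfattr -n user.test -v foo /mnt/snapshot/child
  # btrfs filesystem sync /mnt

/mnt/snapshot/child is the read-only placeholder directory; on an unpatched
v4.9 kernel the setfattr is wrongly accepted, and the bogus delayed inode
aborts the transaction when the sync runs it.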


Re: [PATCH 0/9] Lowmem mode fsck fixes with fsck-tests framework update

2017-01-24 Thread Christoph Anton Mitterer
On Wed, 2017-01-25 at 12:16 +0800, Qu Wenruo wrote:
> New patches are out now.
> 
> Although I just updated
> 0001-btrfs-progs-lowmem-check-Fix-wrong-block-group-check.patch to fix
> all similar bugs.
> 
> You could get it from github:
> https://github.com/adam900710/btrfs-progs/tree/lowmem_fixes

Sure, will take a while, though (hopefully get it done tomorrow)


> Unfortunately, I didn't find the cause of the remaining error of that
> missing csum.
> And considering the size of your fs, btrfs-image is not possible, so I'm
> afraid you need to test the patches every time it updates.

No worries :-)


Cheers,
Chris.



Re: btrfs check lowmem vs original

2017-01-24 Thread Qu Wenruo

At 01/24/2017 05:14 AM, Chris Murphy wrote:

OK so all of these pass original check, but have problems reported by
lowmem. Separate notes about each inline.


Thanks for your images!

It really helps a lot.

I tested my patches against these images.
Feel free to test them:
https://github.com/adam900710/btrfs-progs/tree/lowmem_fixes



~500MiB each, these three are data volumes, first two are raid1, third
one is single.
https://drive.google.com/open?id=0B_2Asp8DGjJ9Z3UzWnFKT3A0clU
https://drive.google.com/open?id=0B_2Asp8DGjJ9V0ROdHNoMW1BVE0
https://drive.google.com/open?id=0B_2Asp8DGjJ9Zmd1LXl6MU5WeXc


The RAID1 ones are not stable for us to check, as they already contain some
chunk tree errors after recovery.
The single one I didn't download; after 324M it encounters some
btrfs-image recovery problem.




19MiB, about 15 minutes old, rootfs, OS installation only
https://drive.google.com/open?id=0B_2Asp8DGjJ9TF9LVkFlcDBzOG8


Passed now.



55MiB, about 1 month old, rootfs, not much activity
https://drive.google.com/open?id=0B_2Asp8DGjJ9bkJFc01qcVJxNnM


Passed too.



324MiB, about 5 months old, used as rootfs, all read-write snapshots
used as rootfs are forced readonly, a regression previously reported
without any dev response
https://drive.google.com/open?id=0B_2Asp8DGjJ9ZmNxdEw1RDBPcTA
http://www.spinics.net/lists/linux-btrfs/msg61817.html
https://bugzilla.kernel.org/show_bug.cgi?id=191761


Recovery caused quite a lot of false alerts on the chunk tree.
Still digging into whether the remaining errors are valid or not.

Thanks,
Qu









Re: [PATCH 0/9] Lowmem mode fsck fixes with fsck-tests framework update

2017-01-24 Thread Qu Wenruo



At 01/25/2017 08:46 AM, Christoph Anton Mitterer wrote:

On Wed, 2017-01-25 at 08:44 +0800, Qu Wenruo wrote:

Thanks for the test,


You're welcome... I'm happy if I can help :)

Just tell me once you think you found something, and I'll repeat the
testing.


Cheers,
Chris.


New patches are out now.

Although I just updated 
0001-btrfs-progs-lowmem-check-Fix-wrong-block-group-check.patch to fix 
all similar bugs.


You could get it from github:
https://github.com/adam900710/btrfs-progs/tree/lowmem_fixes

Unfortunately, I didn't find the cause of the remaining error of that 
missing csum.
And considering the size of your fs, btrfs-image is not possible, so I'm 
afraid you need to test the patches every time it updates.


Sorry for that,
Qu
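
For reference, fetching and running that branch would look roughly like
this, assuming the usual btrfs-progs build dependencies and the /dev/nbd0
device from the report:

  $ git clone https://github.com/adam900710/btrfs-progs.git
  $ cd btrfs-progs && git checkout lowmem_fixes
  $ ./autogen.sh && ./configure && make
  # ./btrfs check --mode=lowmem /dev/nbd0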




Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Chris Murphy
On Tue, Jan 24, 2017 at 3:50 PM, Omar Sandoval  wrote:

> Got this to repro after installing systemd-container. It's happening on
> lsetxattr() to set the SELinux context on /var/lib/machines, which is a
> subvolume. Looking into it now. Thanks for all of the help, Chris.

Aha! So the snapshot part was a goose chase, it has nothing to do with
that, really. Because I'm taking a snapshot of root, the nested
/var/lib/machines subvolume is not in that snapshot, so now it has to
be created by systemd at next boot and the proper selinux label set on
it.

It means there's something different about subvolumes and directories
when it comes to xattrs, and the xattr patch I found in bisect is
exposing the difference, hence things getting tripped up.

-- 
Chris Murphy


Re: [PATCH] Btrfs: disable xattr operations on subvolume directories

2017-01-24 Thread Omar Sandoval
On Tue, Jan 24, 2017 at 06:38:02PM -0800, Omar Sandoval wrote:
> From: Omar Sandoval 
> 
> When you snapshot a subvolume containing a subvolume, you get a
> placeholder read-only directory where the subvolume would be. These
> directory inodes have ->i_ops set to btrfs_dir_ro_inode_operations.
> Previously, this didn't include the xattr operation callbacks. The
> conversion to xattr_handlers missed this case, leading to bogus attempts
> to set xattrs on these inodes. This manifested itself as failures when
> running delayed inodes.
> 
> To fix this, clear the IOP_XATTR in ->i_opflags on these inodes.
> 
> Fixes: 6c6ef9f26e59 ("xattr: Stop calling {get,set,remove}xattr inode operations")
> Cc: Andreas Gruenbacher 
> Reported-by: Chris Murphy 
> Signed-off-by: Omar Sandoval 
> ---
> Applies to v4.10-rc4. Chris, this fixes the issue for me, could you please test
> it out? Andreas, does this make sense? I'll try to cook up an xfstest for this.
> 
>  fs/btrfs/inode.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 4e024260ad71..3dacf0786428 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -3835,10 +3835,12 @@ static int btrfs_read_locked_inode(struct inode *inode)
>  		break;
>  	case S_IFDIR:
>  		inode->i_fop = &btrfs_dir_file_operations;
> -		if (root == fs_info->tree_root)
> +		if (root == fs_info->tree_root) {
>  			inode->i_op = &btrfs_dir_ro_inode_operations;
> -		else
> +			inode->i_opflags &= ~IOP_XATTR;
> +		} else {
>  			inode->i_op = &btrfs_dir_inode_operations;
> +		}
>  		break;
>  	case S_IFLNK:
>  		inode->i_op = &btrfs_symlink_inode_operations;
> @@ -5710,6 +5712,7 @@ static struct inode *new_simple_dir(struct super_block *s,
>  
>  	inode->i_ino = BTRFS_EMPTY_SUBVOL_DIR_OBJECTID;
>  	inode->i_op = &btrfs_dir_ro_inode_operations;
> +	inode->i_opflags &= ~IOP_XATTR;
>  	inode->i_fop = &simple_dir_operations;
>  	inode->i_mode = S_IFDIR | S_IRUGO | S_IWUSR | S_IXUGO;
>  	inode->i_mtime = current_time(inode);
> -- 
> 2.11.0
> 

Forgot to cc stable, but 4.9 needs this.


[PATCH] Btrfs: disable xattr operations on subvolume directories

2017-01-24 Thread Omar Sandoval
From: Omar Sandoval 

When you snapshot a subvolume containing a subvolume, you get a
placeholder read-only directory where the subvolume would be. These
directory inodes have ->i_ops set to btrfs_dir_ro_inode_operations.
Previously, this didn't include the xattr operation callbacks. The
conversion to xattr_handlers missed this case, leading to bogus attempts
to set xattrs on these inodes. This manifested itself as failures when
running delayed inodes.

To fix this, clear the IOP_XATTR in ->i_opflags on these inodes.

Fixes: 6c6ef9f26e59 ("xattr: Stop calling {get,set,remove}xattr inode operations")
Cc: Andreas Gruenbacher 
Reported-by: Chris Murphy 
Signed-off-by: Omar Sandoval 
---
Applies to v4.10-rc4. Chris, this fixes the issue for me, could you please test
it out? Andreas, does this make sense? I'll try to cook up an xfstest for this.

 fs/btrfs/inode.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4e024260ad71..3dacf0786428 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3835,10 +3835,12 @@ static int btrfs_read_locked_inode(struct inode *inode)
 		break;
 	case S_IFDIR:
 		inode->i_fop = &btrfs_dir_file_operations;
-		if (root == fs_info->tree_root)
+		if (root == fs_info->tree_root) {
 			inode->i_op = &btrfs_dir_ro_inode_operations;
-		else
+			inode->i_opflags &= ~IOP_XATTR;
+		} else {
 			inode->i_op = &btrfs_dir_inode_operations;
+		}
 		break;
 	case S_IFLNK:
 		inode->i_op = &btrfs_symlink_inode_operations;
@@ -5710,6 +5712,7 @@ static struct inode *new_simple_dir(struct super_block *s,
 
 	inode->i_ino = BTRFS_EMPTY_SUBVOL_DIR_OBJECTID;
 	inode->i_op = &btrfs_dir_ro_inode_operations;
+	inode->i_opflags &= ~IOP_XATTR;
 	inode->i_fop = &simple_dir_operations;
 	inode->i_mode = S_IFDIR | S_IRUGO | S_IWUSR | S_IXUGO;
 	inode->i_mtime = current_time(inode);
-- 
2.11.0



Re: [PATCH 0/9] Lowmem mode fsck fixes with fsck-tests framework update

2017-01-24 Thread Qu Wenruo



At 01/25/2017 12:54 AM, Christoph Anton Mitterer wrote:

Hey Qu.

I was giving your patches a try, again on the very same fs from my
initial report (which has, however, seen writes in the meantime).

btrfs-progs v4.9 WITHOUT patch:
***
# btrfs check /dev/nbd0 ; echo $?
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
Checking filesystem on /dev/nbd0
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
found 7519512838144 bytes used err is 0
total csum bytes: 7330834320
total tree bytes: 10902437888
total fs tree bytes: 2019704832
total extent tree bytes: 1020149760
btree space waste bytes: 925714197
file data blocks allocated: 7509228494848
 referenced 7630551511040
0

# btrfs check --mode=lowmem /dev/nbd0 ; echo $?
checking extents
ERROR: block group[74117545984 1073741824] used 1073741824 but extent items used 0
ERROR: block group[239473786880 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[500393050112 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[581997428736 1073741824] used 1073741824 but extent items used 0
ERROR: block group[626557714432 1073741824] used 1073741824 but extent items used 0
ERROR: block group[668433645568 1073741824] used 1073741824 but extent items used 0
ERROR: block group[948680261632 1073741824] used 1073741824 but extent items used 0
ERROR: block group[982503129088 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1039411445760 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1054443831296 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[1190809042944 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1279392743424 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1481256206336 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1620842643456 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[1914511032320 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3055361720320 1073741824] used 1073741824 but extent items used 0
ERROR: block group[3216422993920 1073741824] used 1073741824 but extent items used 0
ERROR: block group[3670615785472 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3801612288000 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3828455833600 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[4250973241344 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4261710659584 1073741824] used 1073741824 but extent items used 1074266112
ERROR: block group[4392707162112 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4558063403008 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4607455526912 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4635372814336 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4640204652544 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4642352136192 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[4681006841856 1073741824] used 1073741824 but extent items used 0
ERROR: block group[5063795802112 1073741824] used 1073741824 but extent items used 0
ERROR: block group[5171169984512 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[5216267141120 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[5290355326976 1073741824] used 1073741824 but extent items used 0
ERROR: block group[5445511020544 1073741824] used 1073741824 but extent items used 1074266112
ERROR: block group[6084387405824 1073741824] used 1073741824 but extent items used 0
ERROR: block group[6104788500480 1073741824] used 1073741824 but extent items used 0
ERROR: block group[6878956355584 1073741824] used 1073741824 but extent items used 0
ERROR: block group[6997067956224 1073741824] used 1073741824 but extent items used 0
ERROR: block group[7702516334592 1073741824] used 1073741824 but extent items used 0
ERROR: block group[8051482427392 1073741824] used 1073741824 but extent items used 1084751872
ERROR: block group[8116980678656 1073741824] used 1073741824 but extent items used 0
ERROR: errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
Checking filesystem on /dev/nbd0
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
found 7519512838144 bytes used err is -5
total csum bytes: 7330834320
total tree bytes: 10902437888
total fs tree bytes: 2019704832
total extent tree bytes: 1020149760
btree space waste bytes: 925714197
file data blocks allocated: 7509228494848
 referenced 7630551511040
1

=> so the fs would still show the symptoms


Then, with no RW mount to the fs in between, 

Re: [PATCH 0/9] Lowmem mode fsck fixes with fsck-tests framework update

2017-01-24 Thread Christoph Anton Mitterer
On Wed, 2017-01-25 at 08:44 +0800, Qu Wenruo wrote:
> Thanks for the test,

You're welcome... I'm happy if I can help :)

Just tell me once you think you found something, and I'll repeat the
testing.


Cheers,
Chris.



Re: [PATCH v3 5/6] btrfs-progs: convert: Switch to new rollback function

2017-01-24 Thread Qu Wenruo



At 01/25/2017 12:37 AM, David Sterba wrote:

On Tue, Jan 24, 2017 at 08:44:00AM +0800, Qu Wenruo wrote:



At 01/24/2017 01:54 AM, David Sterba wrote:

On Mon, Dec 19, 2016 at 02:56:41PM +0800, Qu Wenruo wrote:

Since we have the whole facilities needed to rollback, switch to the new
rollback.


Sorry, the change from patch 4 to patch 5 seems too big to grasp for me,
reviewing is really hard and I'm not sure I could even do that. My
concern is namely about patch 5/6 that throws out a lot of code that
does not obviously map to the new code.

I can try again to see if there are points where the patch could be
split, but at the moment the patchset is too scary to merge.



So this implies the current implementation is not good enough for review.


I'd say the code hasn't been cleaned up for a long time so it's not good
enough for adding new features and doing broader fixes. The v2 rework
has fixed quite an important issue, but for other issues I'd rather get
smaller patches that eg. prepare the code for the final change.
Something that I can review without needing to reread the whole convert
and refresh memories about all details.


I'll try to extract more set operations and make the core part more
refined, with more ASCII art comments for it.


The ascii diagrams help, the overall convert design could be also better
documented etc. At the moment I'd rather spend some time on cleaning up
the sources but also don't want to block the fixes you've been sending.
I need to think about that more.


Feel free to block the rework.

I'll start by sending out basic documentation explaining the logic
behind convert/rollback, which should help review.


Thanks,
Qu




[PATCH] Btrfs: fix wrong argument for btrfs_lookup_ordered_range

2017-01-24 Thread Liu Bo
Commit "d0b7da88 Btrfs: btrfs_page_mkwrite: Reserve space in sectorsized units"
did this, but btrfs_lookup_ordered_range expects a 'length' rather than a
'page_end'.

Signed-off-by: Liu Bo 
---
Is this a candidate for stable?

 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 4e02426..366cf0b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9023,7 +9023,7 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	 * we can't set the delalloc bits if there are pending ordered
 	 * extents.  Drop our locks and wait for them to finish
 	 */
-	ordered = btrfs_lookup_ordered_range(inode, page_start, page_end);
+	ordered = btrfs_lookup_ordered_range(inode, page_start, PAGE_SIZE);
 	if (ordered) {
 		unlock_extent_cached(io_tree, page_start, page_end,
 				     &cached_state, GFP_NOFS);
-- 
2.5.5



Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Omar Sandoval
On Tue, Jan 24, 2017 at 01:48:12PM -0700, Chris Murphy wrote:
> journal_debug.log booted with patched kernel and params
> systemd.log_level=debug rd.debug, and then output to file with
> journalctl -b -o short-monotonic
> https://drive.google.com/open?id=0B_2Asp8DGjJ9MjRDWi0tZ0x4V2s
> 
> I'm uncertain of the immediacy of some event resulting in another but...
> 
> 
> [8.159581] f25h systemd-tmpfiles[720]: Running create action for
> entry d /var/run/pptp
> [8.159703] f25h systemd-tmpfiles[720]: Created directory "/var/run/pptp".
> [8.159825] f25h systemd-tmpfiles[720]: "/var/run/pptp" has right mode 
> 40750
> [8.159944] f25h systemd-tmpfiles[720]: Running remove action for
> entry d /var/run/pptp
> [8.160070] f25h systemd-tmpfiles[720]: Running create action for
> entry d /var/run/radvd
> [8.160186] f25h systemd-tmpfiles[720]: Created directory "/var/run/radvd".
> [8.160312] f25h systemd-tmpfiles[720]: "/var/run/radvd" has right mode 
> 40755
> [8.171770] f25h kernel: [ cut here ]
> [8.174028] f25h kernel: WARNING: CPU: 2 PID: 720 at
> fs/btrfs/delayed-inode.c:55
> btrfs_get_or_create_delayed_node+0x16a/0x1e0 [btrfs]
> [8.176316] f25h kernel: ino 2 is out of range
> 
> 
> Is it the creation of these directories triggering the error? Anyway,
> there's a bunch of systemd-tmpfiles activity prior to this, and right
> before the btrfs_get_or_create_delayed_node call trace.
> 
> 
> The final call trace is cut off, I guess when systemd switches from
> /run to /var and flushes, if the fs goes read only and can't write all
> of what's flushed, it results in journal data loss. If the rest of the
> output is useful I can switch systemd to only use volatile storage to
> avoid this problem.

Got this to repro after installing systemd-container. It's happening on
lsetxattr() to set the SELinux context on /var/lib/machines, which is a
subvolume. Looking into it now. Thanks for all of the help, Chris.


Re: btrfs-check: Fix bitflipped keys from bad RAM

2017-01-24 Thread Otto Kekäläinen
2016-06-28 23:11 GMT+03:00 Otto Kekäläinen :
> Hello!
>
> A patch with this subject was submitted in May 2014:
> http://www.spinics.net/lists/linux-btrfs/msg33777.html
>
> I don't see it among any of the ~360 open issues at
> https://bugzilla.kernel.org/buglist.cgi?bug_status=NEW_status=ASSIGNED_status=REOPENED=btrfs
>
> Unless somebody objects, I'll file it as a NEW issue with patch.
>
>
> I think the work done by Hugo Mills for this one is important and it
> would be a pity if those patches were forgotten. It is one of those
> things where btrfs could outperform zfs, which has no bitflip
> recovery. Btrfs could have one, and it would be great.
>
> I personally came across a machine with a bitflipped index and I would
> love to test these patches. I am however reluctant to invest time in
> it if there is no issue in the bug tracker and no visible progress.
> Without proper tracking all debugging/feedback would go in vain.

I eventually went ahead and filed this as
https://bugzilla.kernel.org/show_bug.cgi?id=17447

It would be very cool if the bitflip recovery patch was merged. It is
one of those
things where btrfs could outperform zfs.


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Chris Murphy
journal_debug.log booted with patched kernel and params
systemd.log_level=debug rd.debug, and then output to file with
journalctl -b -o short-monotonic
https://drive.google.com/open?id=0B_2Asp8DGjJ9MjRDWi0tZ0x4V2s

I'm uncertain of the immediacy of some event resulting in another but...


[8.159581] f25h systemd-tmpfiles[720]: Running create action for
entry d /var/run/pptp
[8.159703] f25h systemd-tmpfiles[720]: Created directory "/var/run/pptp".
[8.159825] f25h systemd-tmpfiles[720]: "/var/run/pptp" has right mode 40750
[8.159944] f25h systemd-tmpfiles[720]: Running remove action for
entry d /var/run/pptp
[8.160070] f25h systemd-tmpfiles[720]: Running create action for
entry d /var/run/radvd
[8.160186] f25h systemd-tmpfiles[720]: Created directory "/var/run/radvd".
[8.160312] f25h systemd-tmpfiles[720]: "/var/run/radvd" has right mode 40755
[8.171770] f25h kernel: [ cut here ]
[8.174028] f25h kernel: WARNING: CPU: 2 PID: 720 at
fs/btrfs/delayed-inode.c:55
btrfs_get_or_create_delayed_node+0x16a/0x1e0 [btrfs]
[8.176316] f25h kernel: ino 2 is out of range


Is it the creation of these directories triggering the error? Anyway,
there's a bunch of systemd-tmpfiles activity prior to this, and right
before the btrfs_get_or_create_delayed_node call trace.


The final call trace is cut off, I guess when systemd switches from
/run to /var and flushes, if the fs goes read only and can't write all
of what's flushed, it results in journal data loss. If the rest of the
output is useful I can switch systemd to only use volatile storage to
avoid this problem.


Chris Murphy


Re: Planned feature status

2017-01-24 Thread Hugo Mills
On Tue, Jan 24, 2017 at 01:37:21PM -0700, Stephen Wiebelhaus wrote:
> I know that setting different RAID level per subvolume is planned
> for the future, but I can't find documentation on the Wiki as to
> what priority the feature is. I can find docs on some user submitted
> feature requests, but it seems since this is something that was
> planned longer ago it's not documented. Can someone tell me where to
> find a list of feature priorities or when this might be done.

   There isn't such a list (or at least, not a publicly acknowledged
one).

   Hugo.

-- 
Hugo Mills | Great films about cricket: Umpire of the Rising Sun
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Planned feature status

2017-01-24 Thread Stephen Wiebelhaus
I know that setting a different RAID level per subvolume is planned for 
the future, but I can't find documentation on the Wiki as to what 
priority the feature has. I can find docs on some user-submitted feature 
requests, but since this is something that was planned longer ago, it 
seems it's not documented. Can someone tell me where to find a list of 
feature priorities, or when this might be done?


Thank you,

Stephen



Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Chris Murphy
On Tue, Jan 24, 2017 at 1:27 PM, Omar Sandoval  wrote:
> On Tue, Jan 24, 2017 at 01:24:51PM -0700, Chris Murphy wrote:
>> On Tue, Jan 24, 2017 at 1:10 PM, Omar Sandoval  wrote:
>>
>> > Hm, still no luck, maybe it's a Server vs Workstation thing? I'll try
>> > installing Workstation. In the meantime, I noticed that in both of the
>> > traces, systemd-tmpfiles was the process that tripped the WARN_ONCE().
>> > Could you dump the contents of /{etc,run,usr/lib}/tmpfiles.d somewhere?
>>
>> This is just ls -lZ for those directories, not sure how else to dump them.
>>
>> https://drive.google.com/open?id=0B_2Asp8DGjJ9NFJUUUpuV2lxcG8
>
> Sorry, I mean the actual contents, could you just shove them in a
> tarball? New test kernel is building right now, hopefully that does
> it...

https://drive.google.com/open?id=0B_2Asp8DGjJ9MjVOT3NXeWh3eVU

-- 
Chris Murphy


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Omar Sandoval
On Tue, Jan 24, 2017 at 01:24:51PM -0700, Chris Murphy wrote:
> On Tue, Jan 24, 2017 at 1:10 PM, Omar Sandoval  wrote:
> 
> > Hm, still no luck, maybe it's a Server vs Workstation thing? I'll try
> > installing Workstation. In the meantime, I noticed that in both of the
> > traces, systemd-tmpfiles was the process that tripped the WARN_ONCE().
> > Could you dump the contents of /{etc,run,usr/lib}/tmpfiles.d somewhere?
> 
> This is just ls -lZ for those directories, not sure how else to dump them.
> 
> https://drive.google.com/open?id=0B_2Asp8DGjJ9NFJUUUpuV2lxcG8

Sorry, I mean the actual contents, could you just shove them in a
tarball? New test kernel is building right now, hopefully that does
it...
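
(Something like the following would do, with the archive name being just an
example:)

  # tar czf tmpfiles.tar.gz /etc/tmpfiles.d /run/tmpfiles.d /usr/lib/tmpfiles.d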


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Chris Murphy
On Tue, Jan 24, 2017 at 1:10 PM, Omar Sandoval  wrote:

> Hm, still no luck, maybe it's a Server vs Workstation thing? I'll try
> installing Workstation. In the meantime, I noticed that in both of the
> traces, systemd-tmpfiles was the process that tripped the WARN_ONCE().
> Could you dump the contents of /{etc,run,usr/lib}/tmpfiles.d somewhere?

This is just ls -lZ for those directories, not sure how else to dump them.

https://drive.google.com/open?id=0B_2Asp8DGjJ9NFJUUUpuV2lxcG8


-- 
Chris Murphy


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Omar Sandoval
On Tue, Jan 24, 2017 at 01:13:47PM -0700, Chris Murphy wrote:
> OK I've reproduced it in a virt-manager VM with Fedora Rawhide from a
> week old ISO, which is using btrfs-progs 4.9 and kernel 4.10-rc3.
> 
> Get this though. The problem doesn't happen with boot param selinux=0,
> either in the VM or one of the laptops. Whereas the problem happens
> still with enforcing=0. So this could be an selinux xattr that's
> involved.
> 
> Chris Murphy

That makes sense, I don't have SELinux compiled into my test kernel.
I'll fix that and try again.


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Chris Murphy
OK I've reproduced it in a virt-manager VM with Fedora Rawhide from a
week old ISO, which is using btrfs-progs 4.9 and kernel 4.10-rc3.

Get this though. The problem doesn't happen with boot param selinux=0,
either in the VM or one of the laptops. Whereas the problem happens
still with enforcing=0. So this could be an selinux xattr that's
involved.

Chris Murphy


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Omar Sandoval
On Tue, Jan 24, 2017 at 12:19:29PM -0700, Chris Murphy wrote:
> I can reproduce it on another laptop with Fedora 25. I haven't tried
> to reproduce it in a VM. In this case, it's a single partition Btrfs
> volume, completely stock, and is about 5 weeks old, no crashes or
> forced shutdowns. The HP uses a Samsung NVMe SSD, whereas on this
> Macbook Pro it's a Samsung SATA SSD.
> 
> dmesg
> https://drive.google.com/open?id=0B_2Asp8DGjJ9cjhSRUJxc1k3NVE
> 
> 
> Chris Murphy

Hm, still no luck, maybe it's a Server vs Workstation thing? I'll try
installing Workstation. In the meantime, I noticed that in both of the
traces, systemd-tmpfiles was the process that tripped the WARN_ONCE().
Could you dump the contents of /{etc,run,usr/lib}/tmpfiles.d somewhere?


Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.

2017-01-24 Thread Adam Borowski
On Tue, Jan 24, 2017 at 01:57:24PM -0500, Hans Deragon wrote:
> If I remove 'ro' from the option, I cannot get the filesystem mounted
> because of the following error:
> 
> BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not
> allowed
> 
> So I am stuck. I can only mount the filesystem as read-only, which prevents
> me to add a disk.

A known problem: you get only one shot at fixing the filesystem, but that's
not because of some damage; it's because the check for whether the fs is in
good enough shape to mount is oversimplistic.

Here's a patch, if you apply it and recompile, you'll be able to mount
degraded rw.

Note that it removes a safety harness: here, the harness got tangled up and
keeps you from recovering when it shouldn't, but it _has_ valid uses.


Meow!
-- 
Autotools hint: to do a zx-spectrum build on a pdp11 host, type:
  ./configure --host=zx-spectrum --build=pdp11
From 1367d3da6b0189797f6090b11d8716a1cc136593 Mon Sep 17 00:00:00 2001
From: Adam Borowski 
Date: Mon, 23 Jan 2017 19:03:20 +0100
Subject: [PATCH] [NOT-FOR-MERGING] btrfs: make "too many missing devices"
 check non-fatal

It breaks degraded mounts of multi-device filesystems that have any single
blocks, which are naturally created if it has been mounted degraded before.
Obviously, any further device loss will result in data loss, but the user
has already specified -odegraded so that's understood.

For a real fix, we'd want to check whether any of single blocks are missing,
as that would allow telling apart broken JBOD filesystems from bona-fide
degraded RAIDs.

(This patch is for the benefit of folks who'd have to recreate a filesystem
just because it got degraded.)
---
 fs/btrfs/disk-io.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 18004169552c..1b25b9e24662 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3060,10 +3060,9 @@ int open_ctree(struct super_block *sb,
 	 fs_info->num_tolerated_disk_barrier_failures &&
 	!(sb->s_flags & MS_RDONLY)) {
 		btrfs_warn(fs_info,
-"missing devices (%llu) exceeds the limit (%d), writeable mount is not allowed",
+"missing devices (%llu) exceeds the limit (%d), add more or risk data loss",
 			fs_info->fs_devices->missing_devices,
 			fs_info->num_tolerated_disk_barrier_failures);
-		goto fail_sysfs;
 	}
 
 	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, tree_root,
-- 
2.11.0
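
With the patched kernel booted, the recovery itself would then look roughly
like this (a sketch using the UUID and device names from Hans's report; the
final balance is the usual step for converting back any chunks that were
written as single while degraded):

  # mount -t btrfs -o degraded /dev/disk/by-uuid/975bdbb3-9a9c-4a72-ad67-6cda545fda5e /mnt/brtfs-raid1-b
  # btrfs replace start -B 2 /dev/sdd /mnt/brtfs-raid1-b
  # btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt/brtfs-raid1-b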



Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Chris Murphy
I can reproduce it on another laptop with Fedora 25. I haven't tried
to reproduce it in a VM. In this case, it's a single partition Btrfs
volume, completely stock, and is about 5 weeks old, no crashes or
forced shutdowns. The HP uses a Samsung NVMe SSD, whereas on this
Macbook Pro it's a Samsung SATA SSD.

dmesg
https://drive.google.com/open?id=0B_2Asp8DGjJ9cjhSRUJxc1k3NVE


Chris Murphy


raid1: cannot add disk to replace faulty because can only mount fs as read-only.

2017-01-24 Thread Hans Deragon

Greetings,


Warning: Btrfs user here; no knowledge of the inner workings of btrfs. 
If I am on the wrong mailing list, please redirect me and accept my 
apologies.


At home, lacking disks and free SATA ports, I created a raid1 btrfs 
filesystem by converting an existing single btrfs instance into a 
degraded raid1, then added the other drive. The exact commands I used 
have been lost.


It worked well, until one of my drives died. Total death; the OS does not 
detect it anymore. I bought another drive, but alas, I cannot add it:


# btrfs replace start -B 2 /dev/sdd /mnt/brtfs-raid1-b
ERROR: ioctl(DEV_REPLACE_START) failed on "/mnt/brtfs-raid1-b": 
Read-only file system


Here is the command I used to mount it:

mount -t btrfs -o ro,degraded,recovery,nosuid,nodev,nofail,x-gvfs-show 
/dev/disk/by-uuid/975bdbb3-9a9c-4a72-ad67-6cda545fda5e 
/mnt/brtfs-raid1-b


If I remove 'ro' from the option, I cannot get the filesystem mounted 
because of the following error:


BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not 
allowed


So I am stuck. I can only mount the filesystem as read-only, which 
prevents me from adding a disk.


It seems related to bug: 
https://bugzilla.kernel.org/show_bug.cgi?id=60594


I am using Ubuntu 16.04 LTS with kernel 4.4.0-59-generic. Is there any 
hope of adding a disk? Otherwise, can I recreate a raid1 with only one disk 
and add another, but never suffer from the same problem again? I did not 
lose any data, but I do have some serious downtime because of this. I 
wish that if a drive fails, the btrfs filesystem would still mount rw and 
leave the OS running, but warn the user of the failing disk and easily 
allow the addition of a new drive to reintroduce redundancy. However, 
this scenario seems impossible with the current state of affairs. Am I 
right?



Best regards and thank you for your contribution to the open source 
movement,

Hans Deragon


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Chris Murphy
On Tue, Jan 24, 2017 at 11:56 AM, Omar Sandoval  wrote:
> Yup, definitely doesn't look like memory corruption. I set up a Fedora
> VM yesterday to try to repro with basically those same steps but it
> didn't happen. I'll try again, but is there anything special about your
> Fedora installation?

Default mkfs. Default mount options.

However, due to subsequent suboptimal situation (installing Windows 10
after Fedora), this Btrfs volume is actually a two device volume: two
partitions with Windows 10 in between them.

[chris@f25h ~]$ sudo btrfs fi show
Label: 'fedora'  uuid: c45caf39-a048-4c44-90c9-535dc8003c71
Total devices 2 FS bytes used 51.37GiB
devid    1 size 25.00GiB used 21.03GiB path /dev/nvme0n1p4
devid    2 size 48.83GiB used 43.00GiB path /dev/nvme0n1p6

[chris@f25h ~]$ sudo gdisk -l /dev/nvme0n1
[...snip...]
Number  Start (sector)    End (sector)  Size        Code  Name
   1            2048          411647    200.0 MiB   EF00  EFI System Partition
   2          411648         2508799    1024.0 MiB  8300
   3         2508800        16873471    6.8 GiB     8200
   4        16873472        69302271    25.0 GiB    8300  Linux filesystem
   5        69302272       229046271    76.2 GiB    0700  Microsoft basic data
   6       229046272       331446271    48.8 GiB    8300  Linux filesystem
   7       331446272       500118158    80.4 GiB    8E00  Linux LVM

p4 was made to small when adding in Windows; so I shrank Windows to
make p6, and then added p6 to p4. Hence p4 and p6 are the same Btrfs
volume (single profile for metadata and data).


-- 
Chris Murphy


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Chris Murphy
OK I just did a 2nd boot - all the same everything as the previous
dmesg.log (patch kernel snapshot). Identical. But not identical
results: timing wise the problem happens much sooner, at 14s the fs
goes read only instead of 30+ seconds. And I also get this line:

[   14.039931] systemd-journald[488]: Failed to set ACL on
/var/log/journal/864b0a611b104692b266377c7d4c7a39/user-1000.journal,
ignoring: Operation not permitted

entire dmesg for this boot:
https://drive.google.com/open?id=0B_2Asp8DGjJ9LTRzZjA2SThtbkE

Did it fail to set the ACL because the file system is readonly? Or was
trying to set the ACL what triggers the problem - then it goes
readonly and then we see the complaint that the ACL failed to be
set? There's close proximity timing wise to the first Btrfs error
messages, with the most recent patch, and systemd audit messages.
Systemd-journald is switching from /run to /var, flushing the journal
to disk within about 2 seconds of the first Btrfs error. And systemd
does chattr +C on its logs by default now (and I think it's not user
changeable behavior so I can't test if it's related).
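
(For what it's worth, whether journald actually set the No_COW attribute on
the journal files can be checked with something like the following; a 'C' in
the flags column means nodatacow:)

  # lsattr /var/log/journal/*/*.journal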

Chris Murphy


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Omar Sandoval
On Tue, Jan 24, 2017 at 11:37:43AM -0700, Chris Murphy wrote:
> On Tue, Jan 24, 2017 at 10:49 AM, Omar Sandoval  wrote:
> > On Mon, Jan 23, 2017 at 08:51:24PM -0700, Chris Murphy wrote:
> >> On Mon, Jan 23, 2017 at 5:05 PM, Omar Sandoval  wrote:
> >> > Thanks! Hmm, okay, so it's coming from btrfs_update_delayed_inode()...
> >> > That's probably us failing btrfs_lookup_inode(), but just to make sure,
> >> > could you apply the updated diff at the same link as before
> >> > (https://gist.github.com/osandov/9f223bda27f3e1cd1ab9c1bd634c51a4)? If
> >> > that's the case, I'm even more confused about what xattrs have to do
> >> > with it.
> >>
> >> [   35.015363] __btrfs_update_delayed_inode(): inode is missing
> >
> > Okay, like I expected...
> >
> >> [   35.015372] btrfs_update_delayed_inode(ino=2) -> -2
> >
> > Wtf? Inode numbers should be >=256. I updated the diff a third time to
> > catch where that came from. If we're lucky, the backtrace should have
> > the exact culprit. If we're unlucky, there might be memory corruption
> > involved.
> 
> Now two traces. This one is new, and follows a bunch of xattr related stuff...
> 
> [6.861504] WARNING: CPU: 3 PID: 690 at fs/btrfs/delayed-inode.c:55
> btrfs_get_or_create_delayed_node+0x16a/0x1e0 [btrfs]
> [6.862833] ino 2 is out of range
> 
> Then this:
> [7.016061] __btrfs_update_delayed_inode(): inode is missing
> [7.017149] btrfs_update_delayed_inode() failed
> [7.018233] __btrfs_commit_inode_delayed_items(ino=2, flags=3) -> -2
> 
> And finally what we've already seen:
> [   34.930890] WARNING: CPU: 0 PID: 396 at
> fs/btrfs/delayed-inode.c:1194 __btrfs_run_delayed_items+0x1d0/0x670
> [btrfs]
> 
> Complete dmesg osandov-9f223b-3_dmesg.log
> https://drive.google.com/open?id=0B_2Asp8DGjJ9bnpNamIydklraTQ
> 

Aha, so it is xattrs! Here's the full warning trace:

[6.860185] [ cut here ]
[6.861504] WARNING: CPU: 3 PID: 690 at fs/btrfs/delayed-inode.c:55 
btrfs_get_or_create_delayed_node+0x16a/0x1e0 [btrfs]
[6.862833] ino 2 is out of range
[6.862842] Modules linked in:
[6.864213]  xfs libcrc32c arc4 iwlmvm intel_rapl x86_pkg_temp_thermal 
intel_powerclamp coretemp mac80211 snd_soc_skl kvm_intel snd_soc_skl_ipc kvm 
snd_hda_codec_hdmi snd_soc_sst_ipc irqbypass snd_soc_sst_dsp crct10dif_pclmul 
iTCO_wdt crc32_pclmul snd_hda_codec_conexant snd_hda_ext_core 
snd_hda_codec_generic ghash_clmulni_intel iTCO_vendor_support snd_soc_sst_match 
intel_cstate snd_soc_core iwlwifi i2c_designware_platform i2c_designware_core 
hp_wmi sparse_keymap snd_hda_intel intel_uncore snd_hda_codec cfg80211 
snd_hwdep snd_hda_core snd_seq snd_seq_device uvcvideo intel_rapl_perf snd_pcm 
videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core joydev 
videodev idma64 snd_timer btusb hci_uart i2c_i801 snd i2c_smbus media btrtl 
btbcm soundcore btqca btintel mei_me mei bluetooth shpchp 
processor_thermal_device
[6.869661]  intel_pch_thermal intel_lpss_pci intel_soc_dts_iosf ucsi wmi 
hp_accel pinctrl_sunrisepoint lis3lv02d pinctrl_intel int3403_thermal rfkill 
input_polldev hp_wireless intel_lpss_acpi int340x_thermal_zone nfsd 
int3400_thermal intel_lpss tpm_crb acpi_thermal_rel acpi_pad tpm_tis 
tpm_tis_core tpm auth_rpcgss nfs_acl lockd grace sunrpc btrfs i915 xor raid6_pq 
i2c_algo_bit drm_kms_helper drm crc32c_intel nvme serio_raw nvme_core i2c_hid 
video fjes
[6.874780] CPU: 3 PID: 690 Comm: systemd-tmpfile Not tainted 4.9.0+ #2
[6.876294] Hardware name: HP HP Spectre Notebook/81A0, BIOS F.30 12/15/2016
[6.877820]  9bc341187a78 923ed9ed 9bc341187ac8 

[6.879316]  9bc341187ab8 920a1d9b 0037921cafcb 
0002
[6.880836]  8c4126d62000 8c413170b0b0 ff02 
8c4129a8f300
[6.882364] Call Trace:
[6.883861]  [] dump_stack+0x63/0x86
[6.885355]  [] __warn+0xcb/0xf0
[6.886888]  [] warn_slowpath_fmt+0x5f/0x80
[6.888415]  [] ? 
btrfs_get_or_create_delayed_node+0x126/0x1e0 [btrfs]
[6.889979]  [] 
btrfs_get_or_create_delayed_node+0x16a/0x1e0 [btrfs]
[6.891498]  [] btrfs_delayed_update_inode+0x27/0x420 
[btrfs]
[6.893023]  [] ? current_fs_time+0x23/0x30
[6.894602]  [] btrfs_update_inode+0x8d/0x100 [btrfs]
[6.896122]  [] ? current_time+0x36/0x70
[6.897681]  [] __btrfs_setxattr+0xe3/0x120 [btrfs]
[6.899212]  [] btrfs_xattr_handler_set+0x36/0x40 [btrfs]
[6.900690]  [] __vfs_setxattr+0x6b/0x90
[6.902182]  [] __vfs_setxattr_noperm+0x72/0x1b0
[6.903622]  [] vfs_setxattr+0xa7/0xb0
[6.905078]  [] setxattr+0x160/0x180
[6.906515]  [] ? __check_object_size+0xff/0x1d6
[6.907894]  [] ? strncpy_from_user+0x4d/0x170
[6.909231]  [] ? getname_flags+0x6f/0x1f0
[6.910590]  [] path_setxattr+0xb3/0xe0
[6.911913]  [] SyS_lsetxattr+0x11/0x20
[6.913211]  [] do_syscall_64+0x67/0x180
[6.914557]  [] entry_SYSCALL64_slow_path+0x25/0x25
[ 

Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Chris Murphy
On Tue, Jan 24, 2017 at 10:49 AM, Omar Sandoval  wrote:
> On Mon, Jan 23, 2017 at 08:51:24PM -0700, Chris Murphy wrote:
>> On Mon, Jan 23, 2017 at 5:05 PM, Omar Sandoval  wrote:
>> > Thanks! Hmm, okay, so it's coming from btrfs_update_delayed_inode()...
>> > That's probably us failing btrfs_lookup_inode(), but just to make sure,
>> > could you apply the updated diff at the same link as before
>> > (https://gist.github.com/osandov/9f223bda27f3e1cd1ab9c1bd634c51a4)? If
>> > that's the case, I'm even more confused about what xattrs have to do
>> > with it.
>>
>> [   35.015363] __btrfs_update_delayed_inode(): inode is missing
>
> Okay, like I expected...
>
>> [   35.015372] btrfs_update_delayed_inode(ino=2) -> -2
>
> Wtf? Inode numbers should be >=256. I updated the diff a third time to
> catch where that came from. If we're lucky, the backtrace should have
> the exact culprit. If we're unlucky, there might be memory corruption
> involved.

Now two traces. This one is new, and follows a bunch of xattr related stuff...

[6.861504] WARNING: CPU: 3 PID: 690 at fs/btrfs/delayed-inode.c:55
btrfs_get_or_create_delayed_node+0x16a/0x1e0 [btrfs]
[6.862833] ino 2 is out of range

Then this:
[7.016061] __btrfs_update_delayed_inode(): inode is missing
[7.017149] btrfs_update_delayed_inode() failed
[7.018233] __btrfs_commit_inode_delayed_items(ino=2, flags=3) -> -2

And finally what we've already seen:
[   34.930890] WARNING: CPU: 0 PID: 396 at
fs/btrfs/delayed-inode.c:1194 __btrfs_run_delayed_items+0x1d0/0x670
[btrfs]

Complete dmesg osandov-9f223b-3_dmesg.log
https://drive.google.com/open?id=0B_2Asp8DGjJ9bnpNamIydklraTQ

Also, to do these tests, I'm making a new rw snapshot each time so
that the new kernel modules are in the snapshot. e.g.

1. subvolumes 'home' and 'root' are originally created with 'btrfs sub
create' and then filled, and these work OK with all kernels.
2. build kernel with patch
3. 'btrfs sub snap root root.test8'  and also 'btrfs sub snap home home.test8'
4. sudo vi root.test8/etc/fstab to update the entry for / so that
subvol=root is now subvol=root.test8, and also update for /home
5. sudo vi /boot/efi/EFI/fedora/grub.cfg to update the command line,
rootflags=subvol=root becomes rootflags=subvol=root.test8

So the fact the kernel works on subvolume root, but consistently does
not work on each brand new snapshot, is suspiciously unlike what I'd
expect for memory corruption; unless the memory corruption has already
"tainted" the file system in a way that neither btrfs check or scrub
can find; and this "taintedness" of the file system doesn't manifest
until there's a snapshot being used and with a particular kernel with
the xattr patch?

Pretty weird.


-- 
Chris Murphy


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Omar Sandoval
On Mon, Jan 23, 2017 at 08:51:24PM -0700, Chris Murphy wrote:
> On Mon, Jan 23, 2017 at 5:05 PM, Omar Sandoval  wrote:
> > Thanks! Hmm, okay, so it's coming from btrfs_update_delayed_inode()...
> > That's probably us failing btrfs_lookup_inode(), but just to make sure,
> > could you apply the updated diff at the same link as before
> > (https://gist.github.com/osandov/9f223bda27f3e1cd1ab9c1bd634c51a4)? If
> > that's the case, I'm even more confused about what xattrs have to do
> > with it.
> 
> [   35.015363] __btrfs_update_delayed_inode(): inode is missing

Okay, like I expected...

> [   35.015372] btrfs_update_delayed_inode(ino=2) -> -2

Wtf? Inode numbers should be >=256. I updated the diff a third time to
catch where that came from. If we're lucky, the backtrace should have
the exact culprit. If we're unlucky, there might be memory corruption
involved.

> osandov-9f223b_2-dmesg.log
> https://drive.google.com/open?id=0B_2Asp8DGjJ9UnNSRXpualprWHM
> 
> 
> -- 
> Chris Murphy


Re: [PATCH v2] btrfs-progs: tests: add test for --sync option of qgroup show

2017-01-24 Thread David Sterba
On Thu, Dec 15, 2016 at 01:33:05PM +0900, Tsutomu Itoh wrote:
> Simple test script for the following patch.
> 
>btrfs-progs: qgroup: add sync option to 'qgroup show'
> 
> Signed-off-by: Tsutomu Itoh 

Applied, thanks.


Re: [PATCH 0/9] Lowmem mode fsck fixes with fsck-tests framework update

2017-01-24 Thread Christoph Anton Mitterer
Hey Qu.

I was giving your patches a try, again on the very same fs from my
initial report (which has, however, seen writes in the meantime).

btrfs-progs v4.9 WITHOUT patch:
***
# btrfs check /dev/nbd0 ; echo $?
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
Checking filesystem on /dev/nbd0
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
found 7519512838144 bytes used err is 0
total csum bytes: 7330834320
total tree bytes: 10902437888
total fs tree bytes: 2019704832
total extent tree bytes: 1020149760
btree space waste bytes: 925714197
file data blocks allocated: 7509228494848
 referenced 7630551511040
0

# btrfs check --mode=lowmem /dev/nbd0 ; echo $?
checking extents
ERROR: block group[74117545984 1073741824] used 1073741824 but extent items used 0
ERROR: block group[239473786880 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[500393050112 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[581997428736 1073741824] used 1073741824 but extent items used 0
ERROR: block group[626557714432 1073741824] used 1073741824 but extent items used 0
ERROR: block group[668433645568 1073741824] used 1073741824 but extent items used 0
ERROR: block group[948680261632 1073741824] used 1073741824 but extent items used 0
ERROR: block group[982503129088 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1039411445760 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1054443831296 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[1190809042944 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1279392743424 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1481256206336 1073741824] used 1073741824 but extent items used 0
ERROR: block group[1620842643456 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[1914511032320 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3055361720320 1073741824] used 1073741824 but extent items used 0
ERROR: block group[3216422993920 1073741824] used 1073741824 but extent items used 0
ERROR: block group[3670615785472 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3801612288000 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[3828455833600 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[4250973241344 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4261710659584 1073741824] used 1073741824 but extent items used 1074266112
ERROR: block group[4392707162112 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4558063403008 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4607455526912 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4635372814336 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4640204652544 1073741824] used 1073741824 but extent items used 0
ERROR: block group[4642352136192 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[4681006841856 1073741824] used 1073741824 but extent items used 0
ERROR: block group[5063795802112 1073741824] used 1073741824 but extent items used 0
ERROR: block group[5171169984512 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[5216267141120 1073741824] used 1073741824 but extent items used 1207959552
ERROR: block group[5290355326976 1073741824] used 1073741824 but extent items used 0
ERROR: block group[5445511020544 1073741824] used 1073741824 but extent items used 1074266112
ERROR: block group[6084387405824 1073741824] used 1073741824 but extent items used 0
ERROR: block group[6104788500480 1073741824] used 1073741824 but extent items used 0
ERROR: block group[6878956355584 1073741824] used 1073741824 but extent items used 0
ERROR: block group[6997067956224 1073741824] used 1073741824 but extent items used 0
ERROR: block group[7702516334592 1073741824] used 1073741824 but extent items used 0
ERROR: block group[8051482427392 1073741824] used 1073741824 but extent items used 1084751872
ERROR: block group[8116980678656 1073741824] used 1073741824 but extent items used 0
ERROR: errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
Checking filesystem on /dev/nbd0
UUID: 326d292d-f97b-43ca-b1e8-c722d3474719
found 7519512838144 bytes used err is -5
total csum bytes: 7330834320
total tree bytes: 10902437888
total fs tree bytes: 2019704832
total extent tree bytes: 1020149760
btree space waste bytes: 925714197
file data blocks allocated: 7509228494848
 referenced 7630551511040
1

=> so the fs would still show the symptoms


Then, with no RW mount to the fs in between, 4.9 with the following of
your patches:

Re: [PATCH v3 1/2] btrfs-progs: qgroup: add sync option to 'qgroup show'

2017-01-24 Thread David Sterba
On Thu, Dec 15, 2016 at 01:29:28PM +0900, Tsutomu Itoh wrote:
> The 'qgroup show' command does not synchronize filesystem.
> Therefore, 'qgroup show' may not display the correct value unless
> synchronized with 'filesystem sync' command etc.
> 
> So add the '--sync' option so that we can choose whether or not
> to synchronize when executing the command.
> 
> Signed-off-by: Tsutomu Itoh 

1 and 2 applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
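For illustration, what a --sync option boils down to is committing the current
transaction before reading the qgroup accounting, so the numbers reflect recent
writes. A minimal standalone sketch (the helper name is made up and this is not
the btrfs-progs code; it assumes the uapi header <linux/btrfs.h> provides
BTRFS_IOC_SYNC):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

/* Commit the current transaction so qgroup numbers reflect recent writes,
 * the same effect as running 'btrfs filesystem sync' beforehand. */
static int sync_before_qgroup_show(const char *mnt)
{
	int ret;
	int fd = open(mnt, O_RDONLY);

	if (fd < 0) {
		perror("open");
		return -1;
	}
	ret = ioctl(fd, BTRFS_IOC_SYNC);
	if (ret < 0)
		perror("BTRFS_IOC_SYNC");
	close(fd);
	return ret;
}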


Re: [PATCH v3 5/6] btrfs-progs: convert: Switch to new rollback function

2017-01-24 Thread David Sterba
On Tue, Jan 24, 2017 at 08:44:00AM +0800, Qu Wenruo wrote:
> 
> 
> At 01/24/2017 01:54 AM, David Sterba wrote:
> > On Mon, Dec 19, 2016 at 02:56:41PM +0800, Qu Wenruo wrote:
> >> Since we have the whole facilities needed to rollback, switch to the new
> >> rollback.
> >
> > Sorry, the change from patch 4 to patch 5 seems too big to grasp for me,
> > reviewing is really hard and I'm not sure I could even do that. My
> > concern is namely about patch 5/6 that throws out a lot of code that
> > does not obviously map to the new code.
> >
> > I can try again to see if there are points where the patch could be
> > split, but at the moment the patchset is too scary to merge.
> >
> 
> So this implies the current implementation is not good enough for review.

I'd say the code hasn't been cleaned up for a long time so it's not good
enough for adding new features and doing broader fixes. The v2 rework
has fixed quite an important issue, but for other issues I'd rather get
smaller patches that eg. prepare the code for the final change.
Something that I can review without needing to reread the whole convert
code and refresh my memory of all the details.

> I'll try to extract more of the set operations and make the core part more 
> refined, with more ASCII art comments for it.

The ascii diagrams help, the overall convert design could be also better
documented etc. At the moment I'd rather spend some time on cleaning up
the sources but also don't want to block the fixes you've been sending.
I need to think about that more.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs-progs: Introduce kernel sizes to cleanup large intermediate number

2017-01-24 Thread David Sterba
On Tue, Jan 24, 2017 at 11:03:05AM +0800, Qu Wenruo wrote:
> Large numbers like (1024 * 1024 * 1024) cost the reader/reviewer a
> second to convert to 1G.
> 
> Introduce the kernel's include/linux/sizes.h to replace any intermediate
> number larger than 4096 (not including 4096) with SZ_*.
> 
> Signed-off-by: Qu Wenruo 

Applied, thanks. Changes are everywhere but fairly easy to sort out
during merges.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
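As a quick illustration of the readability win (a standalone sketch, not code
from the patch; the SZ_* values below are copied from include/linux/sizes.h):

#include <stdio.h>

/* values as defined in the kernel's include/linux/sizes.h */
#define SZ_4K	0x00001000
#define SZ_1M	0x00100000
#define SZ_1G	0x40000000

int main(void)
{
	/* old style: the reviewer has to multiply in their head */
	unsigned long long chunk_old = 1024 * 1024 * 1024;
	/* new style: the intent is obvious at a glance */
	unsigned long long chunk_new = SZ_1G;

	printf("%llu == %llu\n", chunk_old, chunk_new);
	return chunk_old != chunk_new;
}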


Re: RAID56 status?

2017-01-24 Thread Niccolò Belli

+1

On Tuesday, 24 January 2017 00:31:42 CET, Christoph Anton Mitterer wrote:

On Mon, 2017-01-23 at 18:18 -0500, Chris Mason wrote:

We've been focusing on the single-drive use cases internally.  This year
that's changing as we ramp up more users in different places.
Performance/stability work and raid5/6 are the top of my list right now.

+1

Would be nice to get some feedback on what happens behind the scenes...
 actually I think a regular btrfs development blog could be generally a
nice thing :)

Cheers,
Chris.



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] btrfs-progs: Fix disable backtrace assert error

2017-01-24 Thread David Sterba
On Tue, Jan 24, 2017 at 12:55:22PM +0100, David Sterba wrote:
> On Thu, Jan 19, 2017 at 12:00:03PM +0800, Qu Wenruo wrote:
> > Due to commit 00e769d04c2c83029d6c71 (btrfs-progs: Correct value printed
> > by assertions/BUG_ON/WARN_ON), which changed the assert_trace()
> > parameters, the conditions passed to assert/WARN_ON/BUG_ON are logically
> > negated between the backtrace-enabled and backtrace-disabled cases.
> > 
> > Such behavior makes it easier to pass the wrong value, and in fact it did
> > cause us to pass the wrong condition to ASSERT().
> > 
> > Instead of passing different conditions to ASSERT/WARN_ON/BUG_ON()
> > manually, this patch uses ASSERT() to implement the remaining
> > ASSERT/WARN_ON/BUG() for the backtrace-disabled case, and assert_trace()
> > to implement ASSERT() and BUG_ON(), so that they print the correct
> > value.
> > 
> > Also, move WARN_ON() out of the ifdef branch, as it's completely the
> > same for both branches.
> > 
> > Cc: Goldwyn Rodrigues 
> > Signed-off-by: Qu Wenruo 
> 
> Applied, thanks.

And FYI, I've added a trace dump for BUG_ON and removed the value
negation from ASSERT.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
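The underlying idea is to funnel the assertions through a single reporting
path so the condition polarity is decided in exactly one place. A loose,
standalone sketch of that idea follows; the helper name and macro bodies are
made up and do not match the actual kerncompat.h definitions:

#include <stdio.h>
#include <stdlib.h>

/* one central reporting helper, so the polarity is decided in one place */
static inline void report_and_abort(const char *what, const char *cond,
				    const char *file, int line, long val)
{
	fprintf(stderr, "%s: %s failed (value %ld) at %s:%d\n",
		what, cond, val, file, line);
	abort();
}

/* BUG_ON fires when the condition is non-zero ... */
#define BUG_ON(c)						\
	do {							\
		if (c)						\
			report_and_abort("BUG_ON", #c,		\
					 __FILE__, __LINE__,	\
					 (long)(c));		\
	} while (0)

/* ... and ASSERT is just BUG_ON of the negated condition, so the two can
 * never disagree between backtrace-enabled and -disabled builds. */
#define ASSERT(c)	BUG_ON(!(c))

int main(void)
{
	ASSERT(1 + 1 == 2);	/* passes */
	BUG_ON(0);		/* passes */
	return 0;
}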


Re: [PATCH v4] btrfs-progs: Fix disable backtrace assert error

2017-01-24 Thread David Sterba
On Thu, Jan 19, 2017 at 12:00:03PM +0800, Qu Wenruo wrote:
> Due to commit 00e769d04c2c83029d6c71 (btrfs-progs: Correct value printed
> by assertions/BUG_ON/WARN_ON), which changed the assert_trace()
> parameters, the conditions passed to assert/WARN_ON/BUG_ON are logically
> negated between the backtrace-enabled and backtrace-disabled cases.
> 
> Such behavior makes it easier to pass the wrong value, and in fact it did
> cause us to pass the wrong condition to ASSERT().
> 
> Instead of passing different conditions to ASSERT/WARN_ON/BUG_ON()
> manually, this patch uses ASSERT() to implement the remaining
> ASSERT/WARN_ON/BUG() for the backtrace-disabled case, and assert_trace()
> to implement ASSERT() and BUG_ON(), so that they print the correct
> value.
> 
> Also, move WARN_ON() out of the ifdef branch, as it's completely the
> same for both branches.
> 
> Cc: Goldwyn Rodrigues 
> Signed-off-by: Qu Wenruo 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/1] btrfs lockdep annotation

2017-01-24 Thread Christian Borntraeger
On 01/24/2017 11:22 AM, Filipe Manana wrote:
> On Tue, Jan 24, 2017 at 9:01 AM, Christian Borntraeger
>  wrote:
>> Chris,
>>
>> since my bug report about this did not result in any fix and since
> 
> It was fixed and the fix landed in 4.10-rc4:

Thanks, I missed that last pull.

> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=781feef7e6befafd4d9787d1f7ada1f9ccd504e4
> 
>> this disables lockdep before the code that I want to debug runs
>> here is my attempt to fix it.
>> Please double check if the subclass looks right. It seems to work
>> for me but I do not know enough about btrfs to decide if this is
>> right or not.
>>
>> Christian Borntraeger ():
>>   btrfs: add lockdep annotation for btrfs_log_inode
>>
>>  fs/btrfs/tree-log.c| 2 +-
>>
>> --
>> 2.7.4
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] btrfs-progs: quota: fix printing during wait mode

2017-01-24 Thread David Sterba
On Tue, Jan 17, 2017 at 08:56:39PM -0500, je...@suse.com wrote:
> From: Jeff Mahoney 
> 
> If we call "btrfs quota rescan -w", it will attempt to start the rescan
> operation, wait for it, and then print the "quota rescan started" message.
> The wait could last an arbitrary amount of time, so printing it after
> the wait isn't very helpful.
> 
> This patch reworks how and when we print the rescan started message and
> the other messages, including adding an error message for status query
> failures (which could be EPERM/EFAULT/ENOMEM, not just no rescan in
> progress) and for wait failures.
> 
> Signed-off-by: Jeff Mahoney 

Applied, thanks.

I need to think about the 2nd patch, namely the command-line option naming.
The semantics are clear.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
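The ordering change boils down to reporting before blocking. A rough
standalone sketch of the flow (made-up helper, not the btrfs-progs
implementation; it assumes <linux/btrfs.h> provides the quota rescan ioctls):

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

static int rescan_and_maybe_wait(int fd, int wait_for_completion)
{
	struct btrfs_ioctl_quota_rescan_args args;
	int ret;

	memset(&args, 0, sizeof(args));
	ret = ioctl(fd, BTRFS_IOC_QUOTA_RESCAN, &args);
	if (ret < 0) {
		fprintf(stderr, "ERROR: quota rescan failed: %s\n",
			strerror(errno));
		return ret;
	}
	/* report success before the potentially very long wait */
	printf("quota rescan started\n");

	if (wait_for_completion) {
		ret = ioctl(fd, BTRFS_IOC_QUOTA_RESCAN_WAIT, NULL);
		if (ret < 0)
			fprintf(stderr, "ERROR: quota rescan wait failed: %s\n",
				strerror(errno));
	}
	return ret;
}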


Re: [PATCH 0/1] btrfs lockdep annotation

2017-01-24 Thread Filipe Manana
On Tue, Jan 24, 2017 at 9:01 AM, Christian Borntraeger
 wrote:
> Chris,
>
> since my bug report about this did not result in any fix and since

It was fixed and the fix landed in 4.10-rc4:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=781feef7e6befafd4d9787d1f7ada1f9ccd504e4

> this disables lockdep before the code that I want to debug runs
> here is my attempt to fix it.
> Please double check if the subclass looks right. It seems to work
> for me but I do not know enough about btrfs to decide if this is
> right or not.
>
> Christian Borntraeger ():
>   btrfs: add lockdep annotation for btrfs_log_inode
>
>  fs/btrfs/tree-log.c| 2 +-
>
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

"People will forget what you said,
 people will forget what you did,
 but people will never forget how you made them feel."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs rescue chunk-recover segfaults

2017-01-24 Thread Duncan
Simon Waid posted on Mon, 23 Jan 2017 09:42:28 +0100 as excerpted:


> I have a btrfs raid5 array that has become unmountable.

[As a list regular and btrfs user myself, not a dev, but I try to help 
with replies where I can in order to allow the devs and real experts to 
put their time to better use where I can't help.]

As stated on the btrfs wiki and in the mkfs.btrfs manpage, btrfs raid56 
hasn't stabilized and is now known to have critical defects as originally 
implemented, that make it unfit for the purposes most people normally use 
parity-raid for.  It's not recommended except for testing with purely 
sacrificial data that might potentially be eaten by the test.

Thus, anyone on btrfs raid56 mode should only be testing with 
effectively throw-away data, either because it's backed up and can be 
easily retrieved from that backup, or because it really /is/ throw-away 
data, making the damage from losing that testing filesystem minimal.

As such, if you're done with testing, the fastest and most efficient way 
back to production is to forget about the broken filesystem, and blow it 
away with a mkfs to a new filesystem, either some other btrfs mode or 
something other than the still maturing btrfs entirely, your choice.  
Then you can restore from backups if the data was worth having them.

Tho of course blowing it away does mean it can't be used as a lab 
specimen to perhaps help find and fix some of the problems that do affect 
raid56 mode at this time.

Qu Wenruo in particular, and others, have been gradually working thru at 
least some of the raid56 mode bugs, tho it's still possible the current 
code is beyond hope and may need to be entirely rewritten to properly 
stabilize.  If you don't have to get the space the filesystem was taking 
directly back in service and can build and work with the newest code 
possibly including patches they ask you to apply, you may be able to use 
your deployment as a lab specimen to help them test their newest recovery 
code and possibly help fix additional bugs in the process.

However, even then, don't expect that you'll necessarily recover most of 
what was on the filesystem, as raid56 mode really is seriously bugged 
ATM, and it's quite possible that the data has already been wiped out by 
those bugs.  Mostly, you're simply continuing to use the filesystem as an 
in-the-wild test deployment gone bad, now testing diagnostics and 
possible recovery, not necessarily with a good chance of recovering the 
data, but that's OK, since btrfs raid56 mode was never out of unstable 
testing-only mode in the first place, so any data put on it always was 
effectively sacrificial data, known to be potentially eaten by the 
testing itself.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] btrfs: add lockdep annotation for btrfs_log_inode

2017-01-24 Thread Christian Borntraeger
Add a proper subclass to get rid of the following lockdep
error.

 [ INFO: possible recursive locking detected ]
 4.9.0+ #279 Not tainted
 ---------------------------------------------
 vim/4801 is trying to acquire lock:
  (&ei->log_mutex){+.+...}, at: [<03ff82057592>]
btrfs_log_inode+0x182/0xfa8 [btrfs]

 but task is already holding lock:
  (&ei->log_mutex){+.+...}, at: [<03ff82057592>]
btrfs_log_inode+0x182/0xfa8 [btrfs]

  Possible unsafe locking scenario:
        CPU0
        ----
   lock(&ei->log_mutex);
   lock(&ei->log_mutex);

 *** DEADLOCK ***
  May be due to missing lock nesting notation
 3 locks held by vim/4801:
  #0:  (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<03ff81fc274c>]
btrfs_sync_file+0x204/0x728 [btrfs]
  #1:  (sb_internal#2){.+.+..}, at: [<03ff81fa38e0>]
start_transaction+0x318/0x770 [btrfs]
  #2:  (&ei->log_mutex){+.+...}, at: [<03ff82057592>]

[...]
 Call Trace:
 ([<00115ffc>] show_trace+0xe4/0x108)
  [<001160f8>] show_stack+0x68/0xe0
  [<00652d52>] dump_stack+0x9a/0xd8
  [<00209bb0>] __lock_acquire+0xac8/0x1bd0
  [<0020b3c6>] lock_acquire+0x106/0x4a0
  [<00a1fb36>] mutex_lock_nested+0xa6/0x428
  [<03ff82057592>] btrfs_log_inode+0x182/0xfa8 [btrfs]
  [<03ff82057c76>] btrfs_log_inode+0x866/0xfa8 [btrfs]
  [<03ff81ffe278>] btrfs_log_inode_parent+0x218/0x988 [btrfs]
  [<03ff81aa>] btrfs_log_dentry_safe+0x7a/0xa0 [btrfs]
  [<03ff81fc29b6>] btrfs_sync_file+0x46e/0x728 [btrfs]
  [<0044aeee>] do_fsync+0x5e/0x90
  [<0044b2ba>] SyS_fsync+0x32/0x40
  [<00a26786>] system_call+0xd6/0x288

Signed-off-by: Christian Borntraeger 
---
 fs/btrfs/tree-log.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 3d33c4e..a3ec717 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4648,7 +4648,7 @@ static int btrfs_log_inode(struct btrfs_trans_handle 
*trans,
return ret;
}
 
-   mutex_lock(&BTRFS_I(inode)->log_mutex);
+   mutex_lock_nested(&BTRFS_I(inode)->log_mutex, inode_only);
 
/*
 * a brute force approach to making sure we get the most uptodate
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
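The general lockdep pattern the patch relies on: when the same lock class is
legitimately held twice (here the child and parent inode log_mutex; the patch
reuses inode_only as the subclass), the inner acquisition gets a distinct
subclass so lockdep does not flag it as recursive. A generic, hedged sketch of
that pattern, not btrfs code:

#include <linux/mutex.h>

struct node {
	struct mutex lock;
	struct node *parent;
};

static void lock_parent_then_child(struct node *child)
{
	/* outer lock of the same class uses the default subclass 0 */
	mutex_lock(&child->parent->lock);
	/* inner lock gets subclass 1 so lockdep knows this nesting is intended */
	mutex_lock_nested(&child->lock, SINGLE_DEPTH_NESTING);

	/* ... work on both objects ... */

	mutex_unlock(&child->lock);
	mutex_unlock(&child->parent->lock);
}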


[PATCH 0/1] btrfs lockdep annotation

2017-01-24 Thread Christian Borntraeger
Chris,

since my bug report about this did not result in any fix, and since
this disables lockdep before the code that I want to debug runs,
here is my attempt to fix it.
Please double check if the subclass looks right. It seems to work
for me but I do not know enough about btrfs to decide if this is
right or not.

Christian Borntraeger ():
  btrfs: add lockdep annotation for btrfs_log_inode

 fs/btrfs/tree-log.c| 2 +-

-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html