Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-25 Thread Chris Murphy
On Wed, Jan 25, 2017 at 3:58 PM, Omar Sandoval  wrote:
> On Wed, Jan 25, 2017 at 03:55:33PM -0700, Chris Murphy wrote:
>> On Tue, Jan 24, 2017 at 9:42 PM, Omar Sandoval  wrote:
>> > On Tue, Jan 24, 2017 at 07:53:06PM -0700, Chris Murphy wrote:
>> >> On Tue, Jan 24, 2017 at 3:50 PM, Omar Sandoval  
>> >> wrote:
>> >>
>> >> > Got this to repro after installing systemd-container. It's happening on
>> >> > lsetxattr() to set the SELinux context on /var/lib/machines, which is a
>> >> > subvolume. Looking into it now. Thanks for all of the help, Chris.
>> >>
>> >> Aha! So the snapshot part was a goose chase, it has nothing to do with
>> >> that, really. Because I'm taking a snapshot of root, the nested
>> >> /var/lib/machines subvolume is not in that snapshot, so now it has to
>> >> be created by systemd at next boot and the proper selinux label set on
>> >> it.
>> >>
>> >> It means there's something different about subvolumes and directories
>> >> when it comes to xattrs, and the xattr patch I found in bisect is
>> >> exposing the difference, hence things getting tripped up.
>> >
>> > The snapshots were actually the key -- the error was because setxattr()
>> > was being allowed on the read-only directory created in the place of the
>> > /var/lib/machines subvolume. Not sure if you saw the patch I sent up,
>> > but this should fix it: https://patchwork.kernel.org/patch/9536307/
>>
>> Got it.
>>
>> I couldn't get it to apply to 4.9, but it goes on 4.10-rc5 and the
>> problem is fixed.
>
> Sweet, can I add your tested-by?

Yep!

-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-25 Thread Omar Sandoval
On Wed, Jan 25, 2017 at 03:55:33PM -0700, Chris Murphy wrote:
> On Tue, Jan 24, 2017 at 9:42 PM, Omar Sandoval  wrote:
> > On Tue, Jan 24, 2017 at 07:53:06PM -0700, Chris Murphy wrote:
> >> On Tue, Jan 24, 2017 at 3:50 PM, Omar Sandoval  wrote:
> >>
> >> > Got this to repro after installing systemd-container. It's happening on
> >> > lsetxattr() to set the SELinux context on /var/lib/machines, which is a
> >> > subvolume. Looking into it now. Thanks for all of the help, Chris.
> >>
> >> Aha! So the snapshot part was a goose chase, it has nothing to do with
> >> that, really. Because I'm taking a snapshot of root, the nested
> >> /var/lib/machines subvolume is not in that snapshot, so now it has to
> >> be created by systemd at next boot and the proper selinux label set on
> >> it.
> >>
> >> It means there's something different about subvolumes and directories
> >> when it comes to xattrs, and the xattr patch I found in bisect is
> >> exposing the difference, hence things getting tripped up.
> >
> > The snapshots were actually the key -- the error was because setxattr()
> > was being allowed on the read-only directory created in the place of the
> > /var/lib/machines subvolume. Not sure if you saw the patch I sent up,
> > but this should fix it: https://patchwork.kernel.org/patch/9536307/
> 
> Got it.
> 
> I couldn't get it to apply to 4.9, but it goes on 4.10-rc5 and the
> problem is fixed.

Sweet, can I add your tested-by?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-25 Thread Chris Murphy
On Tue, Jan 24, 2017 at 9:42 PM, Omar Sandoval  wrote:
> On Tue, Jan 24, 2017 at 07:53:06PM -0700, Chris Murphy wrote:
>> On Tue, Jan 24, 2017 at 3:50 PM, Omar Sandoval  wrote:
>>
>> > Got this to repro after installing systemd-container. It's happening on
>> > lsetxattr() to set the SELinux context on /var/lib/machines, which is a
>> > subvolume. Looking into it now. Thanks for all of the help, Chris.
>>
>> Aha! So the snapshot part was a goose chase, it has nothing to do with
>> that, really. Because I'm taking a snapshot of root, the nested
>> /var/lib/machines subvolume is not in that snapshot, so now it has to
>> be created by systemd at next boot and the proper selinux label set on
>> it.
>>
>> It means there's something different about subvolumes and directories
>> when it comes to xattrs, and the xattr patch I found in bisect is
>> exposing the difference, hence things getting tripped up.
>
> The snapshots were actually the key -- the error was because setxattr()
> was being allowed on the read-only directory created in the place of the
> /var/lib/machines subvolume. Not sure if you saw the patch I sent up,
> but this should fix it: https://patchwork.kernel.org/patch/9536307/

Got it.

I couldn't get it to apply to 4.9, but it goes on 4.10-rc5 and the
problem is fixed.


-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Omar Sandoval
On Tue, Jan 24, 2017 at 07:53:06PM -0700, Chris Murphy wrote:
> On Tue, Jan 24, 2017 at 3:50 PM, Omar Sandoval  wrote:
> 
> > Got this to repro after installing systemd-container. It's happening on
> > lsetxattr() to set the SELinux context on /var/lib/machines, which is a
> > subvolume. Looking into it now. Thanks for all of the help, Chris.
> 
> Aha! So the snapshot part was a goose chase, it has nothing to do with
> that, really. Because I'm taking a snapshot of root, the nested
> /var/lib/machines subvolume is not in that snapshot, so now it has to
> be created by systemd at next boot and the proper selinux label set on
> it.
> 
> It means there's something different about subvolumes and directories
> when it comes to xattrs, and the xattr patch I found in bisect is
> exposing the difference, hence things getting tripped up.

The snapshots were actually the key -- the error was because setxattr()
was being allowed on the read-only directory created in the place of the
/var/lib/machines subvolume. Not sure if you saw the patch I sent up,
but this should fix it: https://patchwork.kernel.org/patch/9536307/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Chris Murphy
On Tue, Jan 24, 2017 at 3:50 PM, Omar Sandoval  wrote:

> Got this to repro after installing systemd-container. It's happening on
> lsetxattr() to set the SELinux context on /var/lib/machines, which is a
> subvolume. Looking into it now. Thanks for all of the help, Chris.

Aha! So the snapshot part was a goose chase, it has nothing to do with
that, really. Because I'm taking a snapshot of root, the nested
/var/lib/machines subvolume is not in that snapshot, so now it has to
be created by systemd at next boot and the proper selinux label set on
it.

It means there's something different about subvolumes and directories
when it comes to xattrs, and the xattr patch I found in bisect is
exposing the difference, hence things getting tripped up.

-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Omar Sandoval
On Tue, Jan 24, 2017 at 01:48:12PM -0700, Chris Murphy wrote:
> journal_debug.log booted with patched kernel and params
> systemd.log_level=debug rd.debug, and then output to file with
> journalctl -b -o short-monotonic
> https://drive.google.com/open?id=0B_2Asp8DGjJ9MjRDWi0tZ0x4V2s
> 
> I'm uncertain of the immediacy of some event resulting in another but...
> 
> 
> [8.159581] f25h systemd-tmpfiles[720]: Running create action for
> entry d /var/run/pptp
> [8.159703] f25h systemd-tmpfiles[720]: Created directory "/var/run/pptp".
> [8.159825] f25h systemd-tmpfiles[720]: "/var/run/pptp" has right mode 
> 40750
> [8.159944] f25h systemd-tmpfiles[720]: Running remove action for
> entry d /var/run/pptp
> [8.160070] f25h systemd-tmpfiles[720]: Running create action for
> entry d /var/run/radvd
> [8.160186] f25h systemd-tmpfiles[720]: Created directory "/var/run/radvd".
> [8.160312] f25h systemd-tmpfiles[720]: "/var/run/radvd" has right mode 
> 40755
> [8.171770] f25h kernel: [ cut here ]
> [8.174028] f25h kernel: WARNING: CPU: 2 PID: 720 at
> fs/btrfs/delayed-inode.c:55
> btrfs_get_or_create_delayed_node+0x16a/0x1e0 [btrfs]
> [8.176316] f25h kernel: ino 2 is out of range
> 
> 
> Is it the creation of these directories triggering the error? Anyway,
> there's a bunch of systemd-tmpfiles activity prior to this, and right
> before the btrfs_get_or_create_delayed_node call trace.
> 
> 
> The final call trace is cut off, I guess when systemd switches from
> /run to /var and flushes, if the fs goes read only and can't write all
> of what's flushed, it results in journal data loss. If the rest of the
> output is useful I can switch systemd to only use volatile storage to
> avoid this problem.

Got this to repro after installing systemd-container. It's happening on
lsetxattr() to set the SELinux context on /var/lib/machines, which is a
subvolume. Looking into it now. Thanks for all of the help, Chris.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Chris Murphy
journal_debug.log booted with patched kernel and params
systemd.log_level=debug rd.debug, and then output to file with
journalctl -b -o short-monotonic
https://drive.google.com/open?id=0B_2Asp8DGjJ9MjRDWi0tZ0x4V2s

I'm uncertain of the immediacy of some event resulting in another but...


[8.159581] f25h systemd-tmpfiles[720]: Running create action for
entry d /var/run/pptp
[8.159703] f25h systemd-tmpfiles[720]: Created directory "/var/run/pptp".
[8.159825] f25h systemd-tmpfiles[720]: "/var/run/pptp" has right mode 40750
[8.159944] f25h systemd-tmpfiles[720]: Running remove action for
entry d /var/run/pptp
[8.160070] f25h systemd-tmpfiles[720]: Running create action for
entry d /var/run/radvd
[8.160186] f25h systemd-tmpfiles[720]: Created directory "/var/run/radvd".
[8.160312] f25h systemd-tmpfiles[720]: "/var/run/radvd" has right mode 40755
[8.171770] f25h kernel: [ cut here ]
[8.174028] f25h kernel: WARNING: CPU: 2 PID: 720 at
fs/btrfs/delayed-inode.c:55
btrfs_get_or_create_delayed_node+0x16a/0x1e0 [btrfs]
[8.176316] f25h kernel: ino 2 is out of range


Is it the creation of these directories triggering the error? Anyway,
there's a bunch of systemd-tmpfiles activity prior to this, and right
before the btrfs_get_or_create_delayed_node call trace.


The final call trace is cut off, I guess when systemd switches from
/run to /var and flushes, if the fs goes read only and can't write all
of what's flushed, it results in journal data loss. If the rest of the
output is useful I can switch systemd to only use volatile storage to
avoid this problem.


Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Chris Murphy
On Tue, Jan 24, 2017 at 1:27 PM, Omar Sandoval  wrote:
> On Tue, Jan 24, 2017 at 01:24:51PM -0700, Chris Murphy wrote:
>> On Tue, Jan 24, 2017 at 1:10 PM, Omar Sandoval  wrote:
>>
>> > Hm, still no luck, maybe it's a Server vs Workstation thing? I'll try
>> > installing Workstation. In the meantime, I noticed that in both of the
>> > traces, systemd-tmpfiles was the process that tripped the WARN_ONCE().
>> > Could you dump the contents of /{etc,run,usr/lib}/tmpfiles.d somewhere?
>>
>> This is just ls -lZ for those directories, not sure how else to dump them.
>>
>> https://drive.google.com/open?id=0B_2Asp8DGjJ9NFJUUUpuV2lxcG8
>
> Sorry, I mean the actual contents, could you just shove them in a
> tarball? New test kernel is building right now, hopefully that does
> it...

https://drive.google.com/open?id=0B_2Asp8DGjJ9MjVOT3NXeWh3eVU

-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Omar Sandoval
On Tue, Jan 24, 2017 at 01:24:51PM -0700, Chris Murphy wrote:
> On Tue, Jan 24, 2017 at 1:10 PM, Omar Sandoval  wrote:
> 
> > Hm, still no luck, maybe it's a Server vs Workstation thing? I'll try
> > installing Workstation. In the meantime, I noticed that in both of the
> > traces, systemd-tmpfiles was the process that tripped the WARN_ONCE().
> > Could you dump the contents of /{etc,run,usr/lib}/tmpfiles.d somewhere?
> 
> This is just ls -lZ for those directories, not sure how else to dump them.
> 
> https://drive.google.com/open?id=0B_2Asp8DGjJ9NFJUUUpuV2lxcG8

Sorry, I mean the actual contents, could you just shove them in a
tarball? New test kernel is building right now, hopefully that does
it...
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Chris Murphy
On Tue, Jan 24, 2017 at 1:10 PM, Omar Sandoval  wrote:

> Hm, still no luck, maybe it's a Server vs Workstation thing? I'll try
> installing Workstation. In the meantime, I noticed that in both of the
> traces, systemd-tmpfiles was the process that tripped the WARN_ONCE().
> Could you dump the contents of /{etc,run,usr/lib}/tmpfiles.d somewhere?

This is just ls -lZ for those directories, not sure how else to dump them.

https://drive.google.com/open?id=0B_2Asp8DGjJ9NFJUUUpuV2lxcG8


-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Omar Sandoval
On Tue, Jan 24, 2017 at 01:13:47PM -0700, Chris Murphy wrote:
> OK I've reproduced it in a virt-manager VM with Fedora Rawhide from a
> week old ISO, which is using btrfs-progs 4.9 and kernel 4.10-rc3.
> 
> Get this though. The problem doesn't happen with boot param selinux=0,
> either in the VM or one of the laptops. Whereas the problem happens
> still with enforcing=0. So this could be an selinux xattr that's
> involved.
> 
> Chris Murphy

That makes sense, I don't have SELinux compiled into my test kernel.
I'll fix that and try again.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Chris Murphy
OK I've reproduced it in a virt-manager VM with Fedora Rawhide from a
week old ISO, which is using btrfs-progs 4.9 and kernel 4.10-rc3.

Get this though. The problem doesn't happen with boot param selinux=0,
either in the VM or one of the laptops. Whereas the problem happens
still with enforcing=0. So this could be an selinux xattr that's
involved.

Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Omar Sandoval
On Tue, Jan 24, 2017 at 12:19:29PM -0700, Chris Murphy wrote:
> I can reproduce it on another laptop with Fedora 25. I haven't tried
> to reproduce it in a VM. In this case, it's a single partition Btrfs
> volume, completely stock, and is about 5 weeks old, no crashes or
> forced shutdowns. The HP uses a Samsung NVMe SSD, whereas on this
> Macbook Pro it's a Samsung SATA SSD.
> 
> dmesg
> https://drive.google.com/open?id=0B_2Asp8DGjJ9cjhSRUJxc1k3NVE
> 
> 
> Chris Murphy

Hm, still no luck, maybe it's a Server vs Workstation thing? I'll try
installing Workstation. In the meantime, I noticed that in both of the
traces, systemd-tmpfiles was the process that tripped the WARN_ONCE().
Could you dump the contents of /{etc,run,usr/lib}/tmpfiles.d somewhere?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Chris Murphy
I can reproduce it on another laptop with Fedora 25. I haven't tried
to reproduce it in a VM. In this case, it's a single partition Btrfs
volume, completely stock, and is about 5 weeks old, no crashes or
forced shutdowns. The HP uses a Samsung NVMe SSD, whereas on this
Macbook Pro it's a Samsung SATA SSD.

dmesg
https://drive.google.com/open?id=0B_2Asp8DGjJ9cjhSRUJxc1k3NVE


Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Chris Murphy
On Tue, Jan 24, 2017 at 11:56 AM, Omar Sandoval  wrote:
> Yup, definitely doesn't look like memory corruption. I set up a Fedora
> VM yesterday to try to repro with basically those same steps but it
> didn't happen. I'll try again, but is there anything special about your
> Fedora installation?

Default mkfs. Default mount options.

However, due to subsequent suboptimal situation (installing Windows 10
after Fedora), this Btrfs volume is actually a two device volume: two
partitions with Windows 10 in between them.

[chris@f25h ~]$ sudo btrfs fi show
Label: 'fedora'  uuid: c45caf39-a048-4c44-90c9-535dc8003c71
Total devices 2 FS bytes used 51.37GiB
devid1 size 25.00GiB used 21.03GiB path /dev/nvme0n1p4
devid2 size 48.83GiB used 43.00GiB path /dev/nvme0n1p6

[chris@f25h ~]$ sudo gdisk -l /dev/nvme0n1
[...snip...]
Number  Start (sector)End (sector)  Size   Code  Name
   12048  411647   200.0 MiB   EF00  EFI System Partition
   2  411648 2508799   1024.0 MiB  8300
   3 250880016873471   6.8 GiB 8200
   41687347269302271   25.0 GiB8300  Linux filesystem
   569302272   229046271   76.2 GiB0700  Microsoft basic data
   6   229046272   331446271   48.8 GiB8300  Linux filesystem
   7   331446272   500118158   80.4 GiB8E00  Linux LVM

p4 was made to small when adding in Windows; so I shrank Windows to
make p6, and then added p6 to p4. Hence p4 and p6 are the same Btrfs
volume (single profile for metadata and data).


-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Chris Murphy
OK I just did a 2nd boot - all the same everything as the previous
dmesg.log (patch kernel snapshot). Identical. But not identical
results: timing wise the problem happens much sooner, at 14s the fs
goes read only instead of 30+ seconds. And I also get this line:

[   14.039931] systemd-journald[488]: Failed to set ACL on
/var/log/journal/864b0a611b104692b266377c7d4c7a39/user-1000.journal,
ignoring: Operation not permitted

entire dmesg for this boot:
https://drive.google.com/open?id=0B_2Asp8DGjJ9LTRzZjA2SThtbkE

Did it fail to set the ACL because the file system is readonly? Or was
trying to set the ACL what triggers the problem - then it goes
readonly and then we see the the complaint that the ACL failed to be
set? There's close proximity timing wise to the first Btrfs error
messages, with the most recent patch, and systemd audit messages.
Systemd-journald is switching from /run to /var, flushing the journal
to disk within about 2 seconds of the first Btrfs error. And systemd
does chattr +C on its logs by default now (and I think it's not user
changeable behavior so I can't test if it's related).

Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Omar Sandoval
On Tue, Jan 24, 2017 at 11:37:43AM -0700, Chris Murphy wrote:
> On Tue, Jan 24, 2017 at 10:49 AM, Omar Sandoval  wrote:
> > On Mon, Jan 23, 2017 at 08:51:24PM -0700, Chris Murphy wrote:
> >> On Mon, Jan 23, 2017 at 5:05 PM, Omar Sandoval  wrote:
> >> > Thanks! Hmm, okay, so it's coming from btrfs_update_delayed_inode()...
> >> > That's probably us failing btrfs_lookup_inode(), but just to make sure,
> >> > could you apply the updated diff at the same link as before
> >> > (https://gist.github.com/osandov/9f223bda27f3e1cd1ab9c1bd634c51a4)? If
> >> > that's the case, I'm even more confused about what xattrs have to do
> >> > with it.
> >>
> >> [   35.015363] __btrfs_update_delayed_inode(): inode is missing
> >
> > Okay, like I expected...
> >
> >> [   35.015372] btrfs_update_delayed_inode(ino=2) -> -2
> >
> > Wtf? Inode numbers should be >=256. I updated the diff a third time to
> > catch where that came from. If we're lucky, the backtrace should have
> > the exact culprit. If we're unlucky, there might be memory corruption
> > involved.
> 
> Now two traces. This one is new, and follows a bunch of xattr related stuff...
> 
> [6.861504] WARNING: CPU: 3 PID: 690 at fs/btrfs/delayed-inode.c:55
> btrfs_get_or_create_delayed_node+0x16a/0x1e0 [btrfs]
> [6.862833] ino 2 is out of range
> 
> Then this:
> [7.016061] __btrfs_update_delayed_inode(): inode is missing
> [7.017149] btrfs_update_delayed_inode() failed
> [7.018233] __btrfs_commit_inode_delayed_items(ino=2, flags=3) -> -2
> 
> And finally what we've already seen:
> [   34.930890] WARNING: CPU: 0 PID: 396 at
> fs/btrfs/delayed-inode.c:1194 __btrfs_run_delayed_items+0x1d0/0x670
> [btrfs]
> 
> Complete dmesg osandov-9f223b-3_dmesg.log
> https://drive.google.com/open?id=0B_2Asp8DGjJ9bnpNamIydklraTQ
> 

Aha, so it is xattrs! Here's the full warning trace:

[6.860185] [ cut here ]
[6.861504] WARNING: CPU: 3 PID: 690 at fs/btrfs/delayed-inode.c:55 
btrfs_get_or_create_delayed_node+0x16a/0x1e0 [btrfs]
[6.862833] ino 2 is out of range
[6.862842] Modules linked in:
[6.864213]  xfs libcrc32c arc4 iwlmvm intel_rapl x86_pkg_temp_thermal 
intel_powerclamp coretemp mac80211 snd_soc_skl kvm_intel snd_soc_skl_ipc kvm 
snd_hda_codec_hdmi snd_soc_sst_ipc irqbypass snd_soc_sst_dsp crct10dif_pclmul 
iTCO_wdt crc32_pclmul snd_hda_codec_conexant snd_hda_ext_core 
snd_hda_codec_generic ghash_clmulni_intel iTCO_vendor_support snd_soc_sst_match 
intel_cstate snd_soc_core iwlwifi i2c_designware_platform i2c_designware_core 
hp_wmi sparse_keymap snd_hda_intel intel_uncore snd_hda_codec cfg80211 
snd_hwdep snd_hda_core snd_seq snd_seq_device uvcvideo intel_rapl_perf snd_pcm 
videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core joydev 
videodev idma64 snd_timer btusb hci_uart i2c_i801 snd i2c_smbus media btrtl 
btbcm soundcore btqca btintel mei_me mei bluetooth shpchp 
processor_thermal_device
[6.869661]  intel_pch_thermal intel_lpss_pci intel_soc_dts_iosf ucsi wmi 
hp_accel pinctrl_sunrisepoint lis3lv02d pinctrl_intel int3403_thermal rfkill 
input_polldev hp_wireless intel_lpss_acpi int340x_thermal_zone nfsd 
int3400_thermal intel_lpss tpm_crb acpi_thermal_rel acpi_pad tpm_tis 
tpm_tis_core tpm auth_rpcgss nfs_acl lockd grace sunrpc btrfs i915 xor raid6_pq 
i2c_algo_bit drm_kms_helper drm crc32c_intel nvme serio_raw nvme_core i2c_hid 
video fjes
[6.874780] CPU: 3 PID: 690 Comm: systemd-tmpfile Not tainted 4.9.0+ #2
[6.876294] Hardware name: HP HP Spectre Notebook/81A0, BIOS F.30 12/15/2016
[6.877820]  9bc341187a78 923ed9ed 9bc341187ac8 

[6.879316]  9bc341187ab8 920a1d9b 0037921cafcb 
0002
[6.880836]  8c4126d62000 8c413170b0b0 ff02 
8c4129a8f300
[6.882364] Call Trace:
[6.883861]  [] dump_stack+0x63/0x86
[6.885355]  [] __warn+0xcb/0xf0
[6.886888]  [] warn_slowpath_fmt+0x5f/0x80
[6.888415]  [] ? 
btrfs_get_or_create_delayed_node+0x126/0x1e0 [btrfs]
[6.889979]  [] 
btrfs_get_or_create_delayed_node+0x16a/0x1e0 [btrfs]
[6.891498]  [] btrfs_delayed_update_inode+0x27/0x420 
[btrfs]
[6.893023]  [] ? current_fs_time+0x23/0x30
[6.894602]  [] btrfs_update_inode+0x8d/0x100 [btrfs]
[6.896122]  [] ? current_time+0x36/0x70
[6.897681]  [] __btrfs_setxattr+0xe3/0x120 [btrfs]
[6.899212]  [] btrfs_xattr_handler_set+0x36/0x40 [btrfs]
[6.900690]  [] __vfs_setxattr+0x6b/0x90
[6.902182]  [] __vfs_setxattr_noperm+0x72/0x1b0
[6.903622]  [] vfs_setxattr+0xa7/0xb0
[6.905078]  [] setxattr+0x160/0x180
[6.906515]  [] ? __check_object_size+0xff/0x1d6
[6.907894]  [] ? strncpy_from_user+0x4d/0x170
[6.909231]  [] ? getname_flags+0x6f/0x1f0
[6.910590]  [] path_setxattr+0xb3/0xe0
[6.911913]  [] SyS_lsetxattr+0x11/0x20
[6.913211]  [] do_syscall_64+0x67/0x180
[6.914557]  [] entry_SYSCALL64_slow_path+0x25/0x25
[ 

Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Chris Murphy
On Tue, Jan 24, 2017 at 10:49 AM, Omar Sandoval  wrote:
> On Mon, Jan 23, 2017 at 08:51:24PM -0700, Chris Murphy wrote:
>> On Mon, Jan 23, 2017 at 5:05 PM, Omar Sandoval  wrote:
>> > Thanks! Hmm, okay, so it's coming from btrfs_update_delayed_inode()...
>> > That's probably us failing btrfs_lookup_inode(), but just to make sure,
>> > could you apply the updated diff at the same link as before
>> > (https://gist.github.com/osandov/9f223bda27f3e1cd1ab9c1bd634c51a4)? If
>> > that's the case, I'm even more confused about what xattrs have to do
>> > with it.
>>
>> [   35.015363] __btrfs_update_delayed_inode(): inode is missing
>
> Okay, like I expected...
>
>> [   35.015372] btrfs_update_delayed_inode(ino=2) -> -2
>
> Wtf? Inode numbers should be >=256. I updated the diff a third time to
> catch where that came from. If we're lucky, the backtrace should have
> the exact culprit. If we're unlucky, there might be memory corruption
> involved.

Now two traces. This one is new, and follows a bunch of xattr related stuff...

[6.861504] WARNING: CPU: 3 PID: 690 at fs/btrfs/delayed-inode.c:55
btrfs_get_or_create_delayed_node+0x16a/0x1e0 [btrfs]
[6.862833] ino 2 is out of range

Then this:
[7.016061] __btrfs_update_delayed_inode(): inode is missing
[7.017149] btrfs_update_delayed_inode() failed
[7.018233] __btrfs_commit_inode_delayed_items(ino=2, flags=3) -> -2

And finally what we've already seen:
[   34.930890] WARNING: CPU: 0 PID: 396 at
fs/btrfs/delayed-inode.c:1194 __btrfs_run_delayed_items+0x1d0/0x670
[btrfs]

Complete dmesg osandov-9f223b-3_dmesg.log
https://drive.google.com/open?id=0B_2Asp8DGjJ9bnpNamIydklraTQ

Also, to do these tests, I'm making a new rw snapshot each time so
that the new kernel modules are in the snapshot. e.g.

1. subvolumes 'home' and 'root' are originally created with 'btrfs sub
create' and then filled, and these work OK with all kernels.
2. build kernel with patch
3. 'btrfs sub snap root root.test8'  and also 'btrfs sub snap home home.test8'
4. sudo vi root.test8/etc/fstab to update the entry for / so that
subvol=root is now subvol=root.test8, and also update for /home
5. sudo vi /boot/efi/EFI/fedora/grub.cfg to update the command line,
rootflags=subvol=root becomes rootflags=subvol=root.test8

So the fact the kernel works on subvolume root, but consistently does
not work on each brand new snapshot, is suspiciously unlike what I'd
expect for memory corruption; unless the memory corruption has already
"tainted" the file system in a way that neither btrfs check or scrub
can find; and this "taintedness" of the file system doesn't manifest
until there's a snapshot being used and with a particular kernel with
the xattr patch?

Pretty weird.


-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-24 Thread Omar Sandoval
On Mon, Jan 23, 2017 at 08:51:24PM -0700, Chris Murphy wrote:
> On Mon, Jan 23, 2017 at 5:05 PM, Omar Sandoval  wrote:
> > Thanks! Hmm, okay, so it's coming from btrfs_update_delayed_inode()...
> > That's probably us failing btrfs_lookup_inode(), but just to make sure,
> > could you apply the updated diff at the same link as before
> > (https://gist.github.com/osandov/9f223bda27f3e1cd1ab9c1bd634c51a4)? If
> > that's the case, I'm even more confused about what xattrs have to do
> > with it.
> 
> [   35.015363] __btrfs_update_delayed_inode(): inode is missing

Okay, like I expected...

> [   35.015372] btrfs_update_delayed_inode(ino=2) -> -2

Wtf? Inode numbers should be >=256. I updated the diff a third time to
catch where that came from. If we're lucky, the backtrace should have
the exact culprit. If we're unlucky, there might be memory corruption
involved.

> osandov-9f223b_2-dmesg.log
> https://drive.google.com/open?id=0B_2Asp8DGjJ9UnNSRXpualprWHM
> 
> 
> -- 
> Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-23 Thread Chris Murphy
On Mon, Jan 23, 2017 at 5:05 PM, Omar Sandoval  wrote:
> Thanks! Hmm, okay, so it's coming from btrfs_update_delayed_inode()...
> That's probably us failing btrfs_lookup_inode(), but just to make sure,
> could you apply the updated diff at the same link as before
> (https://gist.github.com/osandov/9f223bda27f3e1cd1ab9c1bd634c51a4)? If
> that's the case, I'm even more confused about what xattrs have to do
> with it.

[   35.015363] __btrfs_update_delayed_inode(): inode is missing
[   35.015372] btrfs_update_delayed_inode(ino=2) -> -2


osandov-9f223b_2-dmesg.log
https://drive.google.com/open?id=0B_2Asp8DGjJ9UnNSRXpualprWHM


-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-23 Thread Omar Sandoval
On Mon, Jan 23, 2017 at 04:48:54PM -0700, Chris Murphy wrote:
> On Mon, Jan 23, 2017 at 3:04 PM, Omar Sandoval  wrote:
> > On Mon, Jan 23, 2017 at 02:55:21PM -0700, Chris Murphy wrote:
> >> On Mon, Jan 23, 2017 at 2:50 PM, Chris Murphy
> >> > I haven't found the commit for that patch, so maybe it's something
> >> > with the combination of that patch and the previous commit.
> >>
> >> I think that's provably not the case based on the bisect log, because
> >> I hit the problem with kernel that has only the commit, as well as the
> >> commit plus the updated patch. So the patch neither causes nor fixes
> >> the problem I'm experiencing.
> >>
> >> If it's useful the btrfs-image is here; mentioned in a previous
> >> thread, this image mounts find, btrfs check --mode=original has no
> >> complaints, but btrfs check --mode=lowmem has complaints. There's no
> >> problem using the parent subvolume as rootfs. Only snapshots of that
> >> subvolume result in the problem.
> >> https://drive.google.com/open?id=0B_2Asp8DGjJ9ZmNxdEw1RDBPcTA
> >
> > What I meant to ask was if there were false positives/false negatives in
> > booting from the subvolume that would lead you to doubt the results of
> > the git bisect, but it sounds like it's 100% reproducible for you?
> 
> OH I see. Yes. It happens within 60 seconds, during startup or shortly
> after gnome-shell login. So it's really clear that it definitely
> happens or does not happen. But it requires two things to be true:
> kernel version 4.9 or higher *and* a normal rw snapshot is used as
> rootfs. If either of those things is not true, then the problem
> doesn't happen.
> 
> I've also tried building a 4.9 kernel with CONFIG_BTRFS_DEBUG and
> CONFIG_BTRFS_FS_CHECK_INTEGRITY with check_int, but the results are
> the same - no additional debug info shown by dmesg.
> 
> 
> > I'll take a look at the image. In the meantime, could you try booting
> > with https://gist.github.com/osandov/9f223bda27f3e1cd1ab9c1bd634c51a4
> > applied on top of 4.9 so we can hopefully narrow it down? It'd also be
> > great to know if it always fails the same way or if it varies.
> 
> Appears to always fail the same way.
> 
> [chris@f25h ~]$ dmesg | grep -i btrfs
> [2.705333] Btrfs loaded, crc32c=crc32c-intel
> [2.705905] BTRFS: device label fedora devid 1 transid 113458 
> /dev/nvme0n1p4
> [2.764563] BTRFS: device label fedora devid 2 transid 113458 
> /dev/nvme0n1p6
> [3.990957] BTRFS info (device nvme0n1p6): disk space caching is enabled
> [3.990988] BTRFS info (device nvme0n1p6): has skinny extents
> [4.010618] BTRFS info (device nvme0n1p6): detected SSD devices,
> enabling SSD mode
> [4.551046] BTRFS info (device nvme0n1p6): disk space caching is enabled
> [   13.906182] btrfs_update_delayed_inode() -> -2
> [   13.906261] WARNING: CPU: 0 PID: 488 at
> fs/btrfs/delayed-inode.c:1179 __btrfs_run_delayed_items+0x1b7/0x660
> [btrfs]
> [   13.906266] BTRFS: Transaction aborted (error -2)
> [   13.906460]  tpm_tis acpi_thermal_rel lis3lv02d tpm_tis_core
> input_polldev acpi_pad wmi nfs_acl hp_wireless tpm lockd grace sunrpc
> btrfs i915 xor raid6_pq i2c_algo_bit drm_kms_helper drm crc32c_intel
> nvme serio_raw nvme_core i2c_hid video fjes
> [   13.906635]  [] ?
> __btrfs_release_delayed_node+0x70/0x1c0 [btrfs]
> [   13.906690]  []
> __btrfs_run_delayed_items+0x1b7/0x660 [btrfs]
> [   13.906743]  [] btrfs_run_delayed_items+0x13/0x20 [btrfs]
> [   13.906793]  []
> btrfs_commit_transaction+0x23a/0xa20 [btrfs]
> [   13.906853]  [] ?
> btrfs_wait_ordered_range+0x7c/0x100 [btrfs]
> [   13.906910]  [] btrfs_sync_file+0x2fb/0x3e0 [btrfs]
> [   13.906970] BTRFS: error (device nvme0n1p6) in
> __btrfs_run_delayed_items:1179: errno=-2 No such entry
> [   13.906976] BTRFS info (device nvme0n1p6): forced readonly
> [   13.906982] BTRFS warning (device nvme0n1p6): Skipping commit of
> aborted transaction.
> [   13.906989] BTRFS: error (device nvme0n1p6) in
> cleanup_transaction:1850: errno=-2 No such entry
> [   13.907943] BTRFS info (device nvme0n1p6): delayed_refs has NO entry
> 
> Complete dmesg tags/v4.9 + osandov/9f223b
> https://drive.google.com/open?id=0B_2Asp8DGjJ9N1g5Wm9lVHpGWG8

Thanks! Hmm, okay, so it's coming from btrfs_update_delayed_inode()...
That's probably us failing btrfs_lookup_inode(), but just to make sure,
could you apply the updated diff at the same link as before
(https://gist.github.com/osandov/9f223bda27f3e1cd1ab9c1bd634c51a4)? If
that's the case, I'm even more confused about what xattrs have to do
with it.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-23 Thread Chris Murphy
On Mon, Jan 23, 2017 at 3:04 PM, Omar Sandoval  wrote:
> On Mon, Jan 23, 2017 at 02:55:21PM -0700, Chris Murphy wrote:
>> On Mon, Jan 23, 2017 at 2:50 PM, Chris Murphy
>> > I haven't found the commit for that patch, so maybe it's something
>> > with the combination of that patch and the previous commit.
>>
>> I think that's provably not the case based on the bisect log, because
>> I hit the problem with kernel that has only the commit, as well as the
>> commit plus the updated patch. So the patch neither causes nor fixes
>> the problem I'm experiencing.
>>
>> If it's useful the btrfs-image is here; mentioned in a previous
>> thread, this image mounts find, btrfs check --mode=original has no
>> complaints, but btrfs check --mode=lowmem has complaints. There's no
>> problem using the parent subvolume as rootfs. Only snapshots of that
>> subvolume result in the problem.
>> https://drive.google.com/open?id=0B_2Asp8DGjJ9ZmNxdEw1RDBPcTA
>
> What I meant to ask was if there were false positives/false negatives in
> booting from the subvolume that would lead you to doubt the results of
> the git bisect, but it sounds like it's 100% reproducible for you?

OH I see. Yes. It happens within 60 seconds, during startup or shortly
after gnome-shell login. So it's really clear that it definitely
happens or does not happen. But it requires two things to be true:
kernel version 4.9 or higher *and* a normal rw snapshot is used as
rootfs. If either of those things is not true, then the problem
doesn't happen.

I've also tried building a 4.9 kernel with CONFIG_BTRFS_DEBUG and
CONFIG_BTRFS_FS_CHECK_INTEGRITY with check_int, but the results are
the same - no additional debug info shown by dmesg.


> I'll take a look at the image. In the meantime, could you try booting
> with https://gist.github.com/osandov/9f223bda27f3e1cd1ab9c1bd634c51a4
> applied on top of 4.9 so we can hopefully narrow it down? It'd also be
> great to know if it always fails the same way or if it varies.

Appears to always fail the same way.

[chris@f25h ~]$ dmesg | grep -i btrfs
[2.705333] Btrfs loaded, crc32c=crc32c-intel
[2.705905] BTRFS: device label fedora devid 1 transid 113458 /dev/nvme0n1p4
[2.764563] BTRFS: device label fedora devid 2 transid 113458 /dev/nvme0n1p6
[3.990957] BTRFS info (device nvme0n1p6): disk space caching is enabled
[3.990988] BTRFS info (device nvme0n1p6): has skinny extents
[4.010618] BTRFS info (device nvme0n1p6): detected SSD devices,
enabling SSD mode
[4.551046] BTRFS info (device nvme0n1p6): disk space caching is enabled
[   13.906182] btrfs_update_delayed_inode() -> -2
[   13.906261] WARNING: CPU: 0 PID: 488 at
fs/btrfs/delayed-inode.c:1179 __btrfs_run_delayed_items+0x1b7/0x660
[btrfs]
[   13.906266] BTRFS: Transaction aborted (error -2)
[   13.906460]  tpm_tis acpi_thermal_rel lis3lv02d tpm_tis_core
input_polldev acpi_pad wmi nfs_acl hp_wireless tpm lockd grace sunrpc
btrfs i915 xor raid6_pq i2c_algo_bit drm_kms_helper drm crc32c_intel
nvme serio_raw nvme_core i2c_hid video fjes
[   13.906635]  [] ?
__btrfs_release_delayed_node+0x70/0x1c0 [btrfs]
[   13.906690]  []
__btrfs_run_delayed_items+0x1b7/0x660 [btrfs]
[   13.906743]  [] btrfs_run_delayed_items+0x13/0x20 [btrfs]
[   13.906793]  []
btrfs_commit_transaction+0x23a/0xa20 [btrfs]
[   13.906853]  [] ?
btrfs_wait_ordered_range+0x7c/0x100 [btrfs]
[   13.906910]  [] btrfs_sync_file+0x2fb/0x3e0 [btrfs]
[   13.906970] BTRFS: error (device nvme0n1p6) in
__btrfs_run_delayed_items:1179: errno=-2 No such entry
[   13.906976] BTRFS info (device nvme0n1p6): forced readonly
[   13.906982] BTRFS warning (device nvme0n1p6): Skipping commit of
aborted transaction.
[   13.906989] BTRFS: error (device nvme0n1p6) in
cleanup_transaction:1850: errno=-2 No such entry
[   13.907943] BTRFS info (device nvme0n1p6): delayed_refs has NO entry

Complete dmesg tags/v4.9 + osandov/9f223b
https://drive.google.com/open?id=0B_2Asp8DGjJ9N1g5Wm9lVHpGWG8



-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-23 Thread Omar Sandoval
On Mon, Jan 23, 2017 at 02:55:21PM -0700, Chris Murphy wrote:
> On Mon, Jan 23, 2017 at 2:50 PM, Chris Murphy
> > I haven't found the commit for that patch, so maybe it's something
> > with the combination of that patch and the previous commit.
> 
> I think that's provably not the case based on the bisect log, because
> I hit the problem with kernel that has only the commit, as well as the
> commit plus the updated patch. So the patch neither causes nor fixes
> the problem I'm experiencing.
> 
> If it's useful the btrfs-image is here; mentioned in a previous
> thread, this image mounts find, btrfs check --mode=original has no
> complaints, but btrfs check --mode=lowmem has complaints. There's no
> problem using the parent subvolume as rootfs. Only snapshots of that
> subvolume result in the problem.
> https://drive.google.com/open?id=0B_2Asp8DGjJ9ZmNxdEw1RDBPcTA

What I meant to ask was if there were false positives/false negatives in
booting from the subvolume that would lead you to doubt the results of
the git bisect, but it sounds like it's 100% reproducible for you?

I'll take a look at the image. In the meantime, could you try booting
with https://gist.github.com/osandov/9f223bda27f3e1cd1ab9c1bd634c51a4
applied on top of 4.9 so we can hopefully narrow it down? It'd also be
great to know if it always fails the same way or if it varies.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-23 Thread Chris Murphy
On Mon, Jan 23, 2017 at 2:50 PM, Chris Murphy
> I haven't found the commit for that patch, so maybe it's something
> with the combination of that patch and the previous commit.

I think that's provably not the case based on the bisect log, because
I hit the problem with kernel that has only the commit, as well as the
commit plus the updated patch. So the patch neither causes nor fixes
the problem I'm experiencing.

If it's useful the btrfs-image is here; mentioned in a previous
thread, this image mounts find, btrfs check --mode=original has no
complaints, but btrfs check --mode=lowmem has complaints. There's no
problem using the parent subvolume as rootfs. Only snapshots of that
subvolume result in the problem.
https://drive.google.com/open?id=0B_2Asp8DGjJ9ZmNxdEw1RDBPcTA




-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-23 Thread Chris Murphy
On Mon, Jan 23, 2017 at 2:31 PM, Omar Sandoval  wrote:
> On Wed, Jan 18, 2017 at 02:27:13PM -0700, Chris Murphy wrote:
>> On Wed, Jan 11, 2017 at 4:13 PM, Chris Murphy  
>> wrote:
>> > Looks like there's some sort of xattr and Btrfs interaction happening
>> > here; but as it only happens with some subvolumes/snapshots not all
>> > (but 100% consistent) maybe the kernel version at the time the
>> > snapshot was taken is a factor?
>>
>> The kernel version at the time the snapshot is taken is not a factor.
>> I've taken a snapshot of a working subvolume, and booting the snapshot
>> fails during startup with the fs forced readonly with kernel 4.9 and
>> higher; the problem doesn't happen with kernel 4.8.17 and lower.
>>
>> As a further test I tried:
>>
>>
>> git checkout tags/v4.9
>> git revert 6c6ef9f26e598fb977f60935e109cd5b266c941a
>>
>> But I get a failure during compile:
>>
>> scripts/Makefile.build:293: recipe for target 'fs/xattr.o' failed
>> make[1]: *** [fs/xattr.o] Error 1
>> Makefile:988: recipe for target 'fs' failed
>> make: *** [fs] Error 2
>>
>> Anyway, the inability to boot snapshots means bootable rollbacks are
>> broken. I think this is a serious regression, what's the next step in
>> figuring out what's going on?
>
> Hm, so you're 100% sure that this exact commit caused the regression?
> I've stared at it for a little while and am not seeing anything obvious.

This is the git bisect log:
https://bugzilla.kernel.org/attachment.cgi?id=251271

I don't know how to evaluate your question. If there's a test that
helps answer the question, let me know. Searching for this commit
brings up an lkml thread:

https://lkml.org/lkml/2016/11/3/268

I haven't found the commit for that patch, so maybe it's something
with the combination of that patch and the previous commit. But I see
it's applied in at least 4.10-rc4 and this forced readonly event
happens with 4.10-rc4.


-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-23 Thread Omar Sandoval
On Wed, Jan 18, 2017 at 02:27:13PM -0700, Chris Murphy wrote:
> On Wed, Jan 11, 2017 at 4:13 PM, Chris Murphy  wrote:
> > Looks like there's some sort of xattr and Btrfs interaction happening
> > here; but as it only happens with some subvolumes/snapshots not all
> > (but 100% consistent) maybe the kernel version at the time the
> > snapshot was taken is a factor?
> 
> The kernel version at the time the snapshot is taken is not a factor.
> I've taken a snapshot of a working subvolume, and booting the snapshot
> fails during startup with the fs forced readonly with kernel 4.9 and
> higher; the problem doesn't happen with kernel 4.8.17 and lower.
> 
> As a further test I tried:
> 
> 
> git checkout tags/v4.9
> git revert 6c6ef9f26e598fb977f60935e109cd5b266c941a
> 
> But I get a failure during compile:
> 
> scripts/Makefile.build:293: recipe for target 'fs/xattr.o' failed
> make[1]: *** [fs/xattr.o] Error 1
> Makefile:988: recipe for target 'fs' failed
> make: *** [fs] Error 2
> 
> Anyway, the inability to boot snapshots means bootable rollbacks are
> broken. I think this is a serious regression, what's the next step in
> figuring out what's going on?

Hm, so you're 100% sure that this exact commit caused the regression?
I've stared at it for a little while and am not seeing anything obvious.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-19 Thread Imran Geriskovan
I don't know if it is btrfs related but I'm getting
hard freezes on 4.8.17.

So I went back to 4.8.14 (with identical .config file).
It is one of my kernels which is known to be trouble
free for a long time.

Since they are hard lock up for real, I can't provide
anything.. Does anyone experience anything like that?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-18 Thread Chris Murphy
On Wed, Jan 11, 2017 at 4:13 PM, Chris Murphy  wrote:
> Looks like there's some sort of xattr and Btrfs interaction happening
> here; but as it only happens with some subvolumes/snapshots not all
> (but 100% consistent) maybe the kernel version at the time the
> snapshot was taken is a factor?

The kernel version at the time the snapshot is taken is not a factor.
I've taken a snapshot of a working subvolume, and booting the snapshot
fails during startup with the fs forced readonly with kernel 4.9 and
higher; the problem doesn't happen with kernel 4.8.17 and lower.

As a further test I tried:


git checkout tags/v4.9
git revert 6c6ef9f26e598fb977f60935e109cd5b266c941a

But I get a failure during compile:

scripts/Makefile.build:293: recipe for target 'fs/xattr.o' failed
make[1]: *** [fs/xattr.o] Error 1
Makefile:988: recipe for target 'fs' failed
make: *** [fs] Error 2

Anyway, the inability to boot snapshots means bootable rollbacks are
broken. I think this is a serious regression, what's the next step in
figuring out what's going on?


-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-11 Thread Chris Murphy
Looks like there's some sort of xattr and Btrfs interaction happening
here; but as it only happens with some subvolumes/snapshots not all
(but 100% consistent) maybe the kernel version at the time the
snapshot was taken is a factor? Anyway git bisect says

# first bad commit: [6c6ef9f26e598fb977f60935e109cd5b266c941a] xattr:
Stop calling {get,set,remove}xattr inode operations

btrfs-image produces a 159M file.

I've updated the bug report
https://bugzilla.kernel.org/show_bug.cgi?id=191761 and also adding
patch author to cc.


Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-10 Thread Chris Murphy
On Mon, Jan 2, 2017 at 11:50 AM, Chris Murphy  wrote:
> Attempt 2. The original post isn't showing up on spinics, just my followup.
>
>
> The Problem: The file system goes read-only soon after startup. It
> only happens with a particular subvolume used for root fs, and only
> with kernel 4.9.0 and 4.10-rc1.
>
> If I boot kernel 4.8.15 with this subvolume as root fs, the problem
> doesn't happen.
> If I boot kernel 4.9.0 with a different subvolume as root fs, the
> problem doesn't happen.
>
> btrfs-progs 4.8.5, btrfs check reports no problems
>
> Complete dmesg:
> https://drive.google.com/open?id=0B_2Asp8DGjJ9SXZhWXdWSGVRbE0
>
>
> And this is a snippet of the call trace.
> [   34.787957] [ cut here ]
> [   34.788019] WARNING: CPU: 1 PID: 398 at
> fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items+0x5f3/0x600
> [btrfs]
> [   34.788029] BTRFS: Transaction aborted (error -2)
> [   34.788031] Modules linked in:
> [...snipped...]
> [   34.788272] CPU: 1 PID: 398 Comm: btrfs-transacti Not tainted
> 4.9.0-1.fc26.x86_64 #1
> [   34.788276] Hardware name: HP HP Spectre Notebook/81A0, BIOS F.30 
> 12/15/2016
> [   34.788281]  9bb10134fca0 ab3ecfcd 9bb10134fcf0
> 
> [   34.788288]  9bb10134fce0 ab0a2fbb 04920002
> 8bed30862800
> [   34.788294]  fffe  8bed28194be0
> 8bed30ede1c0
> [   34.788301] Call Trace:
> [   34.788312]  [] dump_stack+0x63/0x86
> [   34.788318]  [] __warn+0xcb/0xf0
> [   34.788323]  [] warn_slowpath_fmt+0x5f/0x80
> [   34.788386]  [] ?
> __btrfs_release_delayed_node+0x70/0x1c0 [btrfs]
> [   34.788429]  []
> __btrfs_run_delayed_items+0x5f3/0x600 [btrfs]
> [   34.788436]  [] ? kmem_cache_free+0x1c0/0x1f0
> [   34.788473]  [] btrfs_run_delayed_items+0x13/0x20 [btrfs]
> [   34.788511]  []
> btrfs_commit_transaction+0x23a/0xa20 [btrfs]
> [   34.788549]  [] transaction_kthread+0x1ce/0x1f0 [btrfs]
> [   34.788583]  [] ?
> btrfs_cleanup_transaction+0x520/0x520 [btrfs]
> [   34.788590]  [] kthread+0xd9/0xf0
> [   34.788597]  [] ? kthread_park+0x60/0x60
> [   34.788603]  [] ret_from_fork+0x25/0x30
> [   34.788622] ---[ end trace d28d9fa7dece608b ]---
> [   34.788628] BTRFS: error (device nvme0n1p4) in
> __btrfs_run_delayed_items:1170: errno=-2 No such entry
> [   34.788632] BTRFS info (device nvme0n1p4): forced readonly
> [   34.788636] BTRFS warning (device nvme0n1p4): Skipping commit of
> aborted transaction.
> [   34.788640] BTRFS: error (device nvme0n1p4) in
> cleanup_transaction:1850: errno=-2 No such entry
> [   34.789000] BTRFS info (device nvme0n1p4): delayed_refs has NO entry



This still happens with 4.9.2 and 4.10-rc3. The file-system goes read
only within 30 seconds of fs mount.



-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


read-only fs, kernel 4.9.0, fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items,

2017-01-02 Thread Chris Murphy
Attempt 2. The original post isn't showing up on spinics, just my followup.


The Problem: The file system goes read-only soon after startup. It
only happens with a particular subvolume used for root fs, and only
with kernel 4.9.0 and 4.10-rc1.

If I boot kernel 4.8.15 with this subvolume as root fs, the problem
doesn't happen.
If I boot kernel 4.9.0 with a different subvolume as root fs, the
problem doesn't happen.

btrfs-progs 4.8.5, btrfs check reports no problems

Complete dmesg:
https://drive.google.com/open?id=0B_2Asp8DGjJ9SXZhWXdWSGVRbE0


And this is a snippet of the call trace.
[   34.787957] [ cut here ]
[   34.788019] WARNING: CPU: 1 PID: 398 at
fs/btrfs/delayed-inode.c:1170 __btrfs_run_delayed_items+0x5f3/0x600
[btrfs]
[   34.788029] BTRFS: Transaction aborted (error -2)
[   34.788031] Modules linked in:
[...snipped...]
[   34.788272] CPU: 1 PID: 398 Comm: btrfs-transacti Not tainted
4.9.0-1.fc26.x86_64 #1
[   34.788276] Hardware name: HP HP Spectre Notebook/81A0, BIOS F.30 12/15/2016
[   34.788281]  9bb10134fca0 ab3ecfcd 9bb10134fcf0

[   34.788288]  9bb10134fce0 ab0a2fbb 04920002
8bed30862800
[   34.788294]  fffe  8bed28194be0
8bed30ede1c0
[   34.788301] Call Trace:
[   34.788312]  [] dump_stack+0x63/0x86
[   34.788318]  [] __warn+0xcb/0xf0
[   34.788323]  [] warn_slowpath_fmt+0x5f/0x80
[   34.788386]  [] ?
__btrfs_release_delayed_node+0x70/0x1c0 [btrfs]
[   34.788429]  []
__btrfs_run_delayed_items+0x5f3/0x600 [btrfs]
[   34.788436]  [] ? kmem_cache_free+0x1c0/0x1f0
[   34.788473]  [] btrfs_run_delayed_items+0x13/0x20 [btrfs]
[   34.788511]  []
btrfs_commit_transaction+0x23a/0xa20 [btrfs]
[   34.788549]  [] transaction_kthread+0x1ce/0x1f0 [btrfs]
[   34.788583]  [] ?
btrfs_cleanup_transaction+0x520/0x520 [btrfs]
[   34.788590]  [] kthread+0xd9/0xf0
[   34.788597]  [] ? kthread_park+0x60/0x60
[   34.788603]  [] ret_from_fork+0x25/0x30
[   34.788622] ---[ end trace d28d9fa7dece608b ]---
[   34.788628] BTRFS: error (device nvme0n1p4) in
__btrfs_run_delayed_items:1170: errno=-2 No such entry
[   34.788632] BTRFS info (device nvme0n1p4): forced readonly
[   34.788636] BTRFS warning (device nvme0n1p4): Skipping commit of
aborted transaction.
[   34.788640] BTRFS: error (device nvme0n1p4) in
cleanup_transaction:1850: errno=-2 No such entry
[   34.789000] BTRFS info (device nvme0n1p4): delayed_refs has NO entry



-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html